The stupid thing is that any device with an orientation sensor is still writing images the wrong way and then setting a flag, expecting every viewing application to rotate the image.
The camera knows which way it's oriented, so it should just write the pixels out in the correct order. Write the upper-left pixel first. Then the next one. And so on. WTF.
One interesting thing about JPEG is that you can rotate an image with no quality loss. You don't need to convert each 8x8 square to pixels, rotate and convert back, instead you can transform them in the encoded form. So, rotating each 8x8 square is easy, and then rotating the image is just re-ordering the rotated squares.
I'm not sure if this is how JPEG implements it, but in H.264, you just have metadata which specifies a crop (since H.264 also encodes in blocks). From some quick Googling, it seems like JPEG also has EXIF data for cropping, so if that's the mechanism that's used to crop off the bottom and right portions today, there's no reason it couldn't also be used to crop off the top and left portions when losslessly rotating an image's blocks.
Indeed. Whenever I'm using an image browser/manager application that supports rotating images, I wonder if it's doing JPEG rotation properly (as you describe) or just flipping the dumb flag.
> What are the arguments for this? It would seem easier for everyone to rotate and then store exif for the original rotation if necessary.
Performance. Rotation during rendering is often free, whereas the camera would need an intermediate buffer + copy if it's unable to change the way it samples from the sensor itself.
Given that rotation sensors have been standard equipment on most cameras (AKA phones) for many years now, I would expect pixel-reordering to be built into supporting ASICs and to impose negligible performance penalties.
For anything GPU-rendered, applying a rotation matrix to a texture sample and/or frame-buffer write is trivially cheap (see also why Vulkan prerotation exists on Android). Even ignoring GPU-rendering, you always are doing a copy as part of rendering and often have some sort of matrix operation anyway at which point concatenating a rotation matrix often doesn't change much of anything.
The cost is paid in different memory access patterns which may or may not be mitigated by the GPU scheduler. It's an insignificant cost either way though, both for the encoder and the renderer. Also depending on the pixel order in sensor, file or frame buffer "rotated" might actually be the native way and the default is where things get flipped around from source to destination.
Access pattern is mitigated by texture swizzling which will happen regardless of how it's ultimately rendered. So even if drawn with an identity matrix you're still "paying" for it regardless just due to the underlying texture layout. GPUs can sample from linear textures, but often it comes with a significant performance penalty unless you stay on a specific, and undefined, path.
Pretty much every pixel rendered these days was generated by a shader so gpu side you probably already have way more translation options than just a 90° rotation (likely already being used for a rotation of 0°). You'd likely have to write more code cpu side to handle the case of tell the gpu to rotate this please and handle the UI layout diffrence. Honestly not a lot of code.
> The camera knows which way it's oriented, so it should just write the pixels out in the correct order. Write the upper-left pixel first. Then the next one. And so on. WTF.
The hardware likely is optimized for the common case, so I would think that can be a lot slower. It wouldn’t surprise me, for example, if there are image sensors out there that can only be read out in top to bottom, left to right order.
Also, with RAW images and sensors that aren’t rectangular grids, I think that would complicate RAW images parsing. Code for that could have to support up to four different formats, depending on how the sensor is designed,
At this point I expect any camera ASICs to be able to incorporate this logic for plenty-fast processing. Or to do it when writing out the image file, after acquiring it to a buffer.
Your raw-image idea is interesting. I'm curious as to how photosites' arrangement would play into this.
Rotation for speed/efficiency/compression reasons (indeed with PNG's horizontal line filters it can have a compression reason too) should have been a flag part of the compressed image data format and for use by the encoder/decoder only (which does have caveats for renderers to handle partial decoding though... but the point is to have the behavior rigorously specified and encoded in the image format itself and handled by exactly one known place namely the decoder), not part of metadata
It's basically a shame that the exif metadata contains things that affect the rendering
Most modern camera modules have built in hardware codecs like mjpeg, region of interest selection, and frame mirror/flip options.
This is particularly important on smartphones and battery operated devices. However, most smartphone devices simply save the photo the same way regardless of orientation, and simply add a display-rotated flag to the metadata.
It can be super annoying sometimes, as one can't really disable the feature on many devices. =3
the main reason is probably that the chip is already outputting the image in a lossy format, and if you reorder the pixels you must reencode the image which means degrading the image, so it's much better to just change the exif orientation.
JPEG can be rotated losslessly. `jpegtran` can do it, for example (and comes with a script called `exifautotran` to automatically normalise the orientation of a bunch of JPEG files at once).
Burst mode in cameras means the sensor is readout is buffered in RAM while the encoding and writing to persistent storage catches up. Rotating the buffer would be part of the latter and not affect burst speed - and is an insignificant cost anyway.
The camera knows which way it's oriented, so it should just write the pixels out in the correct order. Write the upper-left pixel first. Then the next one. And so on. WTF.