Other commenters here are correct that the LIDAR is too low-resolution to be used as the primary source for the depth maps. In fact, iPhones use four-ish methods (that I know of) to capture depth data, depending on the model and camera used. Traditionally these depth maps were captured only for Portrait photos, but apparently recent iPhones capture them for standard photos as well.

1. The original method uses two cameras on the back, taking a picture from both simultaneously and using parallax to construct a depth map, similar to human vision. This was introduced on the iPhone 7 Plus, the first iPhone with two rear cameras (a 1x main camera and a 2x telephoto camera). Since the depth map depends on comparing the two images, it will naturally be limited to the field of view of the narrower lens.

2. A second method was later used on iPhone XR, which has only a single rear camera, using focus pixels on the sensor to roughly gauge depth. The raw result is low-res and imprecise, so it's refined using machine learning. See: https://www.lux.camera/iphone-xr-a-deep-dive-into-depth/

3. An extension of this method was used on an iPhone SE that didn't even have focus pixels, producing depth maps purely based on machine learning. As you would expect, such depth maps have the least correlation to reality, and the system could be fooled by taking a picture of a picture. See: https://www.lux.camera/iphone-se-the-one-eyed-king/

4. The fourth method is used for selfies on iPhones with FaceID; it uses the TrueDepth camera's 3D scanning to produce a depth map. You can see this with the selfie in the article; it has a noticeably fuzzier, lower-res look.

You can also see some other auxiliary images in the article, which use white to indicate the human subject, glasses, hair, and skin. Apple calls the person one the Portrait Effects Matte and the hair/skin/glasses ones Semantic Segmentation Mattes; both kinds are produced using machine learning.
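
If you want to poke at these yourself, both the depth map and the mattes are exposed as auxiliary data on the photo file. Here's a minimal Swift sketch (not my app's code; the function name and prints are just for illustration):

    import AVFoundation
    import ImageIO

    // Sketch: read the depth map and person matte out of a photo's
    // auxiliary data. Depth is usually stored as disparity.
    func readDepthAndMatte(from url: URL) {
        guard let source = CGImageSourceCreateWithURL(url as CFURL, nil) else { return }

        if let info = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
                source, 0, kCGImageAuxiliaryDataTypeDisparity) as? [AnyHashable: Any],
           let depth = try? AVDepthData(fromDictionaryRepresentation: info) {
            print("depth:", depth.depthDataMap)  // a CVPixelBuffer
        }

        if let info = CGImageSourceCopyAuxiliaryDataInfoAtIndex(
                source, 0, kCGImageAuxiliaryDataTypePortraitEffectsMatte) as? [AnyHashable: Any],
           let matte = try? AVPortraitEffectsMatte(fromDictionaryRepresentation: info) {
            print("matte:", matte.mattingImage)  // also a CVPixelBuffer
        }
    }

The hair/skin/glasses mattes are available the same way via the semantic segmentation matte auxiliary-data types.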

I made an app that used the depth maps and portrait effects mattes from Portraits for some creative filters. It was pretty fun, but it's no longer available. There are a lot of novel artistic possibilities for depth maps.


> but apparently recent iPhones capture them for standard photos as well.

Yes, they will capture them from the main photo mode if there’s a subject (human or pet) in the scene.

> I made an app that used the depth maps and portrait effects mattes from Portraits for some creative filters. It was pretty fun, but it's no longer available

What was your app called? Is there any video of it available anywhere? Would be curious to see it!

I also made a little tool, Matte Viewer, as part of my photo tool series - but it’s just for viewing/exporting them, no effects bundled:

https://apps.apple.com/us/app/matte-viewer/id6476831058


I'm sorry for neglecting to respond until now. The app was called Portrait Effects Studio and later Portrait Effects Playground; I took it down because it didn't meet my quality standards. I don't have any public videos anymore, but it supported background replacement and filters like duotone, outline, difference-of-Gaussians, etc., all applied based on depth or the portrait effects matte. I can send you a TestFlight link if you're curious.

I looked at your apps, and it turns out I'm already familiar with some, like 65x24. I had to laugh -- internally, anyway -- at the unfortunate one-star review you received on Matte Viewer from a user who didn't appear to understand the purpose of the app.

One that really surprised me was Trichromy, because I independently came up with and prototyped the same concept! And, even more surprisingly, there's at least one other such app on the App Store. And I thought I was so creative coming up with the idea. I tried Trichromy; it's quite elegant, and fast.

Actually, I feel we have a similar spirit in terms of our approach to creative photography, though your development skills apparently surpass mine. I'm impressed by the polish on your websites, too. Cheers.


> Yes, they will capture them from the main photo mode if there’s a subject (human or pet) in the scene.

One of the example pictures in TFA is a plant. Given that, are you sure iOS is still only taking depth maps for photos that get the "portrait" icon in the gallery? (Or have they maybe expanded the types of possible portrait subjects?)


It will capture the depth map and generate the semantic mattes (except in some edge cases) no matter the subject if you explicitly set the camera in Portrait mode, which is how I would guess the plant photo from the article was captured.

My previous comment was about the default Photo mode.

If you have a recent iPhone (iPhone 15 or above iirc) try it yourself - taking a photo of a regular object in the standard Photo mode won’t yield a depth map, but one of a person or pet will. Any photo taken from the Portrait mode will yield a depth map.

You can find out more about this feature by googling “iPhone auto portrait mode”.

Apple’s documentation is less helpful with the terminology; they call it “Apply the portrait effect to photos taken in Photo mode”

https://support.apple.com/guide/iphone/edit-portrait-mode-ph...


Seems crazy to run an object-recognition algorithm in order to decide whether depths should be recorded.

I'd have thought that would be heavier than just recording the depths.


Probably a pretty light classifier on the NPU. It doesn't even have to care about what particular object it is, just whether it matches training data for "capture depth map".
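
Purely as a sketch of the idea (nobody outside Apple knows the actual pipeline), something like this with two Vision requests would do, and Vision already runs on the Neural Engine where available:

    import Vision

    // Speculative sketch, not Apple's pipeline: decide whether to capture
    // a depth map by checking the frame for people or pets (cats/dogs).
    func shouldCaptureDepth(_ pixelBuffer: CVPixelBuffer) -> Bool {
        let humans = VNDetectHumanRectanglesRequest()
        let animals = VNRecognizeAnimalsRequest()
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([humans, animals])
        let foundHuman = !(humans.results ?? []).isEmpty
        let foundPet = (animals.results ?? []).contains { $0.confidence > 0.5 }
        return foundHuman || foundPet
    }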


There was recently a 64-gate NN implementation in C shared on HN that was interesting for stuff like this.


> [...] if you explicitly set the camera in Portrait mode, which is how I would guess the plant photo from the article was captured.

Ah, that makes sense, thank you!


Looking Glass (https://lookingglassfactory.com) makes holographic image frames that can show those iPhone photos with depth maps in them in actual 3D.


Regarding method 3: that article is 5 years old. Apple has since published a monocular depth model, Depth Pro; see: https://github.com/apple/ml-depth-pro?tab=readme-ov-file


One thing worth noting: LIDAR is primarily optimized for fast AF and low-light focusing rather than generating full-res depth maps.


Can method 4 be used by a security application to do liveness detection?



That's what FaceID does.


Obviously -- that is not my question, though. I was curious whether that data is exposed via an API or within the image for the front camera as well, so that a third-party app can do it.
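
My guess is AVFoundation's depth output would be the entry point -- a rough, untested sketch below, where someDelegate/someQueue are placeholders -- but I don't know whether that's robust enough for security-grade liveness:

    import AVFoundation

    // Rough sketch: stream AVDepthData from the TrueDepth camera.
    // Session wiring, permissions, and error handling are abbreviated.
    let session = AVCaptureSession()
    let device = AVCaptureDevice.default(.builtInTrueDepthCamera,
                                         for: .video, position: .front)!
    session.addInput(try! AVCaptureDeviceInput(device: device))

    let depthOutput = AVCaptureDepthDataOutput()
    session.addOutput(depthOutput)
    // someDelegate: your AVCaptureDepthDataOutputDelegate; someQueue: a DispatchQueue.
    depthOutput.setDelegate(someDelegate, callbackQueue: someQueue)
    session.startRunning()
    // The delegate then receives depth frames to check for liveness cues.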


Most KYC apps have you record live video; I'm assuming they can then infer those depth maps from the video source regardless of your phone's capabilities?


I bet most of them are just dumb and video-based, since they work on both Android and iOS.


Yes. For example, Lightroom and Camera Raw support HDR editing and export from RAW images, and Adobe published a good rundown on the feature when they introduced it.

https://blog.adobe.com/en/publish/2023/10/10/hdr-explained

Greg Benz Photography maintains a list of software here:

https://gregbenzphotography.com/hdr-display-photo-software/

I'm not sure what FOSS options there are; it's difficult to search for given that "HDR" can mean three or four different things in common usage.


I find the default HDR (as in gain map) presentation of iPhone photos to look rather garish, rendering highlights too bright and distracting from the content of the images. The solution I came up with for my own camera app was to roll off and lower the highlights in the gain map, which results in final images that I find way more pleasing. This seems to be somewhat similar to what Halide is introducing with their "Standard" option for HDR.
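
For the curious, a roll-off like that is conceptually just a soft knee applied per pixel to the gain values. A minimal sketch, with made-up knee/ceiling values (not my exact curve):

    import Foundation

    // Sketch: compress gain-map values above a knee toward a ceiling.
    // g is the per-pixel gain (1.0 = no boost over SDR).
    func rollOff(_ g: Double, knee: Double = 1.5, ceiling: Double = 2.5) -> Double {
        guard g > knee else { return g }  // modest gains pass through untouched
        let range = ceiling - knee
        // Smoothly approach the ceiling instead of letting highlights blow out.
        return knee + range * (1 - exp(-(g - knee) / range))
    }

Anything below the knee is untouched, so the SDR rendering of the image is unaffected; only the brightest highlights get reined in.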

Hopefully HN allows me to share an App Store link... this app works best on Pro iPhones, which support ProRAW, although I do some clever stuff on non-Pro iPhones to get a more natural look.

https://apps.apple.com/us/app/unpro-camera/id6535677796


I see that it says it stores images "in an HDR format by default" but keeps referencing JPEG output. Are you using JPEG-XT? There aren't a lot of "before and after" comparisons, so it's hard to know how much it's taking out. I figure those would probably hurt the reputation of the app, considering its purpose is to un-pop photos, but I'm in the boat of not really being sure whether I actually like the pop or not. Is there Live Photo support, or is that something that you shouldn't expect from an artist-focused product?


It's a JPEG + gain map format where the gain map is stored in the metadata. Same thing, as far as I can tell, that Halide is now using. It's what the industry is moving towards; it means that images display well on both SDR and HDR displays. I don't know what JPEG-XT is, aside from what I just skimmed on the Wikipedia page.

Not having before-and-after comparisons is mostly down to my being concerned about whether that would pass App Review; the guidelines indicate that the App Store images are supposed to be screenshots of the app, and I'm already pushing that rule with the example images for filters. I'm not sure a hubristic "here's how much better my photos are than Apple's" image would go over well. Maybe in my next update? I should at least have some comparisons on my website, but I've been bad at keeping that updated.

There's no Live Photo support, though I've been thinking about it. The reason is that my current iPhone 14 Pro Max does not support Live Photos while shooting in 48-megapixel mode; the capture process takes too long. I'd have to come up with a compromise such as only having video up to the moment of capture. That doesn't prevent me from implementing it for other iPhones/cameras/resolutions, but I don't like having features unevenly available.


I played some of this after I read the NESFab page posted about a week ago. It's an impressive NES game for any length of time spent on development, let alone a month. Now that I know that it's from the creator of NESFab, the polish makes sense -- obviously the creator is intimately familiar with both the hardware and their own development tools. Compliments must also be paid to the art and appropriately Sisyphean music.

I gave up at 35 souls.


> The NESFab page posted about a week ago.

For reference, here's the post: https://news.ycombinator.com/item?id=42999566


iPhones can do this. They support taking photos simultaneously from the two or three cameras on the back; the cameras are hardware-synchronized and automatically match their settings to provide similar outputs. The catch is you need a third-party app to access it, and you'll end up with two or three separate photos per shot which you'll have to manage yourself. You also won't get manual controls over white balance, focus, or ISO, and you can't shoot in RAW or ProRAW.

There are probably a good number of camera apps that support this mode; two I know of are ProCam 8 and Camera M.
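
If you want to build this yourself, the relevant API is AVFoundation's "constituent photo delivery" on a virtual device (iOS 13+). A minimal, untested sketch; session configuration and the delegate are abbreviated, and the device type varies by model:

    import AVFoundation

    let session = AVCaptureSession()
    guard let device = AVCaptureDevice.default(.builtInDualWideCamera,
                                               for: .video, position: .back),
          let input = try? AVCaptureDeviceInput(device: device) else {
        fatalError("no dual camera")
    }
    session.addInput(input)

    let photoOutput = AVCapturePhotoOutput()
    session.addOutput(photoOutput)
    photoOutput.isVirtualDeviceConstituentPhotoDeliveryEnabled =
        photoOutput.isVirtualDeviceConstituentPhotoDeliverySupported

    let settings = AVCapturePhotoSettings()
    // Request one photo from each physical camera behind the virtual device;
    // the delegate's didFinishProcessingPhoto callback fires once per camera.
    settings.virtualDeviceConstituentPhotoDeliveryEnabledDevices = device.constituentDevices
    photoOutput.capturePhoto(with: settings, delegate: delegate)  // delegate: your AVCapturePhotoCaptureDelegate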


block_dagger was making a pun based on the sense of drill as a training exercise. A similar joke went over the heads of nearly everyone on a recent episode of Taskmaster:

https://www.youtube.com/watch?v=6PJkA3o_Im0


English being the global language makes it easier to get a lift, but much harder for anyone to pick you up.


Articles about the merits of JPEG XL come up with some regularity on Hacker News, as if to ask, "why aren't we all using this yet?"

This one has a section on animation and cinemagraphs, saying that video formats like AV1 and HEVC are better suited, which makes sense. Here's my somewhat off-topic question: is there a video format that requires support for looping, like GIFs? GIF is a pretty shoddy format for video compared to a modern video codec, but if a GIF loops, you can expect it to loop seamlessly in any decent viewer.

With videos it seems you have to hope that the video player has an option to loop, and oftentimes there's a brief delay at the end of the video before playback resumes at the beginning. It would be nice if there were a video format that included seamless looping as part of the spec -- but as far as I can tell, there isn't one. Why not? Is it just assumed that anyone who wants looping video will configure their player to do it?


Besides looping, video players also deal kinda badly with low-framerate videos. Meanwhile, (AFAIK) GIFs can have arbitrary frame durations and it generally works fine.


> GIFs can have arbitrary frame durations and it generally works fine.

But we shouldn't be using animated GIFs in 2024.

The valid replacement for the animated GIF is an animated lossless compressed WebP. File sizes are much more controlled and there is no generational loss when it propagates the internets as a viral loop (if we all settled on it and did not recompress it in a lossy format).


Most modern video container formats support arbitrary frame durations, using a 'presentation timestamp' on each frame. After all, loads of things these days use streaming video, where you need to handle dropped frames gracefully.

Of course, not every video player supports them well. Which is kinda understandable, I can see how expecting 30 frames per second from a 30fps video would make things a lot simpler, and work right 99.9% of the time.


> With videos it seems you have to hope that the video player has an option to loop

<video playsinline muted loop> should be nearly as reliable as a GIF in that regard.

The one exception that I've found is that some devices will prevent videos from autoplaying if the user has their battery-saver on, leading to some frustrating bug reports.


How does GIF require support for looping, as opposed to looping being just a player implementation detail, no different from any other format?


GIF format includes a flag inside it to indicate how many times (or forever) to loop the video.

HEVC does not have such a flag. Quite unfortunate.
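
Concretely, it's the Netscape application extension block, a dozen bytes near the start of the stream; a loop count of 0 means loop forever. A sketch of building it:

    import Foundation

    // Build the GIF "NETSCAPE2.0" looping extension block.
    func netscapeLoopBlock(loopCount: UInt16) -> Data {
        var block = Data([0x21, 0xFF, 0x0B])  // extension introducer, app extension label, block size 11
        block.append("NETSCAPE2.0".data(using: .ascii)!)  // application identifier + auth code
        block.append(contentsOf: [0x03, 0x01])  // sub-block size, sub-block ID
        block.append(contentsOf: [UInt8(loopCount & 0xFF), UInt8(loopCount >> 8)])  // loop count, little-endian
        block.append(0x00)  // block terminator
        return block
    }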


Interesting, it seems like this isn't part of the spec per wiki:

> Most browsers now recognize and support NAB, though it is not strictly part of the GIF89a specification.

So I guess the players for the new video codecs could do something similar and agree on some metadata to be used for the same purpose?


> Most browsers now recognize and support NAB

The “NAB” was introduced in Netscape Navigator 2.0, 50 million years ago.

The phrasing with “most browsers” and “now” is a bit weird on the part of whoever wrote that part of the Wikipedia article.

Every major browser that I know of has supported animated gifs since forever.

Any browser that doesn’t is probably either a non-graphical browser in the first place, or one that has like five people using it.


You're doing a similar phrasing weirdness as the wiki

> Every major browser

There are only 2 major ones

And this brings us back to the main point - this is no "format" issue, Chrome could just as well support some metadata field as "loop for n" for the newer video files, and the situation would be the same as with NAB when Safari adds it.


There are two major browser engine lineages but at least four 'major' browsers (>5% market share) and a number of minor browsers.


There are three major browser engines. Chrome derivatives cannot be counted as a separate browser (yet, or ever? remains to be seen)


5% is minor, not major.


Come on, this is ridiculously pedantic. They used "major browser" to exclude WiP and hobby project to avoid pedants coming in and saying "uh Ladybird doesn't support looping gifs yet" or whatever. But I guess there's no pleasing the pedants.


You've missed both points:

- the 5% point you're responding to doesn't refer to the original "Every major browser"

- the original response just highlights that there is no non-pedantic difference between "most browsers" and "Every major browser", so that was the start of the anti-pedantism battle


No, this is ridiculous. The claim isn't about "most browsers" but the ones people actually use. You know, Firefox and those based on Chromium and WebKit. There could be 100 browser hobby projects out there which don't support NAB, "most browsers" would then not support NAB, but pretty much everyone would still be using a browser which supports NAB.

"The major browser engines" is a commonly used phrase to refer to Chromium, WebKit and Gecko (and formerly Trident and Presto). You're willfully misudnerstanding it. Please stop.

This will be my last response in this thread, this conversation is absolutely ridiculous.


> Come on, this is ridiculously pedantic. They used

"most browser" to include the major ones, it's a commonly used way of expressing it

> But I guess there's no pleasing the pedants.

who come up with imaginary examples like this:

> There could be 100 browser hobby projects


The same thing stood out to me. With the popularity of animated GIFs, it's disappointing and ridiculous for a new Web-friendly image format to omit at least a simple multi-image/looping facility.

As for your question about video looping: Nothing prevents that, although I don't know of a container format that has a flag to indicate that it should be looped. Players could eliminate the delay on looping by caching the first few frames.


While nothing like the photos in the article, I've gotten neat footage of living insects (and other arthropods) by putting them in a little makeshift container under my microscope, with strong lighting from the side. This is very much not a good setup, but it's allowed me to capture things like a front view of an earwig cleaning its antenna.

I typically do this when I find a little arthropod inside; instead of killing it, I give it a free trip outside without (intentional) harm, for the small price of experiencing an alien abduction.


I can just imagine the bug trying to tell his buddies that he was abducted and examined by a massive alien creature, but no one believes him. ;0)


That's really nice!


The Nintendo 3DS is presumably (with over 75 million units sold) the most popular camera that takes 3D photos in the MPO format. Unfortunately, the original 3DS's cameras are rather poor: absolutely dreadful dynamic range and tons of color noise. However, they did improve the cameras on the New Nintendo 3DS, which I've never owned. I've even considered making some homebrew to apply computational photography techniques on the 3DS to reduce noise and improve dynamic range, but I'm not at a point in my life where I can justify that right now.

I was looking at my old 3DS photos just recently, and there's not much software to read MPO files, so this project looks pretty darn cool and I'll be checking it out.

Something that I'm sure some people aren't aware of is that the 3DS's 640x480 photos don't match the resolution or aspect ratio of the 15:9 400x240 (800x240, but halved horizontally for 3D) screen, so the 3DS photo gallery actually shows photos zoomed in by default. If you didn't know this, now you can revisit your 3DS photos and see more of each photo for free by pulling down on the circle pad.
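
To put numbers on the zoom: filling the 400x240 screen with a 640x480 photo means scaling it to 400x300 and cropping 60 of those 300 rows away, so the default view hides 20% of the photo's height; zooming out instead fits the whole photo as a pillarboxed 320x240 image.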

Edit: I should mention - I did say that there's not much software that reads MPO files, but one program that does is StereoPhoto Maker. https://stereo.jpn.org/eng/stphmkr/index.html I haven't tried it out yet, but it supports aligning and batch-processing 3D images, among other features.


I've got a New Nintendo 3DS. I took a nice stereogram at the Vietnam War Memorial at the National Mall with it but that's about it.

My stereo production toolchain is based on PIL and PIL reads MPOs. An MPO is just two JPGs concatenated together so they aren't hard to read. My photog friends swear by Stereo Photo Maker but in my book it is "just another image processing program by people who don't understand gamma correction" but Adobe Photoshop is dangerously close to that category too.
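
Indeed -- since an MPO is just concatenated JPEGs, a quick-and-dirty splitter is a screenful of code in any language. A sketch that scans for start-of-image markers (a robust reader would parse the MP index in the APP2 segment instead):

    import Foundation

    // Split an MPO at JPEG start-of-image markers (FF D8 FF) that begin the
    // file or directly follow an end-of-image (FF D9); the FF D9 check skips
    // stray FF D8 bytes inside entropy-coded data and EXIF thumbnails.
    func splitMPO(_ data: Data) -> [Data] {
        var starts: [Int] = []
        for i in 0..<max(data.count - 2, 0)
        where data[i] == 0xFF && data[i + 1] == 0xD8 && data[i + 2] == 0xFF {
            if i == 0 || (i >= 2 && data[i - 2] == 0xFF && data[i - 1] == 0xD9) {
                starts.append(i)
            }
        }
        guard !starts.isEmpty else { return [] }
        let bounds = starts + [data.count]
        return zip(bounds, bounds.dropFirst()).map { lo, hi in data.subdata(in: lo..<hi) }
    }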


I believe the mistake of thinking bird ankles are knees is the source of the name of the delightful webcomic False Knees.

https://falseknees.com

And I say “believe” because I haven’t found any explanation of the reason for the name, but I don’t know what else it would refer to.

