Gemini 2.5: The First LLM That Understands PDF Layouts (sergey.fyi)
16 points by serjester 10 months ago | 1 comment


This example uses bounding boxes, but Gemini 2.5 (both Pro and Flash) goes a step further and can also return complex-shaped segmentation masks identifying objects: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
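
For anyone who wants to try the bounding-box side of this, here is a minimal sketch of turning the model's boxes into pixel coordinates. It assumes boxes come back as JSON items with a `box_2d` field in `[ymin, xmin, ymax, xmax]` order on a 0-1000 grid, which is my understanding of the Gemini convention. The helper names `box_to_pixels` and `parse_boxes` are hypothetical, and the sample response is made up for illustration, not real model output.

```python
import json


def box_to_pixels(box_2d, width, height):
    """Scale a [ymin, xmin, ymax, xmax] box on a 0-1000 grid to pixels.

    Returns (left, top, right, bottom), the order most image libraries expect.
    The 0-1000 normalization is an assumption about the response format.
    """
    ymin, xmin, ymax, xmax = box_2d
    return (
        round(xmin * width / 1000),
        round(ymin * height / 1000),
        round(xmax * width / 1000),
        round(ymax * height / 1000),
    )


def parse_boxes(response_text, width, height):
    """Parse a JSON list of {"box_2d": [...], "label": "..."} items."""
    items = json.loads(response_text)
    return [
        (item["label"], box_to_pixels(item["box_2d"], width, height))
        for item in items
    ]


# Hypothetical response text, written by hand for this example.
sample_response = '[{"box_2d": [100, 250, 500, 750], "label": "table"}]'

if __name__ == "__main__":
    for label, box in parse_boxes(sample_response, width=2000, height=1000):
        print(label, box)
```

Note the y-before-x ordering: if the assumption above holds, swapping the pairs is the easiest way to end up with boxes that look almost right but sit in the wrong place.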



