Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That looks like something that can be solved by autoregressive models of today, no architectural changes needed.

What you need is: good image understanding, at least GPT-5 tier, general purpose reasoning over images training, and then some domain-specific training, or at least some few-shot guidance to get it to adopt the correct reasoning patterns.

If I had to guess which model would be able to do it best out of the box, few-shot, I'd say Gemini 3 Pro.

There is nothing preventing an autoregressive LLM from revisiting images and rewriting the texts as new clues come in. This is how they can solve puzzles like sudoku.






Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: