
The reason they are called "world models" is that the model's internal representation is of a "world", not of an individual video frame or image. The model needs to "understand" geometry and physics to output a video.

Just because there are errors in this doesn't mean it isn't significant. If a machine learning model understands how physical objects interact with each other, that is very useful.



  > what they display represents a "world" instead of a video frame or image.
Do they?

I'm unconvinced. The tiger-and-girl video is the clearest example. Nothing about that seems world-representing.


I think the reason is "those words look nice on promo material". It is absolutely built to trigger hype from the clueless.


> The model needs to "understand" geometry and physics to output a video.

No, it doesn't. It merely needs to mimic.


Correct. The fact that AI is a black box means we can easily imagine anything we want happening within that box. Or, to put it more accurately: AI companies can convince investors that amazing magic is happening within that box. With LLMs, we anthropomorphize and imagine it's thinking. With video models, they're now trying to convince us that it understands the world. None of these things are true. It's all an illusion.


It's worse than that. It's not a black box. We know how the architecture is constructed. We can read the weights.
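
For instance, anyone can open a checkpoint and look at the raw parameters. A minimal PyTorch sketch (the filename is hypothetical; any saved state dict works the same way):

  import torch

  # Hypothetical checkpoint path; any PyTorch state dict is readable like this.
  state_dict = torch.load("video_model.pt", map_location="cpu")

  # Every parameter tensor is directly inspectable: name, shape, raw values.
  for name, tensor in state_dict.items():
      print(name, tuple(tensor.shape), tensor.flatten()[:4])

Of course, being able to read the numbers is not the same as knowing what they compute, which is where the "black box" framing comes from.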


Here's a recent paper showing that models trained to generate videos develop strong geometric representations and understanding:

https://arxiv.org/abs/2512.19949
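
For context, claims like this are typically tested with a linear probe: freeze the video model, take its intermediate activations, and check whether a purely linear readout can predict a geometric quantity such as depth. A sketch of the idea (not the paper's exact protocol; the arrays here are random placeholders standing in for real model features and depth labels):

  import numpy as np
  from sklearn.linear_model import Ridge
  from sklearn.model_selection import train_test_split

  # Placeholder data: in a real probe, `features` would be frozen
  # activations from the video model and `depth` ground-truth depth.
  features = np.random.randn(10_000, 512)
  depth = np.random.randn(10_000)

  X_train, X_test, y_train, y_test = train_test_split(
      features, depth, test_size=0.2)

  # If a *linear* readout predicts depth well, geometry is linearly
  # decodable from the representation - the usual evidence offered.
  probe = Ridge(alpha=1.0).fit(X_train, y_train)
  print("probe R^2:", probe.score(X_test, y_test))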



