They are called "world models" because their internal representation of what they display is a "world" rather than a video frame or image. To output video, the model needs to "understand" geometry and physics.
Just because there are errors in this doesn't mean it isn't significant. If a machine learning model understands how physical objects interact with each other, that is very useful.
Correct. The fact that AI is a black box means we can easily imagine anything we want happening within that box. Or, to put it more accurately: AI companies can convince investors of amazing magic happening within that box. With LLMs, we anthropomorphize and imagine it’s thinking. With video models, they’re now trying to convince us that the model understands the world. None of these things are true. It’s all an illusion.