One might also imagine that as one of the "godfathers of AI" he feels a bit sidelined by the success of LLMs (especially given the above), and wants to project the image of a visionary ahead of the pack.
I actually agree with him that if the goal is AGI and full animal intelligence, then LLMs are not really the right path (although they are a very useful validation of the power of prediction). We really need much greater agency (even if only in a virtual world), online learning, innate drives, prediction applied to sensory inputs and motor outputs, etc.
Still, V-JEPA is nothing more than a pre-trained transformer applied to vision (predicting latent visual representations rather than text tokens), so it is just a validation of the power of transformers, rather than being any kind of architectural advance.
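To make the contrast concrete, here is a toy numpy sketch of the kind of objective being described: rather than predicting the next token (or raw pixels), a predictor is trained to match the *latent* representation of a masked target region produced by a separate encoder. The linear "encoders", shapes, and names here are all illustrative stand-ins, not V-JEPA's actual architecture (which uses vision transformers and a stop-gradient on the target branch).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: input patch size and latent size (arbitrary choices).
D_IN, D_LAT = 16, 8

# Stand-in linear "encoders" and predictor; real JEPA models use transformers.
W_ctx = rng.normal(size=(D_IN, D_LAT)) * 0.1    # context encoder
W_tgt = rng.normal(size=(D_IN, D_LAT)) * 0.1    # target encoder (stop-grad in practice)
W_pred = rng.normal(size=(D_LAT, D_LAT)) * 0.1  # predictor

def latent_prediction_loss(context_patch, target_patch):
    """Regression loss in latent space, not pixel or token space."""
    z_ctx = context_patch @ W_ctx   # encode the visible context
    z_tgt = target_patch @ W_tgt    # encode the masked target region
    z_hat = z_ctx @ W_pred          # predict the target latent from the context latent
    return float(np.mean((z_hat - z_tgt) ** 2))

ctx = rng.normal(size=(D_IN,))
tgt = rng.normal(size=(D_IN,))
loss = latent_prediction_loss(ctx, tgt)
print(loss)
```

The point of the sketch is only to show where the prediction happens: swap the latent regression target for a softmax over a token vocabulary and you have the standard LLM objective, which is why one can view this as the same transformer recipe applied to a different prediction target.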