Sort of, yes. I think one of our next unlocks will be some kind of system that predicts at multiple levels of abstraction in latent space: something like predicting the paragraph, then the sentences within it, then the words within the sentences, where the higher-level "decisions" act as constraints that guide the lower-level generation.
In Meta's byte-level model (the Byte Latent Transformer), the patches are variable length based on how predictable the bytes are to a small auxiliary model, allocating compute based on entropy: cheap, predictable stretches get grouped into long patches, while high-entropy bytes start new ones.
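A toy sketch of that idea (not BLT's actual patcher, which uses a small transformer as the entropy model; the bigram model and the 1-bit threshold here are arbitrary stand-ins): estimate next-byte entropy with a frequency-based bigram model, and start a new patch wherever the model is surprised.

```python
import math
from collections import Counter, defaultdict


def train_bigram(data: bytes):
    # counts[prev][nxt] — a toy stand-in for the small byte LM
    counts = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        counts[prev][nxt] += 1
    return counts


def next_byte_entropy(counts, prev: int) -> float:
    # Shannon entropy (bits) of the model's next-byte distribution
    dist = counts.get(prev)
    if not dist:
        return 8.0  # assume max surprise for an unseen context
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def entropy_patches(data: bytes, counts, threshold: float = 1.0):
    # Close the current patch whenever the upcoming byte is hard to predict,
    # so predictable runs accumulate into long patches.
    patches, current = [], bytes([data[0]])
    for prev, nxt in zip(data, data[1:]):
        if next_byte_entropy(counts, prev) > threshold:
            patches.append(current)
            current = b""
        current += bytes([nxt])
    patches.append(current)
    return patches


corpus = b"the cat sat on the mat. the cat sat on the mat."
model = train_bigram(corpus)
print(entropy_patches(corpus, model))
```

Repetitive text ends up in few long patches, so the expensive large model would run fewer steps over it, which is the compute-allocation effect described above.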