This is why they're not going to move on-device anytime soon. You can use compression techniques, sure, but you're not going to get anywhere near GPT-4-level performance at a size that fits on most consumer devices.
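For a sense of the arithmetic (the parameter count and bytes-per-weight below are illustrative assumptions, since GPT-4's actual size isn't public):

```python
# Rough memory arithmetic for on-device feasibility. The parameter count
# is a stand-in (GPT-4's real size is not public); the bytes-per-weight
# values correspond to common quantization levels.
def model_size_gb(n_params: float, bytes_per_weight: float) -> float:
    """Approximate weight-storage footprint in gigabytes."""
    return n_params * bytes_per_weight / 1e9

N = 1e12  # assumed parameter count for a GPT-4-class model (illustrative)
for label, bpw in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{label}: {model_size_gb(N, bpw):,.0f} GB")
# Even at 4 bits/weight this lands around ~500 GB of weights alone --
# far beyond the ~8-16 GB of RAM on typical consumer devices.
```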
I think we'll see completely new architectures dominate in the near future, ousting the transformer. I strongly suspect that, while impressive, transformers use several orders of magnitude more compute than is "needed" for the tasks they perform, if for no other reason than that the human brain performs similarly and it only draws 20 watts! And it isn't even an engineered system, just the product of a very, very long history of natural selection! I fully anticipate that we'll see AI in the near future that achieves human-level performance on sub-human power budgets like the ones you'd be constrained by on a phone :)
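To put very rough numbers on that (all figures here are my own assumptions, not measurements):

```python
# Back-of-envelope power comparison. GPU wattage and GPU count are
# assumed values for serving a GPT-4-class model, not published specs.
BRAIN_WATTS = 20        # widely cited estimate for the human brain
PHONE_SOC_WATTS = 5     # assumed sustained power budget of a phone SoC
GPU_WATTS = 400         # assumed draw of one datacenter accelerator
GPUS_PER_REPLICA = 8    # assumed GPUs needed to hold one model replica

serving_watts = GPU_WATTS * GPUS_PER_REPLICA
print(f"Assumed serving power: {serving_watts} W "
      f"(~{serving_watts / BRAIN_WATTS:.0f}x the brain's ~{BRAIN_WATTS} W)")
print(f"Phone budget: {PHONE_SOC_WATTS} W "
      f"(~{BRAIN_WATTS / PHONE_SOC_WATTS:.0f}x below the brain)")
```

Under those assumptions you're already ~2 orders of magnitude above the brain's budget just to serve the model, which is what makes me think there's a lot of headroom.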
"neat future" is very ambiguous. At the moment there is nothing even close to transformers in terms of performance. I suspect you are right in general but I'm not sure about the "near future" part, there needs to be a pretty significant paradigm shift for that to happen (which is possible, of course, I just don't see any hints of it yet).
RWKV is an attention-free architecture that's showing promising scaling at a similar level to Transformers right now! There's also recently been Hyena, which uses a new mechanism that's kind of a weird mix of attention, convolution, and implicit modelling all at once. It's shown promise as well. Remains to be seen if these competing methods will truly scale as well as Transformers, but I've got my fingers crossed. Only a matter of time!
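In case it helps make "attention-free" concrete, here's a minimal single-channel sketch of RWKV's WKV time-mixing recurrence as I understand it (the function name and the naive, unstabilized form are mine; the real kernel vectorizes over all channels and adds a running-max trick for numerical stability):

```python
import numpy as np

def wkv_recurrence(k, v, w, u):
    """Naive sketch of RWKV's WKV time-mixing for one channel.

    k, v: arrays of shape (T,) -- per-step key/value activations.
    w:    positive per-channel decay; u: "bonus" weight for the current step.
    """
    a = b = 0.0                      # running weighted sums (the state)
    out = np.empty_like(v)
    for t in range(len(k)):
        # current step gets the extra e^u bonus; past steps live in a, b
        out[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]   # decay old evidence,
        b = np.exp(-w) * b + np.exp(k[t])          # accumulate the new
    return out

print(wkv_recurrence(np.random.randn(8), np.random.randn(8), w=0.5, u=0.1))
```

The appealing part for on-device use is that the state is constant-size per channel, so inference cost doesn't grow with context length the way attention's KV cache does.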
I agree that "near future" is quite ambiguous though. If I were to disambiguate my claims, I think I'd personally expect a Transformer-killing architecture to arise in the next 4-5 years.