Haha, I like to joke that we were on track for the singularity in 2024, but it stalled because the research time gap between "profitable" and "recursive self-improvement" was just a bit too long that we're now stranded on the transformer model for the next two decades until every last cent has been extracted from it.
There's massive hardware and energy infra built out going on. None of that is specialized to run only transformers at this point, so wouldn't that create a huge incentive to find newer and better architectures to get the most out of all this hardware and energy infra?
Only being able to run transformers is a silly concept, because attention consists of two matrix multiplications, which are the standard operation in feed forward and convolutional layers. Basically, you get transformers for free.