I would agree, but on the other hand, Transformer-based (or really attention-based) models seem to be the first time that computers are generating coherent, ad hoc human-like text on a variety of topics, so I do believe the hype is justified. I mean... people have spent entire careers in pursuit of this goal, and it's here... as long as what you want to talk about fits in 4096 / some K tokens.
Given how little progress (relatively) was made until transformers, it seems totally reasonable to pursue attention models.
Interesting: I do wonder about slightly more complex languages which have declensions and gendered verb forms (e.g. in Serbian "pevala" means "(a female) sang", whereas "pevao" means that a male did). And nouns and adjectives decline across seven cases: "plavom olovkom" means "with a blue pen", whereas "a blue pen" is just "plava olovka".
ChatGPT always mixes these up and hallucinates a bunch of words (inappropriate prefixes, declensions, etc., and is very happy to explain the meaning of these imaginary words). I can imagine smaller, more complex languages like Serbian needing even larger corpora than English, yet that's exactly the hard part: there is simply less content to go off of.