You can also bias your sampling so that each language is chosen with equal probability when selecting new training instances. Generally, diversity of data is good – unless that data is "wrong", which, ironically, is probably most of the internet, but I digress.
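A minimal sketch of what that balanced selection might look like, assuming a corpus keyed by language (the function and type names here are hypothetical, not from any particular training pipeline):

```typescript
// Pick a language uniformly first, then an instance uniformly within it,
// so every language is equally likely regardless of how much data it has.
function sampleBalanced(
  corpus: Map<string, string[]>,
  rng: () => number = Math.random,
): string {
  const languages = [...corpus.keys()];
  const lang = languages[Math.floor(rng() * languages.length)];
  const instances = corpus.get(lang)!;
  return instances[Math.floor(rng() * instances.length)];
}
```

Without the two-stage pick, a language with 10x more data would be sampled 10x more often.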
The ideal architecture would be the one you can patent.
Imagine if the transformer architecture had been patented. Imagine how much innovation the patent system would have generated – because that's why it exists in the first place, right?

It's not patented, and look how much harm that has caused: nobody knows about it, and the AI winter is in full swing.
A reasoning model doesn't imply tool calling – the two shouldn't be conflated.

Reasoning just means more implicit chain-of-thought. It can be emulated on a non-reasoning model by explicitly constructing a prompt that asks for a longer step-by-step thought process. With reasoning models it just happens implicitly, and some models allow control over reasoning effort with special tokens. Those models are simply fine-tuned to do it themselves, without explicit prompting from the user.
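To illustrate the emulation point, a toy prompt wrapper for a non-reasoning model might look like this (the wrapper and its wording are made up for illustration, not any provider's API):

```typescript
// Make the chain of thought explicit in the prompt instead of relying
// on the model to reason implicitly.
function withExplicitCoT(question: string): string {
  return [
    question,
    "",
    "Think step by step before answering:",
    "1. Restate the problem in your own words.",
    "2. Work through each step, showing intermediate results.",
    "3. Only then state the final answer.",
  ].join("\n");
}
```

A reasoning model is fine-tuned to produce the equivalent of those steps on its own.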
Tool calling happens primarily on the client side. The research/web-access modes some providers offer (built on tool calling that they handle themselves) are not a property of the model and can be enabled for any model.
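The client-side loop is roughly this shape – the model only *requests* a tool, and the client executes it and feeds the result back. This is a generic sketch with hypothetical message shapes, not any specific provider's API:

```typescript
interface ToolCall { name: string; args: Record<string, unknown>; }
interface ModelReply { text?: string; toolCall?: ToolCall; }

async function runWithTools(
  prompt: string,
  callModel: (messages: string[]) => Promise<ModelReply>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>,
): Promise<string> {
  const messages = [prompt];
  for (;;) {
    const reply = await callModel(messages);
    // No tool request means the model is done; return its answer.
    if (!reply.toolCall) return reply.text ?? "";
    // Execute the requested tool locally and append the result,
    // then let the model continue with the new context.
    const result = await tools[reply.toolCall.name](reply.toolCall.args);
    messages.push(`tool result (${reply.toolCall.name}): ${result}`);
  }
}
```

Since the loop lives entirely on the client, any model that can emit a tool-call request can be given web access this way.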
Nothing has plateaued from where I'm standing – new models are being trained, releases happen frequently with impressive integration speed, new models outperform previous ones, models gain multimodality, etc.
Regarding alternative architectures – new ones are proposed all the time. It's not easy to verify all of them at scale. Some ideas that extend current state-of-the-art architectures do end up in frontier models, but training takes time, so a lag exists. There are also a lot of improvements that commercial companies keep hidden from the public.
Worth noting that "cognitive complexity" may mean the SonarQube metric – a metric that is not widely recognized by the industry, created by a SonarQube employee as an (imho failed) attempt to address "issues" with "cyclomatic complexity" (a principled, industry-recognized metric). It counts things like nullish coalescing and optional chaining as +1 on your complexity, which makes it unusable with JSX or TS code.
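To see why that counting rule hurts TS code, compare two equivalent reads of a nested optional field (illustrative types; the names are hypothetical). Per the rule described above, each `?.` and `??` in the terse version adds +1, even though it is arguably the easier one to read:

```typescript
interface User { profile?: { address?: { city?: string } } }

function cityTerse(u: User): string {
  // Two `?.` plus one `??` – three increments under the SQ rule.
  return u.profile?.address?.city ?? "unknown";
}

function cityVerbose(u: User): string {
  // The nested-if version it pushes you toward also racks up
  // increments (nesting adds to each `if`), and is longer to boot.
  if (u.profile !== undefined) {
    if (u.profile.address !== undefined) {
      if (u.profile.address.city !== undefined) {
        return u.profile.address.city;
      }
    }
  }
  return "unknown";
}
```

Both functions behave identically; the metric just prices idiomatic TS syntax as if it were branching logic.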
With low thresholds ("clean code"-level low), both LoC and "cognitive complexity" (as in SonarQube) are bad measures that flag large swaths of otherwise correct code in complex projects. The way people usually "solve" the warnings is naive copy-pasting, which doesn't reduce any cognitive load – it just scatters complexity around, making it harder to reason about and modify in the future.
I had been thinking about things along those lines. I might do that as a separate site. Perhaps with regular resets, and with something to keep 4chan out.