Moore's law is not what's helping here. Software and algorithms will fix this up, and that's already happening at a frightening rate. Not long ago, as in months, we were still debating whether it was even possible to run LLMs locally.
There is going to be a computational-complexity floor on how far this can go, just from a Kolmogorov-complexity argument. It's very hard to tell how far away that floor is exactly, but things are moving so fast now that I suspect we'll see diminishing returns within a few months, as we asymptote toward some sort of efficiency boundary and the easy wins all get hoovered up.
Yes indeed and it’ll be interesting to see where that line is.
I still think there is a lot to be gained from just properly and efficiently composing the parts we already have (like how the community handled Stable Diffusion) and exposing them in an accessible manner. I think that'll take years, even if the low-hanging algorithmic fruit starts thinning out.
This is very true. However, there is still a long way to go in chip design specific to DL architectures. I'm sure we'll see lots of players release chips that are an order of magnitude more efficient for certain model types, even while fabricated on the same process node.
Moore's law isn't dead. Only Dennard scaling is. See slide 13 here[0] (2021). Moore's law states that the number of transistors per unit area will double every n months. That's still happening. Besides, neither Moore's law nor Dennard scaling is even the most critical scaling law to be concerned about...
...that's probably Koomey's law[1][3], which looks well on track to hold for the rest of our careers. But as computing approaches the Landauer limit[2], it too must asymptotically level off, probably starting around the year 2050. Then we'll need to actually start "doing more with less" and minimizing the number of computations done for specific tasks. That will begin a very, very productive time for highly task-specialized custom silicon and low-level algorithmic optimization.
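To get a feel for how much headroom the Landauer limit[2] still leaves, here's a back-of-envelope sketch. The room temperature and the "one bit erased per operation" model are simplifying assumptions, not claims about any real chip:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact by SI definition)
T = 300.0           # assumed room temperature, K

# Landauer limit: minimum energy dissipated to erase one bit.
e_bit = k_B * T * math.log(2)  # ~2.9e-21 J
print(f"Landauer limit at {T:.0f} K: {e_bit:.2e} J per bit")

# A hypothetical machine doing 1e18 irreversible bit operations
# per second right at the limit would dissipate only:
power = 1e18 * e_bit  # ~2.9e-3 W
print(f"Power for 1e18 bit-ops/s at the limit: {power * 1e3:.1f} mW")
```

Real hardware sits many orders of magnitude above this floor, which is why Koomey's law still has decades of room to run before it has to bend.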
[0] Shows that Moore's law (green line) is expected to start leveling off soon, but has not yet slowed down. It also shows Koomey's law (orange line) holding indefinitely. Fun fact: if Koomey's law holds, we'll have exaflop performance in <20 W in about 20 years. That's a whole OpenAI/DeepMind's worth of compute in every smartphone.
The neural engine in the A16 Bionic on the latest iPhones can perform 17 TOPS. The A100 is about 1250 TOPS. Both of these performance metrics depend heavily on how you measure them, and I'm absolutely not sure I'm comparing apples to bananas properly. However, the iPhone has presumably already reached its maximum thermal load. So without increasing power draw, it should match the A100 after about 6 to 7 doublings, which would take about 11 years. In 20 years, the iPhone would be expected to reach the performance of approximately 1000 A100s.
At which point anyone will be able to train a GPT-4 in their pocket in a matter of days.
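A quick sanity check on the doubling arithmetic above. The TOPS figures are the ones quoted in the comment; the ~1.7-year efficiency-doubling period is my assumption, roughly in line with Koomey's historical trend:

```python
import math

iphone_tops = 17.0    # A16 neural engine, figure quoted above
a100_tops = 1250.0    # NVIDIA A100, figure quoted above
doubling_years = 1.7  # assumed Koomey's-law doubling period

# How many doublings until a fixed-power iPhone matches one A100?
doublings = math.log2(a100_tops / iphone_tops)
years = doublings * doubling_years
print(f"Doublings needed: {doublings:.1f}")  # ~6.2
print(f"Years at fixed power: {years:.0f}")  # ~11
```

The exact year count is very sensitive to the assumed doubling period, so treat this as an order-of-magnitude estimate, not a forecast.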
There's some argument to be made that Koomey himself declared his law dead in 2016[4], but that was during a particularly "slump-y" era of semiconductor manufacturing. IMHO, the 2016 analysis misses the A11 through A16 Bionic and the M1 and M2 processors, which instantly blew way past their competitors, breaking the temporary slump around 2016 and reverting us to the mean slope. Note also that they're analyzing only "supercomputers", and honestly that arena has changed: quite a bit of HPC work has moved to the cloud (e.g. Graviton), not all of it, but a lot. And I don't think they're analyzing TPU pods, which probably have far better TOPS/watt than traditional supercomputers like the ones on top500.org.