Data-efficiency matters, but compute-efficiency matters too.
LLMs have a reasonable learning rate at inference time (in-context learning is powerful), but a very poor learning rate in pretraining. One thing that softens that problem is that we have an awful lot of cheap data to pretrain those LLMs on.
We don't know how much compute the human brain uses to do what it does. And what if we could pretrain with the same data-efficiency as humans, but only at the cost of using 10,000x the compute?
It would be impossible to justify doing that for all but the most expensive, hard-to-come-by, gold-plated datasets - the ones actually worth squeezing every last drop of performance out of.
Energy is even weirder. Global electricity supply is about 3 TW for 8 billion people, roughly 375 W per person, versus the 100-124 W per person of our metabolism. Given how much cheaper electricity is than food, AI can burn far more joules than a human for the same outcome and still be economical enough to claim all that electricity.
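A rough back-of-envelope sketch of those numbers (the electricity and food prices below are illustrative assumptions I'm adding, not figures from the post):

```python
# Back-of-envelope: per-person power budgets and energy prices.
# The price assumptions ($0.10/kWh electricity, $6/day for a 2000 kcal diet)
# are illustrative guesses, not figures from the post.

GLOBAL_ELECTRICITY_W = 3e12      # ~3 TW average global electricity supply
POPULATION = 8e9                 # ~8 billion people
KCAL_TO_J = 4184                 # joules per kilocalorie
SECONDS_PER_DAY = 86_400

# Per-person power: electricity vs. metabolism
electricity_w_per_person = GLOBAL_ELECTRICITY_W / POPULATION      # ~375 W
metabolism_w = 2000 * KCAL_TO_J / SECONDS_PER_DAY                 # ~97 W

# Price per megajoule: grid electricity vs. food calories (assumed prices)
electricity_usd_per_mj = 0.10 / 3.6                               # $0.10/kWh, 1 kWh = 3.6 MJ
food_usd_per_mj = 6.0 / (2000 * KCAL_TO_J / 1e6)                  # ~$0.72/MJ

print(f"Electricity per person: {electricity_w_per_person:.0f} W")
print(f"Metabolism (2000 kcal/day): {metabolism_w:.0f} W")
print(f"Electricity: ${electricity_usd_per_mj:.3f}/MJ, food: ${food_usd_per_mj:.2f}/MJ")
print(f"Food costs ~{food_usd_per_mj / electricity_usd_per_mj:.0f}x more per joule")
```

Under those assumed prices, food works out to roughly 25x more expensive per joule than grid electricity, which is the sense in which AI can afford to be much less energy-efficient per outcome and still pencil out.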