Energy, not compute, will be the #1 bottleneck to AI progress – Mark Zuckerberg [video] (youtube.com)
35 points by ksec on May 12, 2024 | 66 comments


I am really surprised by how tunnel-visioned top scientists, VCs, and CEOs are in the AI space.

This whole movement of simply increasing compute or throwing in more data to hopefully attain AI results is just a dead end.

This is like saying if human beings just flap their wings hard enough, they will be able to fly like birds.

We need to finance original research and alternative research, and work on an actual theory of intelligence.

The stage AI is at right now is more like the Ptolemaic model of the universe.

We need a Galileo and Kepler of AI. Maybe they already exist but are being marginalized by the current establishment?


> This whole movement of simply increasing compute or throwing in more data to hopefully attain AI results is just a dead end.

There's no way for you or for anyone else to know that it's a dead end.

Also, it's not "simply throwing more compute" - the algorithms have evolved too and they'll continue to.

Also also, it's not "hopefully attain AI results" - we have AI results now and they keep getting better.

People are spending resources on a trajectory which has worked well - we've been cranking up the compute and better results keep coming out, why stop now?


All I am saying is that there are other strategies to explore.

It's ok to have deep learning. But the herding towards deep learning is just too much.

Recall that before Hinton and his group presented their record-breaking ImageNet results, deep learning was a fringe technique that the likes of Marvin Minsky regarded as a failure in "Perceptrons".

Scientific progress is made by researchers exploring other alternatives.

Herding can only slow down progress.


> All I am saying is that there are other strategies to explore

The “other strategies” have been completely sidelined. Even back when we had no transformers, only CNNs and RNNs, the other strategies were tossed aside because they didn’t show much promise on CIFAR or MNIST or whatever benchmarks the NN community deemed the most suitable measure of SOTA. Even before that, when we only had decision trees and CatBoost, people had already dismissed LOESS/splines etc. You can get a flavor of the dismissals here -

https://news.ycombinator.com/item?id=19145706

At this point that horse has left the barn.


The ImageNet results’ “main” contribution was showing that scaling the compute on large datasets worked surprisingly well. They didn’t contribute any new models so much as they showed that scaling is what helped these models learn and reach SOTA.

It’s no surprise that industry has been pushing for more compute ever since.


> All I am saying is that there are other strategies to explore.

Are people not exploring other strategies?


Do you know anyone in AI doing research on something that isn't Neural Networks?

They are very few. Too few.


> Herding can only slow down progress.

You don't know that either. What if there are actually no alternatives?


There are solutions. The human brain is one example.

Isn't that what we are trying to simulate?



One must consider the concept of "diminishing returns" eventually.

Has Generative AI Already Peaked? - Computerphile (https://www.youtube.com/watch?v=dDUC-LqVrPU)


Moore's law is the inverse.


Moore's law doesn’t help when you’re spending money today rather than 10 years from now.


So much is being learned, and some stock prices have gone up. If nothing else, the people receiving coding suggestions and AI-generated artworks appreciate the billions of dollars and million programmer-hours spent.


I used to have the same opinion for many years.

Then generative AI became good. Good enough for me to use it as a daily assistant. So good it scared me and led me to rethink my position.

I had missed one crucial thing: big enough changes in quantity do lead to new qualities as well.

Training models with insane amounts of data was the necessary step to finally make the qualitative leap into developing models that were practical for mainstream use.

One of the big things holding back AI research has always been trying to be too clever. This article sums it up perfectly:

> The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


Thanks. This blog is very interesting.


>We need a Galileo and Kepler of AI.

Let's imagine a modern-day Galileo discovers a revolutionary new algorithm that gives the same results for 1000x less compute.

How would he transform this competitive advantage into money without revealing the secret sauce? Today he would be in competition with people spending $10^8 to train. Even with the 10^3 cost reduction, it's out of range for most people. And then he has to build a business around it while making sure his secret techniques don't leak.

VCs and CEOs are just playing the control-the-business game, raising barriers to make sure they stay in control. Incremental advancements are great for them, because they get to stay in the lead while reducing the chance that a revolutionary advancement comes along to shuffle their game.

Renaissance Technologies managed to maintain an edge over the whole stock market for many years. AI-controlled companies will probably be able to as well, but you won't hear about them.


> We need to finance original research, alternative research and think of working on an actual theory of intelligence.

> We need a Galileo and Kepler of AI. Maybe they already exist but are being marginalized by the current establishment?

The problem with "theories of intelligence" is that they need to yield results. You can come up with whatever brilliant theory you like but at the end of the day you need to produce a tool, a model, a result, something that does something.

And that's exactly what "theories of intelligence" have failed to do. Your argument is like Chomsky whining in the NYT about machine learning. What did Chomsky's linguistics actually do? What did they enable us to build? The answer is probably not "nothing" -- it's more like "not much". Meanwhile, machine learning gave us a tool that can translate from one language to another reasonably well and didn't stop there.

Theory isn't useless but we have plenty of it. There's so much borderline useless theory about language and intelligence in academia and it has produced so little tangible value. We absolutely do not need more of it.


If it is needed, someone will be doing original research, and probably already is. The payoff is too great for alternatives to be ignored. As long as financial incentives exist, progress will inevitably happen, and the financial incentives in this case are massive.


If it takes $100 billion to train and run an LLM, then only companies with $100 billion can train and run an LLM. That's their moat.


Perhaps the more apt analogy is that we need "a John von Neumann" to come and organise the pieces to give us that leap.


Given that the population has increased and arguably(?) gotten more intelligent since von Neumann's day (and for all the deserved respect he gets, I'm more of a Feynman fan myself, so there were a few brainiacs back then), you'd have to assume there are a few people on his level walking around today, and I wonder what they're up to. Is creating a more addictive website for a FAANG company or a better high-speed trading algorithm for a fintech firm what they see as a good calling? The smartest person I knew in high school (and I went to a rather high-end one), who was easily the valedictorian, basically only wanted to play the trumpet (which he was also superlatively good at). I don't really know what became of him.


> Given that the population has increased and arguably(?) gotten more intelligent since von Neumann's day, you'd have to assume there are a few people on his level walking around today, and I wonder what they're up to.

Hahaha, oh man.

Von Neumann, and many of his contemporaries, weren't secretive researchers toiling in obscure corporate labs -- they were famous public intellectuals with renowned accomplishments across many fields. There are no men of that caliber alive today, and the educational system absolutely excels in ensuring that such well-rounded men are not produced.


Either way - if/when that breakthrough happens would you rather have n compute or n^4 compute ready to go?


Buying compute before such breakthroughs is a terrible investment compared to just waiting.


Just rent it out. It's not like we have an abundance of GPUs as of now.


It’s not like it’s sitting idle - the companies feel like they’re getting an ROI on whatever Gen AI nonsense they’re pushing into their apps.


Same as it ever was.

SV runs 80/20 on investors getting fleeced vs. vaguely realistic business models.


AI research in 2024, both fundamental and applied, is probably the most funded research in the history of mankind.


Grant funders only finance projects with a high probability of success.

It's very difficult to get funding on projects that don't already have a track record of success.

Which is why all the research facilities working in AI today that I have read about only do research on neural networks.


Do you have any citation on the lack of funding for foundational research? From the looks of things, there is an insane amount of money in AI, and while maybe the majority of it goes to "projects with a high probability of success", whatever that means, fringe ideas seem to generate a good amount of capital as well. Even (what I would consider to be) academic projects seem to get good funding [0]. On top of that, it seems like you're not saying that transformers might be a dead end but that neural networks themselves might be one. At this point, NNs already provide a lot of value. If someone's goal is just to get products out, which, say, diffusion models already help you achieve, how is that even a dead end? Do you mean a dead end for AGI? In that case, there seems to be research on alternatives as well [1].

[0] https://www.youtube.com/watch?v=rie-9AEhYdY

[1] https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa


I don't have any citations.

Just look at what is being published on arXiv.

Call your friends in university departments.

Ask people working in startups or corporations.

AI at this point is just Neural Networks.

Do I think it's a dead end for AGI?

I really don't know what AGI is, but whatever new possibilities we want to unlock, we need newer techniques that are less computationally intensive.


I am an outsider to the space, but I feel this sentiment.

It appears that a lot of breakthroughs came from further understanding how the brain works, but we are just trying to brute-force our way into doing harder things.

(And our brains also do things that AIs cannot even when running on a few salads a day.)


> This whole movement of simply increasing compute or throwing in more data to hopefully attain AI results is just a dead end.

Hopefully? The last 10 years of astonishing AI results have largely been driven by “throwing in more compute and data”.


> This whole movement of simply increasing compute or throwing in more data to hopefully attain AI results is just a dead end.

It’s the logical incremental step until an actual breakthrough in model or technique.


Transformers don’t show any sign of convergence yet. The more data and compute we throw at them, whether it be chess, Go, image classification, or text prediction, the better they get. The tunnel vision is simply a focus on finding a horizon, or a limit of some kind.


How can you be so confident when nobody has managed to convincingly beat GPT-4 yet, including OpenAI themselves?

All evidence is that throwing even textbook-quality data at a model of almost any size just approaches an asymptote a tiny bit above GPT-4.

A better model of some data starts to look increasingly like the data, not like something else beyond the data.


I don't know about the future. I am only saying we haven't seen diminishing returns from training transformers on more data/compute yet.


I mean, this sounds really wise and all, but it would have sounded equally wise 3 years ago and yet someone making predictions based on this reasoning would have utterly failed to predict Dall-E, ChatGPT or Whisper.


Sam Altman has in the past talked about gatekeeping AI compute to focus on important problems: things like solving for vaccines or intense science, or even just things that will fundamentally improve humans' day-to-day lives, vs. asking it dumb questions whose answers could be found with a Google search.


Can anyone here argue with confidence (read: evidence) that this AI boom is, in fact, yielding real-life, non-superfluous evolutionary advantages to mankind in general? Judging from a primitive metric, I see no one around me benefiting from AI, nor the people closest to them. Considering the current scale and volume of investments in the field, I would expect that finding AI success stories from a random sample within metropolitan communities should be common by now. That seems not to be the case from my point of view.

How can anyone argue that investing in AI indeed yields superior results over merely distributing all the resources diversely, when we have much stronger evidence of advantageous outcomes for other domains? I understand this looks like a false duality, but it may not be, considering the established, narrow, and concentrated investment structures. Overall, software has a really flimsy connection to actual human necessities, being more of a vector than a need in itself. But AI seems to be all about itself for now.

Recall, at least, that AI winters are about a steep decay in public confidence in the promised outcomes, rather than necessarily a decline in research progress and investments. So the more the investment rates outpace the actual outcomes, the higher the probability of one happening.

Ironically, I revised this text with GPT, but this is as far as it’s been useful to me. It’s just marginally more useful than what we already had, and substantially more expensive.


Deepmind has done some interesting stuff with proteins.


Another quote from the full interview: "I guess a big part of my theory on this is that there's not just going to be like one singular AI that you interact with because I think every business is going to want an AI that represents their interests. They're not going to want to primarily interact with you through an AI that is going to sell their competitors' customers."

A huge Freudian slip there.


What's the slip? That AI in practice will not be neutral but instead tailored to the preferences of the vendor?


The slip is that they aren’t developing AI for the betterment of humankind, just to sell to you better -- so you keep buying more crap. At least that was my hot take.


No, the slip is that Mark subconsciously equates customers and products, hence he said "sell their competitors' customers" instead of "sell their competitors' products".


I hope AI doesn't become a version of Universal Paperclips.


We need to bring models closer to animal brains; more research, better models. Not just keep adding GPUs as fast as we can. Everyone knows this, obviously; however, it seems we are going to push the limits even though we know we need something vastly better in the basics.


If we assume this is true, then it's a massive problem for AI. Building energy infra is slow and hellishly expensive. Near- and medium-term use cases just aren't compelling enough to start upgrading the power grid.


If you look at the arguments for/against resource consumption for Bitcoin, a lot of the same arguments hold true: we don’t necessarily have a deficit of power production. We have an energy capacity problem and a distribution problem.

I wonder how much of this oversupply could be taken up by AI training and how much that would change the calculations.


It's not really all that useful for AI training. The reason energy is a problem for AI training is that you need a lot of compute in one place; distributing backpropagation doesn't work that well. The energy problems hit when a datacenter's needs exceed the capacity of nearby power plants.

If we want an alternative, I think we'd need to try training algorithms that are easier to distribute. Predictive coding, for example, can approximate backprop but requires more compute. Something in that direction might work: https://arxiv.org/abs/2006.04182
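
For anyone curious what that looks like, here's a toy numpy sketch of the predictive-coding idea (my own illustration of the common formulation, not code from the linked paper; the layer sizes and learning rates are made up). The point is that each update uses only a layer's local prediction error and the error one layer above, which is what makes it more amenable to spreading across machines than an end-to-end backward pass:

    import numpy as np

    rng = np.random.default_rng(0)
    f = np.tanh
    df = lambda v: 1.0 - np.tanh(v) ** 2

    # toy network: layer sizes are made up for illustration
    sizes = [4, 8, 8, 2]
    W = [rng.normal(0.0, 0.1, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]

    def pc_step(x_in, target, n_infer=30, lr_x=0.1, lr_w=0.01):
        # initialize value nodes with a feedforward pass, then clamp the output to the label
        x = [x_in]
        for Wl in W:
            x.append(Wl @ f(x[-1]))
        x[-1] = target

        # inference: relax hidden-layer activities to reduce local prediction errors
        for _ in range(n_infer):
            eps = [x[l + 1] - W[l] @ f(x[l]) for l in range(len(W))]
            for l in range(1, len(x) - 1):  # hidden layers only
                x[l] = x[l] + lr_x * (-eps[l - 1] + df(x[l]) * (W[l].T @ eps[l]))

        # weight updates use only local, layer-wise errors (no global backward pass)
        eps = [x[l + 1] - W[l] @ f(x[l]) for l in range(len(W))]
        for l in range(len(W)):
            W[l] += lr_w * np.outer(eps[l], f(x[l]))
        return sum(float(e @ e) for e in eps)

    x_in = rng.normal(size=sizes[0])
    target = np.array([1.0, -1.0])
    for _ in range(200):
        energy = pc_step(x_in, target)
    print("prediction-error energy after training:", energy)

The catch, as noted above, is the inner relaxation loop: you pay for many inference iterations per weight update, which is where the extra compute goes.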


I wonder how feasible it would be to place a data center somewhere like Arizona or Nevada, run it during daytime hours on a massive solar array, and pause the work overnight. That would probably take a few decades less than trying to permit and build a nuclear plant next to it.
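
Very rough, assumption-laden arithmetic for that (the cluster size, per-GPU draw, overhead, and land figures below are all my own guesses, not anything from the video):

    # back-of-envelope sizing for a daytime-only, solar-powered training site
    gpus          = 25_000    # hypothetical cluster size
    watts_per_gpu = 700       # roughly an H100-class accelerator at full load
    pue           = 1.3       # assumed overhead for cooling, networking, storage
    load_mw   = gpus * watts_per_gpu * pue / 1e6   # ~23 MW of daytime load
    panels_mw = load_mw * 1.5                      # oversize so morning/evening sun still covers it
    acres     = panels_mw * 7                      # utility PV runs very roughly 5-10 acres per MW
    print(f"~{load_mw:.0f} MW load, ~{panels_mw:.0f} MW of panels, ~{acres:.0f} acres of array")

Even with generous padding that comes out to a few hundred acres of panels, which is in the range of solar farms that already get permitted and built today, so the daytime-only idea doesn't seem crazy on the energy side.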


You are right that nuclear power tends to have a very long lead time. China has already conveniently installed vast amounts of solar power.


AI compute is not super latency-sensitive. Surely you can just stick the DC somewhere that has power? A bit like how the Chinese crypto miners moved to hydro dams.


Crypto (Proof of Work) has the advantage of being extremely low bandwidth, very static compute. You just stick boxes somewhere, plug them in and they mostly just work.

AI requires high-bandwidth and dynamic compute. You need a whole supply chain to the data center. Repairs become expensive when you have to ship in highly trained people to fix the machines. Swapping out an OAM/UBB AIA isn't like pulling a GPU card from a PCIe slot.


Those all sound like surmountable problems to me. At least compared to the starting premise that we just don’t have enough energy.


What if AI took over bitcoin's power usage?


What's static/dynamic compute in this context?


Bitcoin ASICs are basically nothing more than kitchen appliances at this point. Plug and play. Everything can be replaced with a few screws. They are extremely static. Put one on a shelf, plug in power/network, and run it until it dies.

AIA servers are complex 350 lb beasts that have a lot of moving components and require dedicated, trained technicians to replace parts. They connect to multiple networks in multiple ways. They require a lot of software. They are very dynamic, with continuous updates.


Actually, he says energy is the current bottleneck which, if overcome, could lead to another bottleneck in the rate of scaling AI's capabilities.


If true, it could be big for countries which already are, or are trying to become, strong on nuclear. There's no reason to suspect most training runs could only be run e.g. during the workday. So a 24/7, more or less constant supply of electricity, like what nuclear characteristically provides, is something training centers could really slurp up.


Photonics will make energy much less relevant again


Energy is a bottleneck to life in general, and the whole humanity.

It's like saying your employees need oxygen to breathe or they can't work at scale.


There is a very intelligent system, far more intelligent than GPT-4, which uses a tiny fraction of the energy. It's called a "rat," and despite requiring only 2 watts of power, it can perform the following tasks which are far beyond any multimodal LLM:

- creating and executing complex plans in arbitrary scenarios

- counting to 6 with 99% reliability in arbitrary scenarios

- understanding object permanence with 99% reliability

- understanding simple, intuitive causality with 99% reliability

But rats can't answer Python questions, so according to a truly depressing number of tech folks, of course GPT-4 is smarter than a rat! AI researchers/capitalists seriously need to spend more time reading animal cognition research. If Silicon Valley were in charge of animal cognition research, spiders would be considered smarter than crows.


How much energy was spent on the training run though? It's all the energy spent through the lifetime of trillions of organisms.


You can always slam more processors together, so it makes sense that compute isn't the bottleneck. People have proven you can run on CPUs, even if it's not realistic in the real world.

Memory currently is a concern.

Algorithms are my only concern. It's not like these algorithms can snag some protons and electrons and analyze them; they are at the whim of human progress and data collection.

Energy, above my paygrade.



