Once you can run a GPT-5-level LLM locally on a device, it's over. All this mighty infrastructure will be no more impressive than a top-of-the-line 2013 Mac Pro is in 2025. I think we're 10 years away from that.
I doubt it. Newer state-of-the-art models might be a little better, but not enough to justify the average person or employee paying $1,000/month.
If you can get a GPT-5-level AI, locally and privately, for just the cost of electricity, why would you bother with anything else? If it can't do something, you'd just outsource that one prompt to a cloud-based AI.
By 2035, the vast majority of your prompts will pass through a local LLM first, and you'll only rarely need to touch an agent API. So what does that mean for the AI industry?
Consumer devices are already available that offer 128 GB of memory specifically marketed for AI use. I think server-side AI will still exist for IoT devices, but I agree, 10 years seems like a pretty reasonable timeline to be able to buy an RTX 5080-sized card with 1 TB of memory, with the ability to pair it with another one for 2 TB. For local, non-distributed use, GPUs are already more than capable of 20+ tokens/s; we're mostly waiting on 512 GB devices to drop in price and on "free" LLMs to get better.
My guess is that Nvidia is limiting memory size on consumer cards to avoid cannibalizing its commercial/industrial sales. I see no reason why a 5060 or 5070 couldn't come with 64/128/512 GB of memory other than an intentional decision not to support those sizes. I don't need a 5090, since ~20-40 tokens/s is plenty for a 1-4 user household system (rough math in the sketch below).
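For anyone curious where those tokens/s figures come from: single-user decoding is mostly memory-bandwidth-bound, so the ceiling is roughly memory bandwidth divided by the bytes of weights streamed per token. Here's a back-of-envelope sketch; the bandwidth figures and the 70B/4-bit example are ballpark assumptions, not benchmarks of any specific card.

```python
# Back-of-envelope: memory-bandwidth-bound decode speed for a local LLM.
# All hardware numbers below are rough assumptions, not measurements.

def decode_tokens_per_sec(params_billion: float, bits_per_weight: int,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when decoding is memory-bound:
    each generated token streams the full weight set from memory once."""
    weight_gb = params_billion * bits_per_weight / 8  # GB of weights
    return mem_bandwidth_gb_s / weight_gb

# Example: a ~70B model quantized to 4 bits is ~35 GB of weights.
# Assumed bandwidths: high-end discrete GPU ~1800 GB/s,
# 128 GB unified-memory "AI" box ~250 GB/s.
for name, bw in [("discrete GPU (~1800 GB/s)", 1800),
                 ("128 GB unified-memory box (~250 GB/s)", 250)]:
    print(f"{name}: ~{decode_tokens_per_sec(70, 4, bw):.0f} tok/s ceiling")
```

By that rough math a single high-bandwidth card already clears the ~20-40 tokens/s bar for a 70B-class model; the real constraint is fitting the weights in memory at all, which is why 512 GB or 1 TB of capacity matters more here than extra compute.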