Why are there so few 32,64,128,256,512 GB models which could run on current cons...

eldenring · 2025-12-02T01:59:08 1764640748

the only real benefit is privacy which 99.9% of people dont get about. Almost all serving metrics (cost, throughput, ttft) are better with large gpu clusters. Latency is usually hidden by prefill cost.

cowpig · 2025-12-02T02:06:23 1764641183

More and more people I talk to care about privacy, but not in SF

mistercheph · 2025-12-02T16:46:33 1764693993

and sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack

jameslk · 2025-12-02T00:04:19 1764633859

128 GB should be enough for anybody (just kidding). I hope the M5 Max will have higher RAM limits

aryonoco · 2025-12-02T00:19:31 1764634771

M5 Max probably won’t, but M5 Ultra probably will

ainch · 2025-12-02T05:48:49 1764654529

As LLMs are productionised/commodified they're incorporating changes which are enthusiast-unfriendly. Small dense models are great for enthusiasts running inference locally, but for parallel batched inference MoE models are much more efficient.