
Why are there so few 32, 64, 128, 256, or 512 GB models that could run on current consumer hardware? And why is the maximum RAM on the Mac Studio M4 only 128 GB?
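
For scale, here's a rough sketch of the weight-memory arithmetic (illustrative numbers only; the KV cache and activations need extra room on top):

    def weight_gib(params_billion, bits_per_param):
        # weights only: parameter count x bytes per parameter
        return params_billion * 1e9 * bits_per_param / 8 / 2**30

    for n, bits in [(7, 4), (70, 4), (70, 8), (405, 4)]:
        print(f"{n}B @ {bits}-bit: ~{weight_gib(n, bits):.0f} GiB")
    # 7B @ 4-bit: ~3 GiB; 70B @ 4-bit: ~33 GiB;
    # 70B @ 8-bit: ~65 GiB; 405B @ 4-bit: ~189 GiB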

The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, time to first token) are better with large GPU clusters. Latency is usually hidden by prefill cost.

More and more people I talk to care about privacy, but not in SF

And sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack.

128 GB should be enough for anybody (just kidding). I hope the M5 Max will have higher RAM limits.

The M5 Max probably won't, but an M5 Ultra probably will.

As LLMs are productionised/commodified, they're incorporating changes that are enthusiast-unfriendly. Small dense models are great for enthusiasts running inference locally, but for parallel batched inference, MoE models are much more efficient.
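
A minimal sketch of why, assuming standard top-k expert routing (toy dimensions, not any particular model's code): each token activates only a few experts, so per-token compute and memory traffic are a fraction of a dense model with the same total parameter count. A server batching many tokens amortizes every expert's weights across lots of tokens; a single local user gets no such batching win and still has to hold all the experts in RAM.

    import numpy as np

    d_model, n_experts, top_k = 64, 8, 2
    rng = np.random.default_rng(0)
    router = rng.standard_normal((d_model, n_experts))            # gating weights
    experts = rng.standard_normal((n_experts, d_model, d_model))  # simplified FFN per expert

    def moe_layer(x):
        logits = x @ router                   # score each expert for this token
        chosen = np.argsort(logits)[-top_k:]  # keep only the top-k experts
        w = np.exp(logits[chosen])
        w /= w.sum()                          # softmax over the chosen experts
        # Only top_k/n_experts of the weights are read per token (2/8 here),
        # so per-token FLOPs and bandwidth are ~25% of an equally sized dense layer.
        return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

    y = moe_layer(rng.standard_normal(d_model))
    print(y.shape)  # (64,)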


