On my M4 Max MacBook Pro, with MLX, I get around 70-100 tokens/sec for Qwen 3 30B-A3B (depending on context size), and around 40-50 tokens/sec for Qwen 3 14B. Of course they’re not as good as the latest big models (open or closed), but they’re still pretty decent for STEM tasks, and reasonably fast for me.
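For anyone who wants to try this, here's a minimal sketch using the mlx-lm Python package (`pip install mlx-lm`). The quantized model repo name is an assumption on my part; check the mlx-community page on Hugging Face for the current uploads:

```python
from mlx_lm import load, generate

# Load a 4-bit quantized Qwen 3 30B-A3B from Hugging Face.
# (Repo name is assumed; verify it on mlx-community.)
model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")

prompt = "Explain the Cauchy-Schwarz inequality in two sentences."

# verbose=True prints the generated text plus tokens/sec stats,
# which is how I get the numbers above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```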
I have 128 GB of RAM on my laptop, and I regularly run multiple VMs, several heavy applications, and many browser tabs alongside LLMs like Qwen 3 30B-A3B.
Of course there’s room for hardware to get better, but the Apple M4 Max is already a pretty good platform for running local LLMs performantly on a laptop.