The usual bottleneck for self-hosted LLMs is memory bandwidth. It doesn't really matter whether there's an integrated GPU or not: an iGPU pulls the weights through the same system RAM, so the models run at the same (very slow) speed as CPU-only. Macs are only decent for LLMs because Apple gave Apple Silicon unusually high memory bandwidth, but they're still nowhere near as fast as a high-end GPU with extremely fast VRAM.
For extremely tiny models, like the ones you'd use for tab completion, even an old AMD CPU is probably going to do okay.
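To put rough numbers on it: at batch size 1, every generated token has to stream essentially all of the model weights from memory, so decode speed tops out around bandwidth divided by model size. Here's a back-of-envelope sketch; the bandwidth and model-size figures are ballpark assumptions, not benchmarks:

```python
# Back-of-envelope: tokens/sec <= memory bandwidth / bytes streamed per token.
# At batch size 1, each token reads (roughly) the full set of weights, so
# bandwidth, not compute, sets the ceiling. All figures are rough assumptions.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound estimate of single-stream decode speed."""
    return bandwidth_gb_s / model_size_gb

systems = {
    "dual-channel DDR4 desktop (~50 GB/s)": 50,
    "Apple M-series Max (~400 GB/s)": 400,
    "high-end GPU VRAM (~1000 GB/s)": 1000,
}

models = {
    "70B model, 4-bit quant (~40 GB)": 40,
    "1B tab-completion model, 4-bit quant (~0.7 GB)": 0.7,
}

for sys_name, bw in systems.items():
    for model_name, size in models.items():
        print(f"{sys_name} + {model_name}: "
              f"~{decode_tokens_per_sec(bw, size):.1f} tok/s upper bound")
```

Which is why a 70B model crawls on a plain desktop but a ~1B tab-completion model is perfectly usable there, and why fast VRAM wins regardless of the model.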
Good to know. It also looks like you can host TabbyML as an on-premise server with docker and serve requests over a private network. Interesting to think that a self-hosted GPU server might become a thing.
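If you go that route, clients on the private network just talk HTTP to the box. A minimal sketch of what a request might look like, assuming a Tabby server reachable at tabby.internal:8080 (made-up hostname) and its /v1/completions endpoint; the exact payload shape should be double-checked against the TabbyML docs:

```python
# Sketch: querying a self-hosted Tabby instance over a private network.
# Assumptions to verify against the TabbyML docs: server at
# http://tabby.internal:8080 (hypothetical hostname) exposing /v1/completions,
# which takes a language plus prefix/suffix segments.
import requests

TABBY_URL = "http://tabby.internal:8080/v1/completions"  # hypothetical host

payload = {
    "language": "python",
    "segments": {
        "prefix": "def fibonacci(n):\n    ",
        "suffix": "\n",
    },
}

resp = requests.post(TABBY_URL, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # completion choices come back as JSON
```

The nice part is the client doesn't care whether that GPU lives in a cloud VM or a box in a closet.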