https://github.com/ggerganov/llama.cpp/issues/34
An M1 Max does 100 ms per token. A 64-core Threadripper does about 33 ms per token.
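For comparison with throughput figures, the two latencies above can be converted to tokens per second. This is only a rough sketch of the arithmetic: the helper name `tokens_per_second` is made up for illustration, and the inputs are just the numbers quoted above, not new measurements.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    return 1000.0 / ms_per_token

# Latencies as quoted above (hypothetical labels, figures from the comment).
for name, ms in [("M1 Max", 100.0), ("64-core Threadripper", 33.0)]:
    print(f"{name}: {tokens_per_second(ms):.1f} tokens/s")
```

So roughly 10 tokens/s on the M1 Max versus roughly 30 tokens/s on the Threadripper, about a 3x difference.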