Benchmarks for what you can do on CPU alone. https://github.com/ggerganov/llama....

kiratp on May 5, 2023, on: Google “We have no moat, and neither does OpenAI”

Benchmarks for what you can do on CPU alone: https://github.com/ggerganov/llama.cpp/issues/34

An M1 Max does 100 ms per token. A 64-core Threadripper does about 33 ms per token.
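
For a sense of scale, per-token latency converts to throughput as 1000 / (ms per token). A minimal sketch of that arithmetic, using only the two figures quoted in the comment; the function name and dictionary labels are hypothetical, and the model and quantization behind those figures are not stated here, so the results are only as good as the quoted numbers:

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# Latencies as quoted in the comment (milliseconds per token).
latencies_ms = {
    "M1 Max": 100.0,
    "64-core Threadripper": 33.0,
}

# Derived throughput for each machine.
throughput = {name: tokens_per_second(ms) for name, ms in latencies_ms.items()}

for name, tps in throughput.items():
    print(f"{name}: {tps:.1f} tokens/s")
```

That works out to 10 tokens/s on the M1 Max and roughly 30 tokens/s on the Threadripper.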