In my experience llama.cpp doesn't take as much advantage of parallelism as it could. I tested this on an HPC cluster: increasing the thread count certainly increased CPU usage, but it did not meaningfully improve tok/s past 6-8 cores. Same behavior with whisper.cpp. :( I wonder if there's another backend that scales better.
I'm guessing the problem is that you're constrained by memory bandwidth rather than compute, and that this is inherent to the algorithm, not an artifact of any one implementation. During token generation every weight has to be streamed from RAM once per token, so throughput is roughly capped at memory bandwidth divided by model size: a ~4 GB 4-bit-quantized 7B model on ~50 GB/s of DDR tops out around 12 tok/s no matter how many cores you throw at it. A handful of cores is usually enough to saturate that bandwidth, which would explain the plateau you saw at 6-8.
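One way to check whether that's what's happening on your cluster: measure raw memory bandwidth at different thread counts and see if it flattens at the same point your tok/s did. A minimal sketch (assuming gcc/clang with OpenMP; the STREAM-style triad kernel and array sizes are my own arbitrary choices, not anything from llama.cpp):

    /* Rough bandwidth-vs-threads probe (illustrative sketch, not llama.cpp code).
       Build: cc -O2 -fopenmp bw.c -o bw */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1u << 26)   /* 64M doubles per array (~512 MiB), well past any cache */
    #define REPS 5

    int main(void) {
        double *a = malloc((size_t)N * sizeof *a);
        double *b = malloc((size_t)N * sizeof *b);
        if (!a || !b) return 1;
        for (size_t i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        for (int t = 1; t <= omp_get_max_threads(); t *= 2) {
            omp_set_num_threads(t);
            double best = 0.0;
            for (int r = 0; r < REPS; r++) {
                double t0 = omp_get_wtime();
                /* STREAM-style triad: 2 loads + 1 store per element,
                   so memory traffic = 3 * N * sizeof(double) bytes */
                #pragma omp parallel for
                for (size_t i = 0; i < N; i++)
                    a[i] = a[i] + 0.5 * b[i];
                double gbs = 3.0 * N * sizeof(double) / (omp_get_wtime() - t0) / 1e9;
                if (gbs > best) best = gbs;
            }
            printf("%2d threads: %6.1f GB/s\n", t, best);
        }
        free(a); free(b);
        return 0;
    }

If the GB/s curve flattens around the same 6-8 threads where your tok/s did, you're hitting the bandwidth wall, and a different backend won't change that; pinning threads to one NUMA node or moving to hardware with more memory channels can.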