I don't think this is all that well documented anywhere. I've had this problem too, and I haven't seen anyone record a decent benchmark of inference speed (tokens/sec) across a few different models. I'm going to start doing it while playing around with settings a bit. Here are some results on my (big!) M4 Mac Pro with Gemma 3; I'm still downloading Qwen3 but will update when it lands.

https://gist.github.com/estsauver/a70c929398479f3166f3d69bce...
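If anyone wants to record numbers that line up with these, the two figures that usually matter are time to first token and steady-state decode speed. Below is a rough sketch, assuming your runtime gives you a Python iterable that yields tokens as they are generated. The function names `timed_tokens` and `summarize` are hypothetical helpers, not part of any particular tool. Timing starts when iteration begins, so if your client sends the request before you start iterating, time to first token will read a little low.

```python
import time
from typing import Iterable


def timed_tokens(stream: Iterable[str]) -> tuple[float, list[float]]:
    """Consume a token stream, recording a perf_counter timestamp per token.

    `stream` is assumed to be any iterable that yields tokens as they are
    produced (hypothetical: adapt to whatever your runtime exposes).
    """
    start = time.perf_counter()
    arrivals: list[float] = []
    for _ in stream:
        arrivals.append(time.perf_counter())
    return start, arrivals


def summarize(start: float, arrivals: list[float]) -> dict[str, float]:
    """Turn raw timestamps into time-to-first-token and decode tokens/sec."""
    if not arrivals:
        raise ValueError("no tokens were generated")
    ttft = arrivals[0] - start
    # Decode speed excludes the first token, which includes prompt processing.
    decode_time = arrivals[-1] - arrivals[0]
    decode_tps = (len(arrivals) - 1) / decode_time if decode_time > 0 else float("nan")
    return {"tokens": len(arrivals), "ttft_s": ttft, "decode_tok_per_s": decode_tps}


if __name__ == "__main__":
    # Stand-in for a real model stream: 20 tokens, ~10 ms apart.
    def fake_model(n: int, delay: float):
        for i in range(n):
            time.sleep(delay)
            yield f"tok{i}"

    print(summarize(*timed_tokens(fake_model(20, 0.01))))
```

Splitting out the first token matters because on a Mac a long prompt can make prompt processing dominate, which would otherwise drag down a single "average tokens/sec" number.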

Here's a video of the second config I ran, so you can see all of the parameters as I have them configured and get a qualitative feel for the experience.

https://screen.studio/share/4VUt6r1c