I actually tried the 4-bit quants (Q4_K_M) and was a bit unimpressed. Switching to Q6_K made a huge difference, but it doesn't fit on my 3090, so it was very slow. And testing on Perplexity's website, which I presume runs fp16, seemed even better, although that might be mostly due to sampler/prompt differences.
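For what it's worth, if the Q6_K file doesn't fit entirely in the 3090's VRAM, llama.cpp can offload only part of the layers to the GPU and keep the rest on the CPU. Something like the following, where the model path, layer count and sampler values are just placeholders to illustrate the flags:

    ./main -m ./models/model-Q6_K.gguf -ngl 30 -c 4096 \
        --temp 0.7 --top-p 0.9 --repeat-penalty 1.1 \
        -p "Your prompt here"

Lowering -ngl until it stops running out of memory is usually still much faster than pure CPU inference, and pinning down the sampler flags matters before comparing against a hosted fp16 service.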
A lot of things are getting fixed if you look at the issues in ggerganov's repo.
To say anything as general as 'the new file format is broken' just means you either don't understand the project basics or aren't following the commits closely.
So? That doesn't mean the format isn't broken at the moment we're using it. I didn't say it wouldn't be fixed in the future. The reality is that the current 4-bit GGUF quants are giving us subpar results compared to other quantization methods. Telling me that "I don't understand the basics" isn't helpful; telling me the exact flags we should be using, or that it's being fixed, would be.
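For anyone following along, going from Q4_K_M up to a higher-bit quant is just a re-quantization of the f16 GGUF with the tool bundled in llama.cpp (file names below are placeholders):

    ./quantize ./models/model-f16.gguf ./models/model-Q6_K.gguf Q6_K

That at least takes the quant level out of the equation when comparing GGUF against other quantization methods.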
Inference with llama.cpp is not trivial and I can't summarise all the parameters in one post. What I'm saying is that, in my opinion, it is wrong to assume that switching from one format to the other is what's causing the degradation.
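If anyone wants to check this objectively rather than by feel, llama.cpp ships a perplexity tool that can be run against each quant of the same model (the model path and test file below are placeholders; wiki.test.raw is the wikitext-2 split used in the repo's examples):

    ./perplexity -m ./models/model-Q4_K_M.gguf -f wiki.test.raw -ngl 30

Comparing the scores for Q4_K_M, Q6_K and f16 of the same model shows how much of the difference comes from the quantization itself rather than from the file format or the samplers.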
llama.cpp underwent some major changes in the last few weeks, and following the commits it took a few days to stabilise. Try it now; it works like a charm. And compared to other inference engines such as tinygrad, it is much more versatile in the options for how it can be run.