This really ought to mention https://github.com/oobabooga/text-generation-webui, which was the first popular UI for LLaMA and remains a popular choice for anyone running it on a GPU. It is also where GPTQ 4-bit quantization was first enabled in a LLaMA-based chatbot; llama.cpp added quantization support later.