As a more general comment, the repo README provides examples that all use gpt2. It would be nice to see at least one example that invokes llama2, since that would reassure the reader that this library can use models that are more modern and interesting.
Inclined to disagree - gpt2 is far more likely to produce gibberish. So if you can force specific outputs on that, it's a good demo that higher-quality models will be even better.
Maybe... but then if I want to use something better, I have to figure out how by myself. I said "at least one example", not "please change all the examples to llama2." I agree with your general point. It would be nice if there were an example of how to use a better model.
Models often have different shapes and requirements, so is it really as simple as changing the string "gpt2" to "llama2-13B-Chat" and it will magically work? If so, that's great, and I wish that were made clear. Unfortunately, that hasn't always been my experience with other libraries.
Yes, any model that you can run on your computer. The library works by changing the way that tokens are sampled from the LLM, and OpenAI does not give you deep enough access into its pipeline to affect that.
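To make that last point concrete, here is a minimal sketch of the idea behind constrained sampling: mask the logits of disallowed tokens to negative infinity before the softmax, so the sample is guaranteed to come from the allowed set. This is an illustrative toy (the function name and the tiny logit vector are made up, not this library's API); it shows why you need access to the raw logits, which a hosted API like OpenAI's does not expose.

```python
import math
import random

def constrained_sample(logits, allowed_ids, rng=None):
    """Sample a token id, restricted to the allowed set.

    Disallowed tokens have their logits masked to -inf, so
    they receive zero probability after the softmax.
    """
    rng = rng or random.Random(0)
    masked = [l if i in allowed_ids else float("-inf")
              for i, l in enumerate(logits)]
    # Numerically stable softmax over the masked logits.
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Even though the raw logits overwhelmingly favor token 0,
# constraining to {2, 3} forces one of those to be chosen.
tok = constrained_sample([10.0, 1.0, 0.5, 0.2], {2, 3})
assert tok in {2, 3}
```

With a locally run model you can apply this mask at every decoding step; with the OpenAI API you only ever see the already-sampled text, so there is nothing to mask.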