
I know this is a bit tangential (awesome work OP), but has anyone been able to get usable, consistent results from this thing? I've been playing around with the 13B model with llama.cpp, and while I do sometimes get good results, it often just gives me weird, repetitive nonsense.

I know it hasn't been fine-tuned on instructions or had RLHF like ChatGPT, but has anyone figured out how to work around that and actually use it the way you'd use ChatGPT: ask a question and usually get something coherent and useful back?



I've been playing around with the 30B version all day. The biggest improvements I've seen have come from changing the way I prompt (aim for a more in medias res style; the model really likes continuing text and gets confused if you give it a blank slate) and from implementing top_k sampling (also discard the top_p=0 nonsense; you want top_p > 1.0 to turn it off). It's important to note that the llama.cpp project does NOT implement top_k, even if you set that command-line parameter.
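
For anyone unfamiliar, top_k just truncates the distribution to the k most likely tokens before sampling. A rough sketch in Python/NumPy, purely illustrative (this is not llama.cpp's actual sampler, and the function name and defaults are made up):

    import numpy as np

    def sample_top_k(logits, k=40, temperature=0.8, rng=None):
        # Illustrative sketch only -- not the real llama.cpp implementation.
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=np.float64) / temperature
        # Keep only the k highest logits; everything else is discarded.
        top_indices = np.argpartition(logits, -k)[-k:]
        top_logits = logits[top_indices]
        # Softmax over the surviving candidates, then sample one of them.
        probs = np.exp(top_logits - top_logits.max())
        probs /= probs.sum()
        return int(top_indices[rng.choice(len(top_indices), p=probs)])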


top_k is now implemented


We should be working on benchmarking this kind of tool. Instead of saying "this version/implementation gives interesting results sometimes", we should get some kind of score out of it (like the score of a test). Then we can better compare different versions and also test if the version we just installed is actually working as it should.
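
Even something crude would be a start: a toy harness that checks whether the completion contains an expected answer and reports the hit rate. Here generate() is a hypothetical wrapper around whatever backend you run, and the prompts/answers are placeholders, not a real benchmark:

    # Toy scoring sketch -- generate() is a hypothetical wrapper around
    # whatever backend you run (llama.cpp, etc.); the test cases below are
    # placeholders, not a real benchmark.
    tests = [
        ("Q: What is 2 + 2?\nA:", "4"),
        ("Q: What color is a clear daytime sky?\nA:", "blue"),
    ]

    def score(generate):
        hits = sum(expected.lower() in generate(prompt).lower()
                   for prompt, expected in tests)
        return hits / len(tests)  # fraction of prompts answered acceptably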


As others have said, you're supposed to start your text as if you are already answering your own request, and the model will complete the text for you.
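
For example (the exact wording here is just an illustration):

    # Phrase the prompt as text the model can continue, starting the answer
    # yourself instead of issuing a bare instruction.
    prompt = (
        "Q: How do I count the lines in a file on Linux?\n"
        "A: You can count the lines with"
    )
    # Pass this to your runner, e.g. ./main -p "..." in llama.cpp, and the
    # model completes the sentence rather than rambling from a blank slate.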


I just explained one solution on Twitter: https://twitter.com/LalwaniVikas/status/1635035951654387712



