
I know this is a bit tangential (awesome work OP), but has anyone been able to get usable, consistent results from this thing? I've been playing around with the 13B model with llama.cpp, and while I do sometimes get good results, it often just gives me weird, repetitive nonsense.

I know it hasn't been fine-tuned on instructions or had RLHF like ChatGPT, but has anyone figured out how to work around that and actually use it the way you'd use ChatGPT: ask a question and usually get something coherent and useful back?



I've been playing around with the 30B version all day. The biggest improvements I've seen have come from changing the way I prompt (aim for a more in medias res style; the model really likes continuing text and gets confused if you give it a blank slate) and from implementing top_k sampling (also discard the top_p=0 nonsense; you want top_p > 1.0 to turn it off). It's important to note that the llama.cpp project does NOT implement top_k, even if you set that command-line parameter.
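
For anyone unfamiliar, top_k just truncates the distribution to the k most likely tokens before sampling. A rough sketch in Python/NumPy, purely illustrative (this is not llama.cpp's actual sampler, and the function name and defaults are made up):

    import numpy as np

    def sample_top_k(logits, k=40, temperature=0.8, rng=None):
        # Illustrative sketch only -- not the real llama.cpp implementation.
        rng = rng or np.random.default_rng()
        logits = np.asarray(logits, dtype=np.float64) / temperature
        # Keep only the k highest logits; everything else is discarded.
        top_indices = np.argpartition(logits, -k)[-k:]
        top_logits = logits[top_indices]
        # Softmax over the surviving candidates, then sample one of them.
        probs = np.exp(top_logits - top_logits.max())
        probs /= probs.sum()
        return int(top_indices[rng.choice(len(top_indices), p=probs)])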


top_k is now implemented


We should be working on benchmarking this kind of tool. Instead of saying "this version/implementation gives interesting results sometimes", we should get some kind of score out of it (like the score of a test). Then we can better compare different versions and also test if the version we just installed is actually working as it should.
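
Even something crude would be a start: a toy harness that checks whether the completion contains an expected answer and reports the hit rate. Here generate() is a hypothetical wrapper around whatever backend you run, and the prompts/answers are placeholders, not a real benchmark:

    # Toy scoring sketch -- generate() is a hypothetical wrapper around
    # whatever backend you run (llama.cpp, etc.); the test cases below are
    # placeholders, not a real benchmark.
    tests = [
        ("Q: What is 2 + 2?\nA:", "4"),
        ("Q: What color is a clear daytime sky?\nA:", "blue"),
    ]

    def score(generate):
        hits = sum(expected.lower() in generate(prompt).lower()
                   for prompt, expected in tests)
        return hits / len(tests)  # fraction of prompts answered acceptably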


As others have said, you're supposed to start your text as if you are already answering your own request, and the model will complete the text for you.
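
For example (the exact wording here is just an illustration):

    # Phrase the prompt as text the model can continue, starting the answer
    # yourself instead of issuing a bare instruction.
    prompt = (
        "Q: How do I count the lines in a file on Linux?\n"
        "A: You can count the lines with"
    )
    # Pass this to your runner, e.g. ./main -p "..." in llama.cpp, and the
    # model completes the sentence rather than rambling from a blank slate.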


I just explained one solution on Twitter: https://twitter.com/LalwaniVikas/status/1635035951654387712



