Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

ELO takes a while to establish. It does not sound likely that the newer GPT3.5 is that much worse than the old one that has a clear gap to all the non proprietary models. In the immediate test, GPT-3.5 clearly outshines these models.


> ELO takes a while to establish.

Well, Starling-7B was published two weeks ago; GPT-3.5-turbo-0613 is more than a month old snapshot, which should probably be enough time. OpenChat and OpenHermes are about a month old as well.

>It does not sound likely that the newer GPT3.5 is that much worse than the old one

In fact, this version received complaints almost immediately. https://community.openai.com/t/496732

>In the immediate test, GPT-3.5 clearly outshines these models.

It might be so, but it's not clear to me at all. I tested Starling for a bit and was really surprised that it's a 7B model, not a 70B+ one or GPT-3.5.


I disagree - lmsys score for new chatgpt has been relatively constant, and OAI is probably trying to distill the model even further.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: