ELO takes a while to establish. It does not sound likely that the newer GPT3.5 i...

orbital-decay · on Dec 11, 2023

> ELO takes a while to establish.

Well, Starling-7B was published two weeks ago; GPT-3.5-turbo-0613 is a snapshot more than a month old, which should probably be enough time. OpenChat and OpenHermes are about a month old as well.

> It does not sound likely that the newer GPT3.5 is that much worse than the old one

In fact, this version received complaints almost immediately. https://community.openai.com/t/496732

> In the immediate test, GPT-3.5 clearly outshines these models.

It might be so, but it's not clear to me at all. I tested Starling for a bit and was really surprised that it's a 7B model, not a 70B+ one or GPT-3.5.

whimsicalism · on Dec 11, 2023

I disagree - the lmsys score for the new ChatGPT has been relatively constant, and OAI is probably trying to distill the model even further.