Hacker News

I no longer trust the benchmarks. Other than trying it out myself, what else can we do here?


It's already been done (Elo ratings, see the LMSYS rankings). I hope we're cresting past the point where half of people still haven't heard of it.
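
For anyone who hasn't seen how an Elo-style ranking falls out of head-to-head votes, here is a minimal sketch of the textbook update rule. The starting rating of 1000, the K-factor of 32, and the model names are assumptions picked for illustration; this is not claimed to be the exact procedure LMSYS uses.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Move the winner up and the loser down by the same amount (in place)."""
    e_w = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - e_w)  # larger shift when the win was unexpected
    ratings[winner] += delta
    ratings[loser] -= delta


def rate(votes, initial=1000.0, k=32.0):
    """Replay (winner, loser) votes in order and return the final ratings.

    `initial` and `k` are illustrative assumptions, not values from any leaderboard.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        update_elo(ratings, winner, loser, k)
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
    for name, rating in sorted(rate(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

The point is that the ranking comes from people's blind pairwise preferences rather than a fixed test set, which is why it is harder to overfit than a static benchmark.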


I see. Thanks for the reference. I've followed it on X now.

https://twitter.com/lmsysorg/status/1772759835714728217



