> On a personal level, their model is getting beat handily by Claude Sonnet 3.5 right now. It doesn't seem to show in the benchmarks. I wonder why?

I do use Sonnet 3.5 personally, but this "beat handily" doesn't show up on LLM arena either. Does OpenAI game that too?