I think a better method of testing current generation of LLMs is to generate pro...

mpavlov · 2025-10-28T11:35:20 1761651320

(author of the PokerBattle here)

Depends on what your goal is, I think.

And that's also a thing: https://huskybench.com/

lvl155 · 2025-10-28T14:38:41 1761662321

Great job on this btw. I don’t mean to take anything away from your work. I’ve also toyed with AI head-to-head (H2H) matchups quite a bit for my own needs. It’s actually a challenging task because you need a good understanding of the models you’re plugging in.