
I think a better way to test the current generation of LLMs is to have them generate programs that play poker.


(author of the PokerBattle here)

Depends on what your goal is, I think.

And that's also already a thing: https://huskybench.com/


Great job on this btw. I don’t mean to take away anything from your work. I’ve also toyed with AI H2H quite a bit for my personal needs. It’s actually a challenging task because you have to have a good understanding of the models you’re plugging in.



