
I'm not working on game-related topics lately; I'm in industry now (algo-trading) and also a little bit out of touch.

> Has there been any meaningful progress after that?

There are attempts [0] at making the algorithms work for exponentially large beliefs (= ranges). In poker, beliefs are constant-sized (each player receives 2 cards at the beginning), which is not the case in most games. In many games you repeatedly draw cards from a deck, so the number of histories/infosets grows exponentially. But nothing works well for search yet; it is still an open problem. For pure policy learning without search, RNAD [2] works okay-ish from what I've heard, but it is finicky with hyperparameters to get it to converge.

Most of the research I've seen is concerned with making regret minimization more efficient, most notably Predictive Regret Matching [1].
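For anyone unfamiliar with what's being made more efficient here: plain regret matching is the building block under CFR and its predictive variants. A minimal self-play sketch on rock-paper-scissors (my own toy illustration, not code from any of the linked papers) — each player plays proportionally to positive cumulative regret, and the *average* strategy converges to the equilibrium, which for RPS is uniform 1/3:

```python
# Regret matching in self-play on rock-paper-scissors.
# Current strategy = positive regrets, normalized; average strategy
# converges to the Nash equilibrium (uniform over the 3 actions).
N = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1],   # payoff to player 1: row = p1 action, col = p2 action
          [1, 0, -1],
          [-1, 1, 0]]

def current_strategy(regrets):
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / N] * N

def train(iterations=20_000):
    # Start player 2 with a tiny asymmetric regret so the dynamics
    # don't sit at the fixed point from the very first step.
    regrets = [[0.0] * N, [1.0, 0.0, 0.0]]
    strat_sum = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        s1 = current_strategy(regrets[0])
        s2 = current_strategy(regrets[1])
        # expected utility of each pure action vs. the opponent's strategy
        u1 = [sum(s2[b] * PAYOFF[a][b] for b in range(N)) for a in range(N)]
        u2 = [sum(s1[a] * -PAYOFF[a][b] for a in range(N)) for b in range(N)]
        ev1 = sum(s1[a] * u1[a] for a in range(N))
        ev2 = sum(s2[b] * u2[b] for b in range(N))
        for a in range(N):
            regrets[0][a] += u1[a] - ev1   # regret = could-have-gotten minus got
            regrets[1][a] += u2[a] - ev2
            strat_sum[0][a] += s1[a]
            strat_sum[1][a] += s2[a]
    total = sum(strat_sum[0])
    return [x / total for x in strat_sum[0]]

avg = train()
print([round(p, 3) for p in avg])
```

Predictive Regret Matching [1] speeds this basic loop up by incorporating a prediction of the next regret vector, but the skeleton is the same.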

> I was thinking about developing a 5-max poker

Oh, sounds like a lot of fun!

> I don't see why a LLM can't learn to play a mixed strategy. A LLM outputs a distribution over all tokens, which is then randomly sampled from.

I tend to agree, I wrote more in another comment. It's just not something an off-the-shelf LLM would do reliably today without lots of non-trivial modifications.
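To make the point concrete, here is a toy illustration (not a real LLM call; the logit values are made up): if the model's next-token logits over the action tokens are softmaxed and sampled, the decoder itself is executing a mixed strategy.

```python
import math
import random

# Hypothetical next-token logits an LLM might assign to three poker
# action tokens. The numbers are invented for illustration.
ACTIONS = ["fold", "call", "raise"]
logits = [0.2, 1.4, 0.6]

def softmax(xs, temperature=1.0):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

probs = softmax(logits)                           # the mixed strategy
action = random.choices(ACTIONS, weights=probs)[0]  # one sampled action
```

The catch is the decoding setting: greedy decoding (argmax, temperature → 0) collapses this to a pure strategy, so getting a *calibrated* mixture — one whose probabilities actually match the game-theoretically correct frequencies — is where the non-trivial modifications come in.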

[0] https://arxiv.org/abs/2106.06068

[1] https://ojs.aaai.org/index.php/AAAI/article/view/16676

[2] https://arxiv.org/abs/2206.15378


