> AlphaZero
used the set of legal actions obtained from the simulator to mask the
policy network at interior nodes. MuZero does not perform any masking
within the search tree, but only masks legal actions at the root of the
search tree where the set of available actions is directly observed. The
policy network rapidly learns to exclude actions that are unavailable,
simply because they are never selected.
MuZero still masks illegal moves, but only at the root.
All of its parts are eventually trained on the search output at that root, and so they learn the legal moves.
They justify this root-level masking by the fact that the Atari environment only lets you perform legal moves, while a weak enough player may consider illegal moves while planning in their head.
The main thing that's slightly swept under the rug is that for "masking" to make sense in the first place, MuZero needs to know the set of all moves that may be legal at some point in the game.
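As a minimal sketch of what that root-level masking amounts to, assuming a fixed global action space indexed 0..NUM_ACTIONS-1 (the constant and function names here are illustrative, not MuZero's actual code):

```python
import numpy as np

NUM_ACTIONS = 4672  # e.g. AlphaZero's fixed action encoding for chess (8x8x73)

def masked_root_priors(policy_logits, legal_actions):
    """Keep only the legal actions at the root and renormalize the prior.

    legal_actions is observed directly from the environment, which is why
    this is only possible at the root: interior nodes of MuZero's search
    live in latent space, where no legal-move set is available.
    """
    legal = np.asarray(legal_actions)
    masked = np.full(NUM_ACTIONS, -np.inf)
    masked[legal] = policy_logits[legal]
    exp = np.exp(masked - masked[legal].max())  # exp(-inf) == 0 for illegal actions
    return exp / exp.sum()
```

Note that the mask only works because NUM_ACTIONS enumerates in advance every action that could ever be legal, which is exactly the point above.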
> while a weak enough player may consider illegal moves while planning in your head
This isn't just weak players. E.g. strong chess players often consider moves as if blocking pawns weren't there: they might judge a bishop to be on a strong diagonal despite a blocking pawn, because they can imagine the moves that would follow if that pawn disappeared.
No, it doesn't. MuZero does its planning entirely in its own latent space (it may not even think of the game in terms of 'moves', but in whatever steps it considers relevant), and only the output is filtered for legal moves.
It's no different from a monkey operating a chess computer that makes sure the monkey only performs legal moves. Your suggestion would be akin to claiming that the chess computer affects the monkey's mind so that it can only think in terms of legal chess moves.
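For concreteness, here is a rough sketch of that latent-space rollout. The three learned functions (representation h, dynamics g, prediction f) are from the MuZero paper; the wrapper function and its names are illustrative:

```python
def latent_rollout(observation, actions, h, g, f):
    """MuZero-style planning step: the simulator is never consulted.

    h: representation network, observation -> latent state
    g: dynamics network, (latent state, action) -> (latent state, reward)
    f: prediction network, latent state -> (policy prior, value)
    """
    s = h(observation)        # encode the real observation once, at the root
    for a in actions:         # then unroll entirely in latent space
        s, _reward = g(s, a)  # no legal-move set exists here to mask with
    return f(s)               # policy and value estimated from the latent state
```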
Seems you could equivalently treat rule breaking as a loss, and any algorithm sophisticated enough to learn how to win will also learn to avoid breaking the rules.
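One way to make that concrete is a hypothetical environment wrapper (the env API here, legal_actions(), observation(), step(), is assumed for illustration and not from any particular library):

```python
class IllegalMoveLosesWrapper:
    """Hypothetical wrapper: an illegal action immediately ends the game
    as a loss, instead of being masked out. A reward-maximizing agent
    should then learn to avoid illegal actions for the same reason it
    learns to avoid any other losing move.
    """
    def __init__(self, env, loss_reward=-1.0):
        self.env = env
        self.loss_reward = loss_reward

    def step(self, action):
        if action not in self.env.legal_actions():  # assumed env method
            return self.env.observation(), self.loss_reward, True  # done
        return self.env.step(action)
```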