It's still based on human games. It plays against itself, but the way it plays was inherited from humans. I wonder if there is some fundamental barrier to what you can reach with reinforcement learning, depending on what you start from.
Having it learn from human games was just a way of speeding up initialization before running reinforcement learning; it didn't limit the state tree that was searched later on.
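To make that concrete, here is a toy sketch, not the real system's training pipeline. It assumes a three-move "game" with made-up rewards and a softmax policy whose starting logits stand in for a prior learned from human games. All names and numbers (`REWARDS`, `HUMAN_PRIOR_LOGITS`, the step count) are hypothetical. The point is only that a prior which favors one move still leaves nonzero probability on the others, so policy-gradient updates can move away from it.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(logits, rewards, steps=2000, lr=1.0):
    """Gradient ascent on expected reward for a softmax policy.

    Uses the exact (not sampled) policy gradient for a one-step problem:
    dJ/dtheta_i = p_i * (r_i - J), where J is the expected reward.
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [
            t + lr * p * (r - expected)
            for t, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits)


# Hypothetical rewards: move 2 is actually the best move.
REWARDS = [0.2, 0.5, 1.0]
# Hypothetical "human prior": strongly prefers move 0, rarely plays move 2.
HUMAN_PRIOR_LOGITS = [2.0, 0.0, -2.0]

before = softmax(HUMAN_PRIOR_LOGITS)
after = train(HUMAN_PRIOR_LOGITS, REWARDS)

print("preferred move before RL:", before.index(max(before)))
print("preferred move after RL: ", after.index(max(after)))
```

In this sketch the starting point only changes how long the policy takes to get there, not where it ends up. Whether that holds at the scale of a real game is exactly the open question being asked above.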