I'm working on a game RL framework for turn-based games; my aim was to learn about self-play and RL.
I found that if you structure the human controller in exactly the same way as the AI controllers, you can swap them easily. So I have Agent (abstract), HumanAgent (takes keyboard input), and DqnAgent (DQN learning agent) as the different controllers, and the rest of the code is agnostic to which controller it's talking to. With this setup you can also do things like record your own gameplay.
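Roughly the shape of it is something like this (a minimal sketch, not my actual code; the class and method names, ACTIONS, and the key mapping are just illustrative):

```python
from abc import ABC, abstractmethod
import random

ACTIONS = [0, 1, 2, 3]                              # hypothetical action ids
KEY_TO_ACTION = {"w": 0, "a": 1, "s": 2, "d": 3}    # hypothetical key mapping

class Agent(ABC):
    """Controller interface: the game loop only ever talks to this."""

    @abstractmethod
    def act(self, observation):
        """Return an action for the current observation."""

    def observe(self, observation, action, reward, next_observation, done):
        """Optional hook for learning agents; HumanAgent just ignores it."""

class HumanAgent(Agent):
    def act(self, observation):
        key = input("move (w/a/s/d)> ").strip()
        return KEY_TO_ACTION.get(key, 0)

class DqnAgent(Agent):
    def __init__(self, q_function, epsilon=0.1):
        self.q_function = q_function   # maps observation -> list of Q-values
        self.epsilon = epsilon
        self.replay_buffer = []

    def act(self, observation):
        # epsilon-greedy over the Q-values
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        q_values = self.q_function(observation)
        return max(ACTIONS, key=lambda a: q_values[a])

    def observe(self, observation, action, reward, next_observation, done):
        # store the transition; a real agent would also run training steps here
        self.replay_buffer.append(
            (observation, action, reward, next_observation, done)
        )
```

The game loop just calls agent.act(obs) and agent.observe(...), so recording your own gameplay is basically a HumanAgent whose transitions get logged the same way the learner's do.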
If your goal is running the track in minimal time, you could reward it at the end (reward = -1 * elapsed_time), as you go (reward = current_speed), once per lap, etc. These sound similar but can have quite different training properties, so plan to explore your reward-shaping space a bit.
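For concreteness, here's a rough sketch of those variants (hypothetical function names, assuming your env exposes elapsed time, current speed, and lap completion):

```python
# Three reward signals for the same "finish the track fast" objective.

def reward_terminal(elapsed_time: float, done: bool) -> float:
    # Sparse: only pay out at the end, penalizing total time.
    return -elapsed_time if done else 0.0

def reward_speed(current_speed: float) -> float:
    # Dense: reward speed every step; easier to learn from, but can
    # encourage going fast without actually finishing the track.
    return current_speed

def reward_per_lap(lap_completed: bool, lap_time: float) -> float:
    # Intermediate: pay out once per lap, penalizing that lap's time.
    return -lap_time if lap_completed else 0.0
```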