A little idea I got from playing with AI SWE Agents. Can AI help make sure we understand the code that our AIs write?
PR Quiz uses AI to generate a quiz from a pull request and blocks you from merging until the quiz is passed. You can configure various options, such as the LLM to use, the maximum number of attempts to pass the quiz, or the minimum diff size that triggers a quiz. In my limited testing, the reasoning models generated better questions, though they are more expensive.
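To make the gating idea concrete, here is a rough sketch of how those options could combine into a merge decision. It is not PR Quiz's actual code: `QuizConfig`, `changed_lines`, `merge_decision`, the field names, and the default values are all hypothetical, and it assumes the diff size is measured in changed lines and that running out of attempts leaves the merge blocked.

```python
from dataclasses import dataclass


@dataclass
class QuizConfig:
    # All names and defaults here are illustrative assumptions.
    model: str = "placeholder-model"  # which LLM would write the questions
    max_attempts: int = 3             # failed tries allowed before giving up
    min_diff_lines: int = 50          # smaller diffs skip the quiz


def changed_lines(diff: str) -> int:
    """Roughly count added/removed lines in a unified diff.

    Lines starting with '+++' or '---' are treated as file headers and ignored.
    """
    count = 0
    for line in diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            count += 1
        elif line.startswith("-") and not line.startswith("---"):
            count += 1
    return count


def merge_decision(diff: str, failed_attempts: int, passed: bool,
                   cfg: QuizConfig) -> str:
    """Return 'skip', 'allow', 'block', or 'pending' for a pull request."""
    if changed_lines(diff) < cfg.min_diff_lines:
        return "skip"      # diff too small to be worth a quiz
    if passed:
        return "allow"     # quiz passed, merge may proceed
    if failed_attempts >= cfg.max_attempts:
        return "block"     # out of attempts (assumed behaviour)
    return "pending"       # quiz still open, merge stays blocked


if __name__ == "__main__":
    diff = "--- a/x\n+++ b/x\n@@ -1 +1,2 @@\n-old\n+new\n+more\n"
    cfg = QuizConfig(min_diff_lines=2, max_attempts=2)
    print(merge_decision(diff, failed_attempts=0, passed=False, cfg=cfg))
```

In a real setup the returned state would presumably be mapped onto a status check that branch protection can require, but that wiring is outside this sketch.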
Privacy: This GitHub Action runs a local webserver and uses ngrok to serve the quiz through a temporary URL. Your code is sent only to the model provider (OpenAI).
This is a good question, but how do we also make sure that humans understand the code that _other humans_ have (supposedly) written? Effective code review is hard because it assumes that the reviewer already has their own mental model of how a task could/would/should have been done, or is at the very least building one at reading time and internally asking 'Does this make sense?'.
Without that basis, code review is more like a fuzzy standards-compliance check, which can still be useful, but it's not the same as a review process that works by comparing alternate or co-operatively competing models. So I wonder how much of that is gained through a quiz-style interaction.