Hacker News

That’s a common fallacy. I suggest you plot failure rate against the number of components that can fail, where any single one failing causes a total failure. You’ll be shocked by how quickly you get terrible numbers.
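The compounding the parent describes is easy to sketch (a minimal illustration; the 1% per-component failure rate is an arbitrary assumption, not from the thread):

```python
def pipeline_failure_rate(p: float, n: int) -> float:
    """Probability that a chain of n components fails, when each one
    independently fails with probability p and any single failure is
    a total failure: 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n

for n in (1, 10, 50, 100):
    print(f"{n:>3} components @ 1% each -> "
          f"{pipeline_failure_rate(0.01, n):.1%} total failure rate")
```

Even at a 99% per-step success rate, a 100-step chain fails more often than it succeeds.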


I'm speaking from experience. How else do you think people train RL agents, if not with verifiers? You don't have to verify every output at every step; you just need enough checks to course-correct the agent and catch it early when it's going wrong. That is the exact fallacy I was trying to address. The optimization comes from identifying the critical checks and deciding what passes to the next step. It requires letting go of the previous thinking and changing paradigms.

The failure rate only looks that high because you view the steps in series. At test time you just need to know which of the candidate options is correct (including the case where none is); you don't need to know why the others failed. You can debug that later. The real challenge is how easily you can return to the right track.
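The test-time idea above can be sketched as verifier-gated selection (a toy illustration; the function names and the example verifier are hypothetical, not from the thread):

```python
from typing import Callable, Iterable, Optional

def select_verified(candidates: Iterable[str],
                    verifier: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None.
    Returning None ('nothing was correct') is itself a valid answer,
    and is what lets you catch a run going wrong early."""
    for candidate in candidates:
        if verifier(candidate):
            return candidate
    return None

# Toy verifier: accept only strings that parse as even integers.
is_even_int = lambda s: s.lstrip("-").isdigit() and int(s) % 2 == 0

print(select_verified(["7", "abc", "12"], is_even_int))   # a passing candidate
print(select_verified(["7", "abc"], is_even_int))          # nothing correct
```

The point is that the check is much cheaper than explaining a failure: you only pay for debugging when no candidate passes.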




