The most interesting part of this paper to me is that they let the agent guess how efficient its own solutions were, and only had the model experimentally verify its guesses in 0.002% of cases. This allowed the model to search much faster than a baseline that didn't guess and had to actually run every candidate program.
This is the same sort of "more guesses, faster" beats "smarter guesses, slower" dynamic that made afl-fuzz by far the best tool at exploring large search spaces in program fuzzing.
Fast search often beats accurate search. Sometimes adding clever heuristics or more complex scoring "works" but slows down the search enough that it's an overall loss. Another kind of bitter lesson, perhaps.
But why isn't the proposed method an instance of smart guessing? It reduces oracle complexity with a heuristic. The heuristic is "build a machine learning model of the objective function and use it to fake oracle queries most of the time."
This is actually quite a common way to optimize things in several disciplines. You essentially fit a surrogate model ("surrogate model" is the keyword if you want to look up more) to whatever you want to optimize, then use the model to guide the procedure, making sure that the model is still correct every once in a while.
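As a rough sketch of what that loop looks like, here's a toy version in Python. Everything in it is made up for illustration, not taken from the paper: the objective, the polynomial surrogate, and the 2% verification rate are all placeholder assumptions.

```python
import numpy as np

# Hypothetical expensive objective -- stands in for "actually run the
# candidate program and measure it". Assumed for this sketch.
def true_objective(x):
    return np.sin(3 * x) + 0.5 * x**2

rng = np.random.default_rng(0)

# Seed the surrogate with a handful of real (expensive) evaluations.
X = rng.uniform(-2, 2, size=8)
y = true_objective(X)

VERIFY_FRACTION = 0.02  # made-up rate: verify only this fraction of guesses

for step in range(200):
    # Fit a cheap surrogate (here: a degree-4 polynomial) to the data so far.
    coeffs = np.polyfit(X, y, deg=4)
    surrogate = np.poly1d(coeffs)

    # Propose many candidates and rank them with the surrogate -- these are
    # pure "guesses", no expensive evaluations happen here.
    candidates = rng.uniform(-2, 2, size=1000)
    best = candidates[np.argmin(surrogate(candidates))]

    # Only occasionally pay for a ground-truth evaluation, and feed the
    # result back in so the surrogate stays honest.
    if rng.random() < VERIFY_FRACTION:
        X = np.append(X, best)
        y = np.append(y, true_objective(best))

print("surrogate's best guess:", best, "true value:", true_objective(best))
```

The point is that true_objective is the expensive call; in the paper's setting that's actually running and benchmarking a program, which is why skipping the vast majority of real evaluations buys so much search throughput.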
I've been wondering about a similar approach for biomolecular simulations, where exact computations are still a hard bottleneck. I wonder if something like this could give us a few orders of magnitude more speed.
There's likely a connection. Either way, I like to describe AIs like ChatGPT, diffusion models, etc. as operating 100% on intuition. It gives people a better sense of their weaknesses...
With GPT you can kind of prompt it into chain-of-thought reasoning, but it doesn't work very well, at least not if you compare it to what humans do.
Once again it seems like what we thought was hard is easy, and what we thought was easy and computer-like turns out to be hard.