https://en.wikipedia.org/wiki/Goodhart%27s_law "When a measure becomes a target, it ceases to be a good measure"

I'm also curious what results we would get if SWE-bench came up with a fresh set of 500 problems to run all these models against, to guard against overfitting.