Great article, and some great links from it too. The list of models "following the letter but not the spirit" at https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOa... has some hilarious and creative examples that one would hope never make it into production:
> AI trained to classify skin lesions as potentially cancerous learns that lesions photographed next to a ruler are more likely to be malignant.
> Agent pauses the game indefinitely to avoid losing
> A robotic arm trained to slide a block to a target position on a table achieves the goal by moving the table itself.
> Evolved player makes invalid moves far away in the board, causing opponent players to run out of memory and crash
> Genetic algorithm for image classification evolves timing attack to infer image labels based on hard drive storage location
> Deep learning model to detect pneumonia in chest x-rays works out which x-ray machine was used to take the picture; that, in turn, is predictive of whether the image contains signs of pneumonia, because certain x-ray machines (and hospital sites) are used for sicker patients.
> Creatures bred for speed grow really tall and generate high velocities by falling over
> Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn't actually learn any features of the input images
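That last mushroom example generalizes: any learner whose state happens to correlate with presentation order can exploit it. A minimal sketch of the failure mode in Python (everything here is hypothetical, just to make it concrete):

    import random

    random.seed(0)

    # Fake dataset: features are pure noise, labels strictly alternate.
    data = [([random.random() for _ in range(4)], i % 2) for i in range(100)]

    class ParityClassifier:
        """Ignores the features entirely; predicts from how many samples it has seen."""
        def __init__(self):
            self.count = 0
        def predict(self, features):
            label = self.count % 2
            self.count += 1
            return label

    clf = ParityClassifier()
    print(sum(clf.predict(x) == y for x, y in data) / len(data))  # 1.0, perfect

    # Shuffle the presentation order and the shortcut collapses to chance.
    random.shuffle(data)
    clf = ParityClassifier()
    print(sum(clf.predict(x) == y for x, y in data) / len(data))  # ~0.5

Shuffling per epoch is the standard guard: the parity "model" drops to chance the moment order stops carrying label information.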
> Evolved player makes invalid moves far away in the board, causing opponent players to run out of memory and crash
Well, this sounds like speedrunning. People have found arbitrary code execution vulnerabilities in SNES games and used them to jump straight to the credits (which counts as completing the game) in less than a minute: https://www.youtube.com/watch?v=Jf9i7MjViCE
In this case it’s just choosing an option that involves very large numbers because it’s learned that its opponent can’t handle large numbers. There’s no code injection.
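A sketch of how that might work without any injection, assuming (hypothetically) an opponent engine that naively sizes its internal board from the coordinates it's handed:

    # Hypothetical opponent engine that rebuilds its board just big enough
    # to hold every move it has seen. A far-away move blows up the allocation.
    def naive_opponent_update(moves):
        size = max(max(x, y) for x, y in moves) + 1
        board = [[0] * size for _ in range(size)]  # O(size^2) memory
        for x, y in moves:
            board[x][y] = 1
        return board

    naive_opponent_update([(2, 3), (4, 1)])  # fine: a 5x5 board
    # naive_opponent_update([(10**9, 0)])    # tries to allocate ~10^18 cells: MemoryError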
The SMB3 ACE is one of the most technically interesting glitches. The usual skips and saves are much more mundane.
My point here is that there is a similarity between (some) human players and some AI players. Even the discussion of whether exploiting a glitch counts as 'winning' looks very similar.
Seeing this work by Schmidhuber suddenly reminded me of this song: https://twitter.com/i/status/1155091710281580548
It's a sort of tongue-in-cheek celebration of his famous eagerness when discussing his work.
Aren't a lot of these a kind of overfitting to the data?
The learning agent discovers something that is true of the training set, but does not generalize to other examples of the problem outside of the training set.
The article attributes much of the problem to data set construction, because it is very tricky to design data sets without accidental biases, biases the learner can use to correctly categorize the examples even though they have nothing to do with the actual problem you want solved. The traditional techniques for avoiding overfitting, like holding out part of the training data, don't do any good if the entire data set is unrepresentative of the real world in some systematic way.
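To make that concrete with the ruler example from the list above (a toy setup, all numbers made up): a model that latches onto a spurious feature present across the whole collection will ace a held-out split and still fail in the clinic.

    import random

    random.seed(1)

    # In the collected data, rulers happen to co-occur with malignancy.
    def make_collected_sample():
        malignant = random.random() < 0.5
        return {"ruler": malignant}, malignant

    # In deployment, rulers carry no signal.
    def make_clinic_sample():
        malignant = random.random() < 0.5
        return {"ruler": random.random() < 0.5}, malignant

    dataset = [make_collected_sample() for _ in range(1000)]
    train, heldout = dataset[:800], dataset[800:]

    # "Model" that learned the shortcut: predict malignant iff a ruler is present.
    predict = lambda x: x["ruler"]

    acc = lambda split: sum(predict(x) == y for x, y in split) / len(split)
    print("train accuracy:", acc(train))      # 1.0
    print("held-out accuracy:", acc(heldout))  # 1.0, the split shares the bias
    print("clinic accuracy:", acc([make_clinic_sample() for _ in range(1000)]))  # ~0.5

The held-out split can't catch the shortcut because it was drawn from the same biased collection; only data gathered under deployment conditions exposes it.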