Hacker News

As far as I remember, SolidGoldMagikarp was a glitch caused by the enormous number of Reddit posts made by the same user ("SolidGoldMagikarp") in a specific subreddit (r/counting).

There was no problem with the token per se; the problem was that it acted like a strange attractor in the model's high-dimensional embedding space, disconnected from any useful information.

When the LLM was induced to emit it, the next predicted tokens would be random gibberish.
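You can check the tokenisation yourself with the tiktoken library. A minimal sketch (the exact token id is from memory, so treat it as illustrative):

    import tiktoken

    # r50k_base is the GPT-2/GPT-3-era BPE vocabulary where the
    # glitch tokens lived.
    enc = tiktoken.get_encoding("r50k_base")

    # Note the leading space: the BPE merges gave the whole
    # username its own single token.
    ids = enc.encode(" SolidGoldMagikarp")
    print(ids)              # a single id, e.g. [43453]
    print(enc.decode(ids))  # ' SolidGoldMagikarp'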



More or less. It was a string given its own token by the tokeniser because of the above, but it barely appeared in the LLM's training data (the tokeniser and the model were trained on different corpora). Thus it basically had no meaning for the model. (I think there are theories that the parts of the network associated with such tokens may have been repurposed for something else, which would explain why the presence of the token in the input messed things up so much.)
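One common heuristic for hunting such under-trained tokens (a sketch, not necessarily how the original investigators did it): embedding rows that never received gradient updates stay near their initialisation, so they sit unusually close to the mean embedding. With GPT-2 from the transformers library:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    # Input embedding matrix, shape (vocab_size, hidden_dim).
    emb = model.get_input_embeddings().weight.detach()

    # Distance of every token's embedding from the mean embedding;
    # rarely- or never-trained tokens tend to cluster near the mean.
    dists = (emb - emb.mean(dim=0)).norm(dim=1)

    # The ten tokens closest to the mean are glitch-token candidates.
    for idx in dists.argsort()[:10]:
        print(int(idx), repr(tok.decode([int(idx)])), float(dists[idx]))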


gpt-oss has similar bad tokens.

https://fi-le.net/oss/



