The seahorse emoji is one of the canonical "Mandela effects": things that a large group of people collectively (mis)remember, but that turn out never to have existed. Classic examples include the cornucopia in the Fruit of the Loom logo (never there), and the wording on car mirrors, "objects in the mirror may be closer than they appear" (there's no record of 'may be closer', just 'are closer').

Unfortunately, the discussion around Mandela effects gets tainted by lots of people being so sure of their memory that the only explanation must be fantastical (the timeline has shifted!), giving the topic a valence of crazy that discourages engagement. I find these mass mis-rememberings fascinating from a psychological perspective, and lacking a satisfying explanation (there probably isn't one).

So here we're seeing LLMs "experiencing" the same Mandela effect that afflicts so many people, and I sincerely wonder why. The obvious answer is that the training data has lots of discussions about this particular Mandela effect, i.e. people posting online "where is the seahorse emoji?" But those discussions are almost necessarily coupled with language that asserts 'no, the seahorse emoji does not exist.' That's why the discussion is there in the first place! So why does the model take on the persona of someone who is sure it does exist? And why does that steer the models into such a weird feedback loop?