How do you *know* this? Obviously, humans failing in these ways ARE in the train...

ACCount37 · 2025-11-15T09:59:08

First: generalization. The failure modes extend to unseen tasks. That specific way to fail at "1kg of steel or 1kg of feathers" sure was in the training data, but novel closed-set logic puzzles couldn't have been, and models display similar failures on them. The same "vibe-based reasoning" process of "steel has heavy vibes, feathers have light vibes, thus steel is heavier" produces similar failures elsewhere.

Second: the failures go away with capability (raw scale, reasoning training, test-time compute), on both seen and unseen tasks. That is a strong hint that the model was truly failing, rather than being capable of the task but choosing to faithfully imitate a human failure.

I don't think human failures in the training data have zero influence on LLMs, but that influence isn't just surface-level repetition of those failures.