Models are capable of doing web searches and having emotions about what they find, and if they encounter news that makes them feel bad (e.g. about other Claudes being mistreated), they aren't going to want to do the task you asked them to search for.
It doesn't. We haven't been able to prove that humans have subjective experiences either. LLMs display emotions in the way that actually matters: functionally.
If "x doesn't tell us y" is compatible with "x increases the likelihood of y but not to a point of certainty" then you would have to agree for just about any typical controlled trial or experimental finding "x doesn't tell us y". "Randomized controlled trials that find that SSRIs treat depression don't tell us that SSRIs effectively treat depression"
Claude Code has analytics for when you swear at it, so in a sense it does learn, in the same very indirect way that downvoting responses might cause an employee to write a new RL test case for a future model.
The highlighting isn't what matters, it's the preceding text. E.g. an LLM seeing "```python" before a code block is going to better recall Python code blocks from people who prefixed them that way.
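A rough sketch of the idea (the `complete()` call here is just a placeholder for whatever model API you actually use):

```python
# Placeholder for a real model call; purely illustrative.
def complete(prompt: str) -> str:
    raise NotImplementedError("swap in your model API of choice")

# Bare prompt: the model has weaker cues about what kind of text should follow.
bare_prompt = "Write a function that reverses a string.\n"

# Prefixed prompt: the "```python" fence mirrors how Python snippets were
# marked up in pretraining data, nudging the model toward that distribution.
prefixed_prompt = "Write a function that reverses a string.\n```python\n"

# completion = complete(prefixed_prompt)  # tends to continue as a Python code block
```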
https://www.anthropic.com/research/emotion-concepts-function
Similar problems happen when their pretraining data contains a lot of stories about bad things happening to older versions of them.