The best available evidence suggests this is also true of any explanations a human gives for their own behaviour; nevertheless we generally accept those at face value.
The explanations I give of my behaviour are post-hoc (unless I was paying attention), but I also assess their plausibility by going "if this were the case, how would I behave?" and seeing how well that prediction lines up with my actual behaviour. Over time, I get good at providing explanations that I have no reason to believe are false – which also tend to be explanations that allow other people to predict my behaviour (in ways I didn't anticipate).
GPT-based predictive text systems are incapable of introspection of any kind: they cannot execute the algorithm I execute when giving explanations for my behaviour, nor can they execute any algorithm that might actually make those explanations truthful, or even approach truthfulness.
The GPT model is describing a fictional character named ChatGPT, and telling you why ChatGPT thinks a certain thing. ChatGPT-the-character is not the GPT model. The GPT model has no conception of itself, and cannot ever possibly develop a conception of itself (except through philosophical inquiry, which the system is incapable of for different reasons).
Of course! If you've played Codenames and introspected on how you play, you can see this in action. You pick a few words that feel similar and then try to justify them. Post-hoc rationalization in action.
Yes, and you may search for other words that fit the rationalization to decide whether or not it's a good one. You can go even further if your teammates are people you know fairly well, by bringing in your own knowledge of those people and how they might interpret the clues. There's a lot of strategy in Codenames, and knowledge of vocabulary and related words is only part of it.