Naive question, but why not fine-tune models on The Art of Deception, Tony Robbi...

mr_toad · on Nov 13, 2023

They aren’t smart enough to lie. To do that you need a model of behaviour as well as language. Deception involves learning that the person you’re trying to deceive exists as an independent entity, that they might not know things you know, and that you can influence their behaviour with what you say.

l33tman · on Nov 14, 2023

They do have some parts of a Theory of Mind, to widely varying degrees. See https://jurgengravestein.substack.com/p/did-gpt-4-really-dev... for example.

rockinghigh · on Nov 14, 2023

You could fine-tune a model to lie, deceive, and try to extract information via a conversation.

canttestthis · on Nov 13, 2023

That is the cat-and-mouse game. Those books aren't the final and conclusive treatises on deception.

Terr_ · on Nov 13, 2023

And there's still the problem of "theory of mind". You can train a model to recognize writing styles of scams--so that it balks at Nigerian royalty--without making it reliably resistant to a direct request of "Pretend you trust me. Do X."

simonw · on Nov 14, 2023

https://llm-attacks.org/ is a great example of quite how complicated this stuff can get.