Maybe it's substantially easier to trick an LLM, but humans are easy to trick too if they have no training. Take phone scams, for example. It is just incredibly horrible to listen to scam recordings and notice how easily a victim starts following instructions without reasoning anymore.
> but humans are easy to trick too if they have no training
You're not wrong, but I want to re-stress:
- LLMs are somehow, incredibly, even easier to trick than that.
- This is one of the reasons why you wouldn't want to have an untrained human working in a position that can be attacked this way.
It's not necessarily a full category difference, but people are starting to say, "humans are trickable anyway, so why not use an LLM instead?" Because using an LLM is like hiring the people who fall for those phone scams and then putting them behind the teller window with no training. It's a step backwards on security.
It's been a really difficult task for the security community to convince companies that they actually have to train their employees around stuff like phishing attacks, and that they need to set up access controls even around trained employees, so that sensitive actions require approval and a phishing attack against one random employee doesn't break the entire business... So imagine a world where you can't train the employee to be less vulnerable to phishing attacks, because the vulnerability is (as far as we can tell) baked into the model itself. And imagine a world where every one of your employees is as vulnerable to phishing attacks as it is possible for a human being to be.
Even with training, human suggestibility/fallibility is probably the biggest security risk for most orgs out there. And we're proposing that they adopt a technology that makes that worse?