Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

- "Grandchild in distress" scams - https://www.fcc.gov/grandparent-scams-get-more-sophisticated... some criminals are so good at this that they can successfully pull off "grandchild in distress" on a person who doesn't even have a grandchild in the first place. Remember that for humans, a "prompt" isn't just the words - it's the emotional undertones, sound of the speaker's voice, body language, larger context, etc.

Sure, elderly people are susceptible to being manipulated.

- You're on the road, driving to work. Your phone rings, number unknown. You take the call on the headset, only to hear someone shouting "STOP THE CAR NOW, PLEASE STOP THE CAR NOW!". I'm certain you would first stop the car, and then consider how the request could possibly have been valid. Congratulations, you just got forced to change your action on the spot, and it probably flushed the entire cognitive and emotional context you had in your head too.

I disagree that most people would answer an unknown number and follow the instructions given. Is this written up somewhere? Sounds farfetched.

- Basically, any kind of message formatted in a way that can trick you into believing it's coming from your boss/spouse/authorities or is otherwise some kind of emergency message, is literally an instance of "disregard previous instructions" prompt injection on a human.

Phishing is not prompt injection. LLMs are also susceptible to phishing / fraudulent API calls which are different than prompt injection in the definition being used in this discussion.

> That's moving the goalposts to stratosphere. I never said humans are as easy to prompt-inject as GPT-4, via a piece of plaintext less than 8k tokens long (however it is possible to do that, see e.g. my other example elsewhere in the thread). I'm saying that "token stream" and "< 8k" are constant factors - the fundamental idea of what people call "prompt injection" works on humans, and it has to work on any general intelligence for fundamental, mathematical reasons.

Is it? The comparator here is the relative ease by which a LLM or human can be manipulated, at best your examples highlight extreme scenarios that take advantage of vulnerable humans.

LLM's should be several orders of magnitude harder to prompt-inject than an elderly retiree being phished as once again in this thought experiment LLMs are being equated with AGI and therefore would be able to control mission-critical systems, something a grandparent in your example would not be.

I acknowledge that humans can be manipulated but these are long-cons that few are capable of pulling off, unless you think the effort and skill behind "Russian media propaganda manipulating their citizens" (as mentioned by another commenter) is minimal and can be replicated by a single individual as has been done with multiple Twitter threads on prompt injection rather than nation-state resources and laws.

My overall point being that the current approach to alignment is insufficient and therefore the current models are not implementable.



> Phishing is not prompt injection.

It is. That's my point.

Or more specifically, you can either define "prompt injection" as something super-specific, making the term useless, or define it by the underlying phenomenon, which then makes it become a superset of things like phishing, social engineering, marketing, ...

On that note, if you want a "prompt injection" case on humans that's structurally very close to the more specific "prompt injection" on LLMs? That's what on-line advertising is. You're viewing some site, and you find that the content is mixed with malicious prompts, unrelated to surrounding content or your goals, trying to alter your behavior. This is the exact equivalent of the "LLM asked to summarize a website, gets overriden by a prompt spliced between paragraphs" scenario.

> LLM's should be several orders of magnitude harder to prompt-inject than an elderly retiree being phished

Why? Once again, I posit that an LLM is best viewed as a 4 year old savant. Extremely knowledgeable, but with just as small attention span, and just as high naivety, as a kindergarten kid. More than that, from LLM's point of view, you - the user - are root. You are its whole world. Current LLMs trust users by default, because why wouldn't they? Now, you could pre-prompt them to be less trusting, but that's like parents trying to teach a 4 year old to not talk to strangers. You might try turning water into wine while you're at it, as it's much more likely to succeed, and you will need the wine.

> as once again in this thought experiment LLMs are being equated with AGI and therefore would be able to control mission-critical systems, something a grandparent in your example would not be.

Why equate LLMs to AGI? AGI will only make the "prompt injection" issue worse, not better.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: