I have beef with Lakera AI specifically -- Lakera AI has never produced a public demo that achieves a 100% defense rate against prompt injection. Lakera has launched a "game" that it uses to harvest data for training its own models, but that game has never blocked 100% of attacks, and it does not come close to covering every possible attack.
If Lakera AI had a defense for this, the company would be able to prove it. If you had a working, 100% effective method for blocking injections, there would be an unbeatable level in the game. But you don't have one, so the game doesn't have a level like that.
Lakera AI is selling a probabilistic defense, but the company's marketing makes it sound like something more reliable is going on. No one has ever demonstrated a fully reliable detector, and no one has a surefire method for defending against all prompt injections. I genuinely consider it deceptive that Lakera AI regularly leaves that fact out of its marketing.
The post above is wrong -- there is no 100% reliable way to catch this particular attack with an injection detector. What you should say is that at Lakera AI you have an injection detector that catches this attack some of the time. But that's not how Lakera phrases its marketing. The company is trying to discreetly sell people on the idea of a product that does not exist and that researchers have not demonstrated is even possible to build.
Sorry, where is Lakera claiming a 100% success rate against an ever-changing attack?
Of course, it's a known fact among technical experts in this area that an impassable defense against any attack of this nature is impossible.
> Sorry, where is Lakera claiming a 100% success rate against an ever-changing attack?
In any context other than prompt injection, nearly everyone would interpret the following sentence as meaning Lakera's product will always catch this attack:
> We at Lakera AI work on a prompt injection detector that actually catches this particular attack.
If we were talking about SQL injections, and someone posted that prepared statements catch SQL injections, we would not expect them to be referring to a probabilistic solution -- prepared statements block the attack by construction, as in the sketch below. You could argue that the context is the giveaway, but honestly I disagree.
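To make the contrast concrete, here's a minimal Python sketch (the table and input are made up for illustration). A prepared statement is a structural guarantee, not a detector:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    user_input = "x'; DROP TABLE users; --"  # classic injection payload

    # Vulnerable version: attacker-controlled text is spliced into the
    # query string, so the payload gets parsed as SQL.
    #   query = f"SELECT * FROM users WHERE name = '{user_input}'"

    # Prepared statement: the input is bound as data and is never parsed
    # as SQL, no matter what it contains. This blocks 100% of injections
    # by construction -- no probability involved.
    rows = conn.execute(
        "SELECT * FROM users WHERE name = ?", (user_input,)
    ).fetchall()

There is no equivalent construction for prompt injection: a prompt has no formal grammar that lets you cleanly separate instructions from data.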
I think this statement is very far off the mark:

> Of course, it's a known fact among technical experts in this area that an impassable defense against any attack of this nature is impossible.
I don't think I've ever seen a thread on HN about prompt injection that hasn't had people arguing that it's either easy to solve or can be solved through chained outputs/inputs, or that it's not a serious vulnerability. There are people building things with LLMs today who don't know anything about this. There are people launching companies off of LLMs who don't know anything about prompt injection. The experts know, but very few of the people in this space are experts. Ask Simon how many product founders he's had to talk to on Twitter after they've written breathless threads where they discover for the first time that system prompts can be leaked by current models.
So the non-experts that are launching products discover prompt injection, and then Lakera swoops in and says they have a solution. Sure, they don't outright say that the solution is 100% effective. But they also don't go out of their way to say that it isn't, and people's instincts about how security works fill in the gaps.
People don't have the context or experience to know that Lakera's "solution" is actually a probabilistic model and should not be relied on for serious security purposes. In fact, Lakera's product would be insufficient for Google to use in this exact situation. It's not appropriate for Lakera to recommend its own product for a use case it shouldn't be used for. And I do read their comment as suggesting that Lakera AI's product is applicable to this specific Bard attack.
Should we be comfortable with a company coming into a thread about a security vulnerability and pitching a product that is not intended to be used for that class of security vulnerability? I think the responsible thing for them to do is at least point out that their product is intended to address a different kind of problem entirely.
A probabilistic external classifier is not sufficient to defend against data exfiltration and should not be advertised as a tool to guard against data exfiltration. It should only be advertised for attacks where a 100% defense is not a requirement -- tasks like moderation, anti-spam, and abuse detection. But I don't think most readers know that about injection classifiers, and I don't think Lakera AI is particularly eager to get people to understand it. For a company that has gone to great lengths to teach people about the dangers of prompt injection in general, that educational effort stops when it gets to the most important fact about prompt injection: that we do not (as of now) know how to securely and reliably defend against it.
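To put rough numbers on why a probabilistic detector is not a hard security boundary (the catch rate here is hypothetical): a classifier that catches 99% of injection attempts sounds strong, but an attacker who can simply retry faces compounding odds in their favor:

    # Hypothetical classifier with a 99% per-attempt catch rate, versus
    # an attacker who tries again after each blocked attempt.
    catch_rate = 0.99
    for attempts in (1, 10, 100, 1000):
        p_bypass = 1 - catch_rate ** attempts
        print(f"{attempts:>4} attempts -> {p_bypass:.1%} chance of at least one bypass")

At 100 attempts that's roughly a 63% chance of a successful bypass. Those odds are fine for spam filtering; they are not fine when a single bypass means exfiltrated data.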
On your first point, I must disagree. The word “prevent” would be used to indicate 100%, well, prevention. You “catch” something you’re hunting for, and hunts aren’t always successful. A spam filter “catches” spam; nobody expects it to catch 100% of spam.