Good distinction, but I wonder if it's worth going further: context integrity may be fundamentally unsolvable. Agents consume untrusted input by design. Trying to guarantee the model won't be tricked seems like the wrong layer to bet on.
What seems more promising is accepting that the model will be tricked and constraining what it can do when that happens. Authorization at the tool boundary, scoped to the task and delegation chain rather than the agent's identity. If a child agent gets compromised, it still can't exceed the authority that was delegated to it. Contain the blast radius instead of trying to prevent the confusion.
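A minimal sketch of that containment idea (names are hypothetical, not any real API): whatever a compromised child requests, its effective authority is the intersection with what was delegated.

```python
# Hypothetical sketch: a tricked child agent's effective authority is
# capped by what was delegated, no matter what it requests.
def effective_authority(requested: set, delegated: set) -> set:
    # Containment: the agent can only exercise the intersection
    # of what it asks for and what its parent actually granted.
    return requested & delegated

parent_grant = {"read:db", "read:email"}  # what the parent delegated
# An injected prompt makes the child request everything:
hijacked_request = {"read:db", "send:email", "transfer:funds"}
assert effective_authority(hijacked_request, parent_grant) == {"read:db"}
```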
Right on. Human-in-the-loop doesn't scale at agent speed. Sandboxing constrains tool execution environments, but says nothing about which actions an agent is authorized to take. That gets even worse once agents start delegating to other agents. I've been building a capability-based authz solution: task-scoped permissions that can only narrow through delegation, cryptographically enforced, offline verification. MIT/Apache-2.0, Rust core.
https://github.com/tenuo-ai/tenuo
Spot on. You could argue that most companies buying B2B SaaS could almost always build a clone internally, but they need someone to assume SLA and liability.
The Agent Swarm section is fascinating. I'm working on authorization for multi-agent systems so this is relevant to my interests. Lots of interesting parallels to capability-based security models.
Exactly ... and that's why I'm skeptical of "AI verifies AI" as the primary safety mechanism. The verifier for moving money should be deterministic: constraints, allowlists, spend limits, invoice/PO matching, etc. The LLM can propose actions, but the execution should be gated by a human/policy-issued scope that's mechanically enforced. That's the whole point: constrain the non-deterministic layer with a deterministic one. [0]
[0] https://tenuo.dev/constraints
You've got the model right. And saving prompt logs does help with reconstruction.
But warrants aren't just "more audit data." They're an authorization primitive enforced in the critical path: scope and constraints are checked mechanically before the action executes. The receipt is a byproduct.
Prompt logs tell you what the model claimed it was doing. A warrant is what the human actually authorized, bound to an agent key, verifiable without trusting the agent runtime.
This matters more in multi-agent systems. When Agent A delegates to Agent B, which calls a tool, you want to be able to link that action back to the human who started it. Warrants chain cryptographically. Each hop signs and attenuates. The authorization provenance is in the artifact itself.
A worker agent doesn't mint warrants. It receives them. Either it requests a capability and an issuer approves, or the issuer pushes a scoped warrant when assigning a task. Either way, the issuer signs and the agent can only act within those bounds.
At execution time, the "verifier" checks the warrant: valid signatures, attenuation (scope only narrows through delegation), TTL (authority is task-scoped), and that the action fits the constraints. Only then does the call proceed.
This is sometimes called the P/Q model: the non-deterministic layer proposes, the deterministic layer decides. The agent can ask for anything. It only gets what's explicitly granted.
If the agent asks for the wrong thing, it fails closed. If an overly broad scope is approved, the receipt makes that approval explicit and reviewable.
Different angle than policy-as-YAML. We use cryptographic capability tokens (warrants) that travel with the request. The human signs a scoped, time-bound authorization. The tool validates the warrant at execution, not a central policy engine.
On your questions:
Canonicalization: The warrant specifies allowed capabilities and constraints (e.g., path: /data/reports/*). The tool checks if the action fits the constraint. No need to normalize LLM output into a canonical representation.
Stateful intent: Warrants attenuate. Authority only shrinks through delegation. You can't escalate from "read DB" to "POST external" unless the original warrant allowed both. A sub-agent can only receive a subset of what its parent had, cryptographically enforced.
Latency: Stateless verification, ~27μs. No control plane calls. The warrant is self-contained: scope, constraints, expiry, holder binding, signature chain. Verification is local.
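On the canonicalization point, the constraint check can be as simple as a glob match at the tool boundary (illustrative sketch, not Tenuo's constraint language):

```python
from fnmatch import fnmatch

# The warrant carries a pattern like "/data/reports/*"; the tool tests
# the concrete path of the requested action against it. No need to
# normalize LLM output into a canonical representation first.
def fits_constraint(path: str, pattern: str) -> bool:
    return fnmatch(path, pattern)

assert fits_constraint("/data/reports/q3.csv", "/data/reports/*")
assert not fits_constraint("/etc/passwd", "/data/reports/*")
```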
The deeper issue with policy engines: they check rules against actions, but they can't verify derivation. When Agent B acts, did its authority actually come from Agent A? Was it attenuated correctly?
When orchestrators spawn sub-agents that in turn spawn tools, there's no artifact showing how authority flowed through the chain.
Warrants are a primitive for this: signed authorization that attenuates at each hop. Each delegation is signed, scope can only narrow, and the full chain is verifiable at the end. Doesn't matter how many layers deep.
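Chain verification could look roughly like this (field names and the HMAC-as-signature are illustrative assumptions, not the real wire format): walk the chain, check each hop's signature against its delegator's key, and check that scope only ever narrows.

```python
import hashlib
import hmac
import json

def sign(key: bytes, body: dict) -> str:
    # Stand-in for a real per-hop signature.
    msg = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_chain(chain: list, delegator_keys: list) -> bool:
    """chain[0] is the root grant; each subsequent hop must be signed by
    its delegator and carry a scope that is a subset of its parent's."""
    parent_scope = None
    for hop, key in zip(chain, delegator_keys):
        body = {"holder": hop["holder"], "scope": sorted(hop["scope"])}
        if not hmac.compare_digest(hop["sig"], sign(key, body)):
            return False          # forged or tampered hop
        if parent_scope is not None and not set(hop["scope"]) <= parent_scope:
            return False          # scope tried to widen: escalation
        parent_scope = set(hop["scope"])
    return True
```

The point is that the check is local and mechanical: no lookup of who delegated to whom, because the derivation is in the artifact.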
> if you signed the document, you own its content. Versus some vendor-provided AI Agent which simply takes action on its own
Yeah, that's exactly the model I think we should adopt for AI agent tool calls as well: cryptographically signed, task-scoped "warrants" that remain traceable even across multi-agent delegation chains.
> Agent Trace is an open specification for tracking AI-generated code. It provides a vendor-neutral format for recording AI contributions alongside human authorship in version-controlled codebases.
Similar space, different scope/approach.
Tenuo warrants track who authorized what across delegation chains (human to agent, agent to sub-agent, sub-agent to tool) with cryptographic proof & PoP at each hop.
Trace tracks provenance. Warrants track authorization flow.
Both are open specs. I could see them complementing each other.
Why does it need cryptography at all? If you gave the agent a token to interact with your bank account, then you gave it permission. If you want to limit the amount it is allowed to send and restrict the list of recipients, put a filter between the account and the agent that enforces that. If you want money sent only against an invoice, let the filter check that an invoice reference is provided by the agent. If you did neither of those things and the platform that runs the agents didn't accept liability, it's on you. Setting up filters and engineering prompts is on you too.
Now if you did all of that but made a bug in implementing the filter, then you at least tried and weren't negligent, but it's still on you.
Tokens + filters work for single-agent, single-hop calls. Gets murky when orchestrators spawn sub-agents that spawn tools. Any one of them can hallucinate or get prompt-injected.
We're building around signed authorization artifacts instead. Each delegation is scoped and signed, chains are verifiable end-to-end. Deterministic layer to constrain the non-deterministic nature of LLMs.
>We're building around signed authorization artifacts instead. Each delegation is scoped and signed, chains are verifiable end-to-end. Deterministic layer to constrain the non-deterministic nature of LLMs.
Ah, I get it. So the token can be downscoped before being passed on, like the pledge thing, so a sub-agent doesn't exceed the scope of its parent. I have a feeling that it's like cryptography in general: you take one problem and reduce it to a key-management problem.
In a more practical sense, if the non-deterministic layer decides what the reduced scope should be, all delegations can become "Allow: *" in the most pathological case, right? Or like the Play Store, where a shady calculator app can have permission to read your messages. Somebody has to review those and flag excessive grants.
Right, the non-deterministic layer can't be the one deciding scope. That's the human's job at the root.
The LLM can request a narrower scope, but attenuation is monotonic and enforced cryptographically. You can't sign a delegation that exceeds what you were granted. TTL too: the warrant can't outlive its parent.
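The two monotonicity rules from that paragraph could be sketched as a guard on delegation (hypothetical names, no real crypto here; in the actual system the rejection is a signature-verification failure rather than an exception):

```python
class EscalationError(Exception):
    pass

def delegate(parent: dict, child_scope: set, child_expiry: float) -> dict:
    # Attenuation is monotonic: a delegation may only narrow.
    if not child_scope <= parent["scope"]:
        raise EscalationError("child scope exceeds parent grant")
    # A warrant cannot outlive the warrant it was derived from.
    if child_expiry > parent["expires"]:
        raise EscalationError("child TTL exceeds parent TTL")
    return {"scope": child_scope, "expires": child_expiry}
```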
So yes, key management. But the pathological "Allow: *" has to originate from a human who signed it. That's the receipt you're left holding.
You're poking at the right edges though. UX for scope definition and revocation propagation are what we're working through now. We're building this at tenuo.dev if you want to dig into the spec or poke holes.
>So yes, key management. But the pathological "Allow: *" has to originate from a human who signed it. That's the receipt you're left holding.
Sure, but generally speaking I want my agent to send out emails, so I explicitly grant email reading and email writing. I also want it to pay invoices, but with some semantic condition.
Then I give it an instruction that implicitly requires only email reading. At which point is the scope narrowed to align the explicit permissions granted up front with the implicit one for this operation? It's not really a problem cryptography helps solve.
Should it be the other way around, maybe: only the read permission is granted first, and then it has to request additional permission to send?
Yep ... that's exactly the direction. Think "default deny + step-up," not "grant everything up front."
You keep a coarse cap (e.g. email read/write, invoice pay) but each task runs under a narrower, time-boxed warrant derived from that cap. Narrowing happens at the policy/UX layer (human or deterministic rules), not by the LLM. The LLM can request escalation ("need send"), but it only gets it via an explicit approval / rule.
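As a sketch of that flow (hypothetical names, no crypto shown; the real grant would be a signed warrant rather than a set):

```python
# Default deny + step-up: each task starts under a narrow warrant
# derived from a coarse cap; escalation needs an explicit approval.
COARSE_CAP = {"email:read", "email:send", "invoice:pay"}

def task_warrant(requested: set) -> set:
    # Initial grant: read-only subset of the cap, nothing else.
    return requested & {"email:read"}

def step_up(current: set, extra: str, approved_by_rule: bool) -> set:
    # The agent may *request* "email:send", but only an explicit
    # approval (human or deterministic policy) actually grants it,
    # and never beyond the coarse cap.
    if approved_by_rule and extra in COARSE_CAP:
        return current | {extra}
    return current
```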
Crypto isn't deciding scope. It's enforcing monotonic attenuation, binding the grant to an agent key, and producing a receipt that the scope was explicitly approved.
For a single-process agent this might be overkill. It matters more when warrants cross trust boundaries: third-party tools, sub-agents in different runtimes, external services. Offline verification means each hop can validate without calling home.
Not every access token is a (public) key or a signed object. It may be, but it doesn't have to. It's not state of the art, but also not unheard of to use a pre-shared secret with no cryptography involved and to rely on presenting the secret itself with each request. Cookie sessions are often like that.
You're right, they should be responsible. The problem is proving it.
"I asked it to summarize reports, it decided to email the competitor on its own" is hard to refute with current architectures.
And when sub-agents or third-party tools are involved, liability gets even murkier. Who's accountable when the action executed three hops away from the human?
The article argues for receipts that make "I didn't authorize that" a verifiable claim.
A few edge cases where it doesn't work don't mean it doesn't work in the majority of cases, or that we shouldn't try to fix those edge cases.
This isn't a legal argument and these conversations are so tiring because everyone here is insistent upon drawing legal conclusions from these nonsense conversations.
We're talking about different things. To take responsibility is to volunteer to accept accountability without a fight.
In practice, almost everyone is held potentially or actually accountable for things they never had a choice in. Some are never held accountable for things they freely choose, because they have some way to dodge accountability.
The CEOs who don't accept accountability were lying when they said they were responsible.
That's when companies were accountable for their results and needed to push the accountability to a person to deter bad results. You couldn't let a computer make a decision because the computer can't be deterred by accountability.
Now companies are all about doing bad all the time, they know they're doing it, and need to avoid any individual being accountable for it. Computers are the perfect tool to make decisions without obvious accountability.
That's an orthodoxy. It holds for now (in theory and most of the time), but it's just an opinion, like a lot of other things.
Who is accountable when we have a recession or when people can't afford whatever we strongly believe should be affordable? The system, the government, the market, late stage capitalism or whatever. Not a person that actually goes to jail.
If the value proposition becomes attractive, we can choose to believe that the human is not in fact accountable here, but the electric shaitan is. We just didn't pray well enough, but did our best really. What else can we expect?
> "I asked it to summarize reports, it decided to email the competitor on its own" is hard to refute with current architectures.
If one decided to paint a school's interior with toxic paint, it's not "the paint poisoned them on its own", it's "someone chose to use a paint that can poison people".
Somebody was responsible for choosing to use a tool that has this class of risks and explicitly did not follow known and established protocol for securing against such risk. Consequences are that person's to bear - otherwise the concept of responsibility loses all value.
>Somebody was responsible for choosing to use a tool that has this class of risks and explicitly did not follow known and established protocol for securing against such risk. Consequences are that person's to bear - otherwise the concept of responsibility loses all value.
What if I hire you (instead of LLM) to summarize the reports and you decide to email the competitors? What if we work in the industry where you have to be sworn in with an oath to protect secrecy? What if I did (or didn't) check with the police about your previous deeds, but it's first time you emailed competitors? What if you are a schizo that heard God's voice that told you to do so and it's the first episode you ever had?
The difference is LLMs are known to regularly and commonly hallucinate as their main (and only) way of internal functioning. Human intelligence, empirically, is more than just a stochastic probability engine, therefore has different standards applied to it than whatever machine intelligence currently exists.
> otherwise the concept of responsibility loses all value.
Frankly, I think that might be exactly where we end up going. Finding a responsible person to punish is just a tool we use to achieve good outcomes, and if scare tactics is no longer applicable to the way we work, it might be time to discard it.
A brave new world that is post-truth, post-meaning, post-responsibility, and post-consequences. One where the AI's hallucinations eventually drag everyone with it and there's no other option but to hallucinate along.
It's scary that a nuclear exit starts looking like an enticing option when confronted with that.
Ultimately the goal is to have a system that prevents mistakes as much as possible, and adapts and self-corrects when they do happen. Even with science we acknowledge that mistakes happen and people draw incorrect conclusions, but the goal is to make that a temporary state that is fixed as more information comes in.
I'm not claiming to have all the answers about how to achieve that, but I am fairly certain punishment is not a necessary part of it.
I saw some people saying the internet, particularly brainrot social media, has made everyone mentally twelve years old. It feels like it could be true.
Twelve-year-olds aren't capable of dealing with responsibility or consequence.
>A brave new world that is post-truth, post-meaning, post-responsibility, and post-consequences. One where the AI's hallucinations eventually drag everyone with it and there's no other option but to hallucinate along.
That value proposition depends entirely on whether there is also an upside to all of that. Do you actually need truth, meaning, responsibility and consequences while you are tripping on acid? Do you even need to be alive and have a physical organic body for that? What if Ikari Gendo was actually right and everyone else are assholes who don't let him be with his wife.
Found it. It was this line [0] specifically: "rm -rf /usr /lib/nvidia-current/xorg/xorg" instead of "rm -rf /usr/lib/nvidia-current/xorg/xorg", which will delete all of /usr and then fail to delete a non-existent directory at /lib/nvidia-current/xorg/xorg.
"Our tooling was defective" is not, in general, a defence against liability. Part of a companys obligations is to ensure all its processes stay within lawful lanes.
"Three months later [...] But the prompt history? Deleted. The original instruction? The analyst’s word against the logs."
One, the analyst's word does not override the logs; that's the point of logs. Two, it's fairly clear the author of the fine article has never worked close to finance. A three-month retention period for AI queries by an analyst is not an option.
SEC Rule 17a-4 & FINRA Rule 4511 have entered the chat.
Agree ... retention is mandatory. The article argues you should retain authorization artifacts, not just event logs. Logs show what happened. Warrants show who signed off on what.
(Disclaimer: working on this problem at tenuo.ai)