Good distinction, but I wonder if it's worth going further: context integrity may be fundamentally unsolvable. Agents consume untrusted input by design. Trying to guarantee the model won't be tricked seems like the wrong layer to bet on.
What seems more promising is accepting that the model will be tricked and constraining what it can do when that happens. Authorization at the tool boundary, scoped to the task and delegation chain rather than the agent's identity. If a child agent gets compromised, it still can't exceed the authority that was delegated to it. Contain the blast radius instead of trying to prevent the confusion.
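A minimal sketch of that containment idea (names are hypothetical, not any real API): whatever a compromised child requests, its effective authority is the intersection with what was delegated.

```python
# Hypothetical sketch: a tricked child agent's effective authority is
# capped by what was delegated, no matter what it requests.
def effective_authority(requested: set, delegated: set) -> set:
    # Containment: the agent can only exercise the intersection
    # of what it asks for and what its parent actually granted.
    return requested & delegated

parent_grant = {"read:db", "read:email"}  # what the parent delegated
# An injected prompt makes the child request everything:
hijacked_request = {"read:db", "send:email", "transfer:funds"}
assert effective_authority(hijacked_request, parent_grant) == {"read:db"}
```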
Right on. Human-in-the-loop doesn't scale at agent speed. Sandboxing constrains tool execution environments, but says nothing about which actions an agent is authorized to take. That gets even worse once agents start delegating to other agents. I've been building a capability-based authz solution: task-scoped permissions that can only narrow through delegation, cryptographically enforced, offline verification. MIT/Apache-2.0, Rust core.
https://github.com/tenuo-ai/tenuo
Spot on. You could argue that most companies buying B2B SaaS could almost always build a clone internally, but they need someone to assume SLA and liability.
The Agent Swarm section is fascinating. I'm working on authorization for multi-agent systems so this is relevant to my interests. Lots of interesting parallels to capability-based security models.
Exactly ... and that's why I'm skeptical of "AI verifies AI" as the primary safety mechanism. The verifier for moving money should be deterministic: constraints, allowlists, spend limits, invoice/PO matching, etc. The LLM can propose actions, but the execution should be gated by a human/policy-issued scope that's mechanically enforced. That's the whole point: constrain the non-deterministic layer with a deterministic one. [0]
[0] https://tenuo.dev/constraints
You've got the model right. And saving prompt logs does help with reconstruction.
But warrants aren't just "more audit data." They're an authorization primitive enforced in the critical path: scope and constraints are checked mechanically before the action executes. The receipt is a byproduct.
Prompt logs tell you what the model claimed it was doing. A warrant is what the human actually authorized, bound to an agent key, verifiable without trusting the agent runtime.
This matters more in multi-agent systems. When Agent A delegates to Agent B, which calls a tool, you want to be able to link that action back to the human who started it. Warrants chain cryptographically. Each hop signs and attenuates. The authorization provenance is in the artifact itself.
A worker agent doesn't mint warrants. It receives them. Either it requests a capability and an issuer approves, or the issuer pushes a scoped warrant when assigning a task. Either way, the issuer signs and the agent can only act within those bounds.
At execution time, the "verifier" checks the warrant: valid signatures, attenuation (scope only narrows through delegation), TTL (authority is task-scoped), and that the action fits the constraints. Only then does the call proceed.
This is sometimes called the P/Q model: the non-deterministic layer proposes, the deterministic layer decides. The agent can ask for anything. It only gets what's explicitly granted.
If the agent asks for the wrong thing, it fails closed. If an overly broad scope is approved, the receipt makes that approval explicit and reviewable.
Different angle than policy-as-YAML. We use cryptographic capability tokens (warrants) that travel with the request. The human signs a scoped, time-bound authorization. The tool validates the warrant at execution, not a central policy engine.
On your questions:
Canonicalization: The warrant specifies allowed capabilities and constraints (e.g., path: /data/reports/*). The tool checks if the action fits the constraint. No need to normalize LLM output into a canonical representation.
Stateful intent: Warrants attenuate. Authority only shrinks through delegation. You can't escalate from "read DB" to "POST external" unless the original warrant allowed both. A sub-agent can only receive a subset of what its parent had, cryptographically enforced.
Latency: Stateless verification, ~27μs. No control plane calls. The warrant is self-contained: scope, constraints, expiry, holder binding, signature chain. Verification is local.
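On the canonicalization point, the constraint check can be as simple as a glob match at the tool boundary (illustrative sketch, not Tenuo's constraint language):

```python
from fnmatch import fnmatch

# The warrant carries a pattern like "/data/reports/*"; the tool tests
# the concrete path of the requested action against it. No need to
# normalize LLM output into a canonical representation first.
def fits_constraint(path: str, pattern: str) -> bool:
    return fnmatch(path, pattern)

assert fits_constraint("/data/reports/q3.csv", "/data/reports/*")
assert not fits_constraint("/etc/passwd", "/data/reports/*")
```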
The deeper issue with policy engines: they check rules against actions, but they can't verify derivation. When Agent B acts, did its authority actually come from Agent A? Was it attenuated correctly?
When orchestrators spawn sub-agents that in turn spawn tools, there's no artifact showing how authority flowed through the chain.
Warrants are a primitive for this: signed authorization that attenuates at each hop. Each delegation is signed, scope can only narrow, and the full chain is verifiable at the end. Doesn't matter how many layers deep.
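Chain verification could look roughly like this (field names and the HMAC-as-signature are illustrative assumptions, not the real wire format): walk the chain, check each hop's signature against its delegator's key, and check that scope only ever narrows.

```python
import hashlib
import hmac
import json

def sign(key: bytes, body: dict) -> str:
    # Stand-in for a real per-hop signature.
    msg = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_chain(chain: list, delegator_keys: list) -> bool:
    """chain[0] is the root grant; each subsequent hop must be signed by
    its delegator and carry a scope that is a subset of its parent's."""
    parent_scope = None
    for hop, key in zip(chain, delegator_keys):
        body = {"holder": hop["holder"], "scope": sorted(hop["scope"])}
        if not hmac.compare_digest(hop["sig"], sign(key, body)):
            return False          # forged or tampered hop
        if parent_scope is not None and not set(hop["scope"]) <= parent_scope:
            return False          # scope tried to widen: escalation
        parent_scope = set(hop["scope"])
    return True
```

The point is that the check is local and mechanical: no lookup of who delegated to whom, because the derivation is in the artifact.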
> if you signed the document, you own its content. Versus some vendor-provided AI Agent which simply takes action on its own
Yeah, that's exactly the model I think we should adopt for AI agent tool calls as well: cryptographically signed, task-scoped "warrants" that remain traceable even across multi-agent delegation chains.
> Agent Trace is an open specification for tracking AI-generated code. It provides a vendor-neutral format for recording AI contributions alongside human authorship in version-controlled codebases.
Similar space, different scope/approach.
Tenuo warrants track who authorized what across delegation chains (human to agent, agent to sub-agent, sub-agent to tool) with cryptographic proof & PoP at each hop.
Trace tracks provenance. Warrants track authorization flow.
Both are open specs. I could see them complementing each other.
Why does it need cryptography at all? If you gave the agent a token to interact with your bank account, then you gave it permission. If you want to limit the amount it is allowed to send and restrict the list of recipients, put a filter between the account and the agent that enforces that. If you want money sent only against an invoice, let the filter check that an invoice reference is provided by the agent. If you did neither of those things and the platform that runs the agents didn't accept liability, it's on you. Setting up filters and engineering prompts is on you too.
Now if you did all of that but made a bug in implementing the filter, then you at least tried and weren't negligent, but it's still on you.
Tokens + filters work for single-agent, single-hop calls. Gets murky when orchestrators spawn sub-agents that spawn tools. Any one of them can hallucinate or get prompt-injected.
We're building around signed authorization artifacts instead. Each delegation is scoped and signed, chains are verifiable end-to-end. Deterministic layer to constrain the non-deterministic nature of LLMs.
>We're building around signed authorization artifacts instead. Each delegation is scoped and signed, chains are verifiable end-to-end. Deterministic layer to constrain the non-deterministic nature of LLMs.
Ah, I get it. So the token can be downscoped before being passed on, like the pledge thing, so a sub-agent doesn't exceed the scope of its parent. I have a feeling that it's like cryptography in general: you take one problem and reduce it to a key-management problem.
In a more practical sense, if the non-deterministic layer decides what the reduced scope should be, all delegations can become "Allow: *" in the most pathological case, right? Or like the Play Store, where a shady calculator app can have permission to read your messages. Somebody has to review those and flag excessive grants.
Right, the non-deterministic layer can't be the one deciding scope. That's the human's job at the root.
The LLM can request a narrower scope, but attenuation is monotonic and enforced cryptographically. You can't sign a delegation that exceeds what you were granted. TTL too: the warrant can't outlive its parent.
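The two monotonicity rules from that paragraph could be sketched as a guard on delegation (hypothetical names, no real crypto here; in the actual system the rejection is a signature-verification failure rather than an exception):

```python
class EscalationError(Exception):
    pass

def delegate(parent: dict, child_scope: set, child_expiry: float) -> dict:
    # Attenuation is monotonic: a delegation may only narrow.
    if not child_scope <= parent["scope"]:
        raise EscalationError("child scope exceeds parent grant")
    # A warrant cannot outlive the warrant it was derived from.
    if child_expiry > parent["expires"]:
        raise EscalationError("child TTL exceeds parent TTL")
    return {"scope": child_scope, "expires": child_expiry}
```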
So yes, key management. But the pathological "Allow: *" has to originate from a human who signed it. That's the receipt you're left holding.
You're poking at the right edges though. UX for scope definition and revocation propagation are what we're working through now. We're building this at tenuo.dev if you want to dig into the spec or poke holes.
>So yes, key management. But the pathological "Allow: *" has to originate from a human who signed it. That's the receipt you're left holding.
Sure, but generally speaking I want my agent to send out emails, so I explicitly grant email reading and email writing. I also want it to pay invoices, but with some semantic condition.
Then I give it an instruction that implicitly requires only email reading. At which point is the scope narrowed to align the explicit permissions granted up front with the implicit one for this operation? It's not really a problem cryptography helps solve.
Should it be the other way around, maybe: only the read permission is granted first, and then it has to request additional permission to send?
Yep ... that's exactly the direction. Think "default deny + step-up," not "grant everything up front."
You keep a coarse cap (e.g. email read/write, invoice pay) but each task runs under a narrower, time-boxed warrant derived from that cap. Narrowing happens at the policy/UX layer (human or deterministic rules), not by the LLM. The LLM can request escalation ("need send"), but it only gets it via an explicit approval / rule.
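As a sketch of that flow (hypothetical names, no crypto shown; the real grant would be a signed warrant rather than a set):

```python
# Default deny + step-up: each task starts under a narrow warrant
# derived from a coarse cap; escalation needs an explicit approval.
COARSE_CAP = {"email:read", "email:send", "invoice:pay"}

def task_warrant(requested: set) -> set:
    # Initial grant: read-only subset of the cap, nothing else.
    return requested & {"email:read"}

def step_up(current: set, extra: str, approved_by_rule: bool) -> set:
    # The agent may *request* "email:send", but only an explicit
    # approval (human or deterministic policy) actually grants it,
    # and never beyond the coarse cap.
    if approved_by_rule and extra in COARSE_CAP:
        return current | {extra}
    return current
```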
Crypto isn't deciding scope. It's enforcing monotonic attenuation, binding the grant to an agent key, and producing a receipt that the scope was explicitly approved.
For a single-process agent this might be overkill. It matters more when warrants cross trust boundaries: third-party tools, sub-agents in different runtimes, external services. Offline verification means each hop can validate without calling home.
Not every access token is a (public) key or a signed object. It may be, but it doesn't have to. It's not state of the art, but also not unheard of to use a pre-shared secret with no cryptography involved and to rely on presenting the secret itself with each request. Cookie sessions are often like that.
You're right, they should be responsible. The problem is proving it.
"I asked it to summarize reports, it decided to email the competitor on its own" is hard to refute with current architectures.
And when sub-agents or third-party tools are involved, liability gets even murkier. Who's accountable when the action executed three hops away from the human?
The article argues for receipts that make "I didn't authorize that" a verifiable claim.
A few edge cases where it doesn't work don't mean it doesn't work in the majority of cases, or that we shouldn't try to fix those edge cases.
This isn't a legal argument and these conversations are so tiring because everyone here is insistent upon drawing legal conclusions from these nonsense conversations.
We're talking about different things. To take responsibility is to volunteer to accept accountability without a fight.
In practice, almost everyone is held potentially or actually accountable for things they never had a choice in. Some are never held accountable for things they freely choose, because they have some way to dodge accountability.
The CEOs who don't accept accountability were lying when they said they were responsible.
That's when companies were accountable for their results and needed to push the accountability to a person to deter bad results. You couldn't let a computer make a decision because the computer can't be deterred by accountability.
Now companies are all about doing bad all the time, they know they're doing it, and need to avoid any individual being accountable for it. Computers are the perfect tool to make decisions without obvious accountability.
That's an orthodoxy. It holds for now (in theory and most of the time), but it's just an opinion, like a lot of other things.
Who is accountable when we have a recession or when people can't afford whatever we strongly believe should be affordable? The system, the government, the market, late stage capitalism or whatever. Not a person that actually goes to jail.
If the value proposition becomes attractive, we can choose to believe that the human is not in fact accountable here, but the electric shaitan is. We just didn't pray well enough, but did our best really. What else can we expect?
> "I asked it to summarize reports, it decided to email the competitor on its own" is hard to refute with current architectures.
If one decided to paint a school's interior with toxic paint, it's not "the paint poisoned them on its own", it's "someone chose to use a paint that can poison people".
Somebody was responsible for choosing to use a tool that has this class of risks and explicitly did not follow known and established protocol for securing against such risk. Consequences are that person's to bear - otherwise the concept of responsibility loses all value.
>Somebody was responsible for choosing to use a tool that has this class of risks and explicitly did not follow known and established protocol for securing against such risk. Consequences are that person's to bear - otherwise the concept of responsibility loses all value.
What if I hire you (instead of LLM) to summarize the reports and you decide to email the competitors? What if we work in the industry where you have to be sworn in with an oath to protect secrecy? What if I did (or didn't) check with the police about your previous deeds, but it's first time you emailed competitors? What if you are a schizo that heard God's voice that told you to do so and it's the first episode you ever had?
The difference is LLMs are known to regularly and commonly hallucinate as their main (and only) way of internal functioning. Human intelligence, empirically, is more than just a stochastic probability engine, therefore has different standards applied to it than whatever machine intelligence currently exists.
> otherwise the concept of responsibility loses all value.
Frankly, I think that might be exactly where we end up going. Finding a responsible person to punish is just a tool we use to achieve good outcomes, and if scare tactics is no longer applicable to the way we work, it might be time to discard it.
A brave new world that is post-truth, post-meaning, post-responsibility, and post-consequences. One where the AI's hallucinations eventually drag everyone with it and there's no other option but to hallucinate along.
It's scary that a nuclear exit starts looking like an enticing option when confronted with that.
Ultimately the goal is to have a system that prevents mistakes as much as possible, and adapts and self-corrects when they do happen. Even with science we acknowledge that mistakes happen and people draw incorrect conclusions, but the goal is to make that a temporary state that is fixed as more information comes in.
I'm not claiming to have all the answers about how to achieve that, but I am fairly certain punishment is not a necessary part of it.
I saw some people saying the internet, particularly brainrot social media, has made everyone mentally twelve years old. It feels like it could be true.
Twelve-year-olds aren't capable of dealing with responsibility or consequence.
>A brave new world that is post-truth, post-meaning, post-responsibility, and post-consequences. One where the AI's hallucinations eventually drag everyone with it and there's no other option but to hallucinate along.
That value proposition depends entirely on whether there is also an upside to all of that. Do you actually need truth, meaning, responsibility and consequences while you are tripping on acid? Do you even need to be alive and have a physical organic body for that? What if Ikari Gendo was actually right and everyone else are assholes who don't let him be with his wife.
Found it. It was this line [0] specifically: "rm -rf /usr /lib/nvidia-current/xorg/xorg" instead of "rm -rf /usr/lib/nvidia-current/xorg/xorg", which will delete all of /usr and then fail to delete a non-existent directory at /lib/nvidia-current/xorg/xorg.
"Our tooling was defective" is not, in general, a defence against liability. Part of a companys obligations is to ensure all its processes stay within lawful lanes.
"Three months later [...] But the prompt history? Deleted. The original instruction? The analyst’s word against the logs."
One, the analyst's word does not override the logs; that's the point of logs. Two, it's fairly clear the author of the fine article has never worked close to finance. A three-month retention period for AI queries by an analyst is not an option.
SEC Rule 17a-4 & FINRA Rule 4511 have entered the chat.
Agree ... retention is mandatory. The article argues you should retain authorization artifacts, not just event logs. Logs show what happened. Warrants show who signed off on what.
(Disclaimer: working on this problem at tenuo.ai)