AI Agent Security: Prove What the Agent Did

Joe Braidwood, Co-founder & CEO

June 2026 · 6 min read

An agent is not a chatbot. It holds credentials, calls tools, moves money, writes to systems of record, and chains those actions together faster than any human can review. That changes what AI agent security has to mean. The question a CTO or security reviewer has to answer is no longer “did we write a good policy?” — it is “can we prove, independently and after the fact, what the agent actually did, and which controls held when it did it?”

Most of the stack answers the first question. Governance platforms describe intended behavior. Posture tools score configuration. Self-reported logs record what the operator's own software chose to write down. All of that states intent. None of it is evidence. AI agent security needs two things: controls that enforce at the moment of action, and proof of enforcement that an outside party can check without taking your word for it.

This guide reframes agentic AI security around that gap, and shows where signed receipts fit underneath the governance tools you already run.

Why agent security is a runtime problem, not a document problem

The model layer is now a live attack surface. In mid-June 2026, the US Commerce Department ordered Anthropic to suspend access to its Fable 5 and Mythos 5 models for all foreign nationals over national-security concerns. Bloomberg, CNN, CNBC, and others reported that the trigger was a discovered jailbreak of Fable 5 — a cybersecurity vulnerability. Anthropic complied while disputing the order, warning that the standard behind it would, in its words, “essentially halt all new frontier model deployments.”

Set aside the policy fight. The operational lesson stands on its own: one jailbreak can now move a frontier model in or out of your reach. Capability and security are no longer separable concerns you can stage. The model you build on can shift underneath you, and the agent wrapped around it can be steered by an input you never anticipated.

When that happens, a policy document tells you what was supposed to occur. It cannot tell you what did. For an autonomous system acting through tools, the distinction is the whole game.

Intent versus evidence

Governance has always been able to say what ought to be done. It has rarely been able to prove what was. Policies, audit narratives, and self-reported logs record intentions and recollections — they are not evidence. A log your own agent writes is a claim about your agent, authored by your agent. If the agent is compromised, so is the log.

That is the reframe at the center of modern AI agent security. Dashboards and policies assert. Evidence proves. The two are not interchangeable, and conflating them is how a deployment passes every review and still cannot say what happened during the one incident that mattered.

The action boundary is where security has to live

For an agent, the decisive event is the tool call — the instant it tries to send the email, run the query, hit the payment API, or invoke another agent. Tool-call security means a control sits at that boundary and decides: permit, deny, override, or escalate. Enforcement that happens anywhere else is advisory.

Three things have to be true at that boundary for AI agent runtime security to hold:

Enforcement is in the path, not beside it. The control mediates the action. It does not observe it after the fact and file a report.
Every decision is an event. Permit, deny, override, escalation, response — each is recorded as a discrete, attributable enforcement event, not a line buried in application logs.
The record survives the agent. The evidence of what the control did is tamper-evident and not reducible to logs the operator (or a compromised agent) can quietly rewrite.
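
The three conditions above can be sketched in a few lines of Python. This is a minimal illustration, not the OVERT design: the names `EventLog`, `guarded_call`, and `example_policy` are hypothetical, and a plain SHA-256 hash chain stands in for whatever tamper-evidence mechanism a real deployment uses. The gate sits in the call path, records every decision as its own event before anything runs, and stores only a fingerprint of the arguments.

```python
import hashlib
import json
from dataclasses import dataclass, field

DECISIONS = {"permit", "deny", "override", "escalate"}
GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def fingerprint(obj) -> str:
    """SHA-256 over a canonical JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


@dataclass
class EventLog:
    """Append-only list of enforcement events, each linked to the previous one."""
    events: list = field(default_factory=list)

    def append(self, tool: str, args: dict, decision: str) -> dict:
        body = {
            "seq": len(self.events),
            "tool": tool,
            "args_fp": fingerprint(args),  # fingerprint only, never the content
            "decision": decision,
            "prev": self.events[-1]["hash"] if self.events else GENESIS,
        }
        event = {**body, "hash": fingerprint(body)}
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute every hash and link; any in-place edit breaks the chain."""
        prev = GENESIS
        for event in self.events:
            body = {k: v for k, v in event.items() if k != "hash"}
            if event["prev"] != prev or fingerprint(body) != event["hash"]:
                return False
            prev = event["hash"]
        return True


def guarded_call(log: EventLog, policy, tool: str, args: dict, run):
    """Mediate a tool call: decide, record the decision, then act on it."""
    decision = policy(tool, args)
    if decision not in DECISIONS:
        decision = "deny"  # fail closed on an unknown decision
    log.append(tool, args, decision)
    # Assumption: only "permit" runs the tool automatically in this sketch.
    return run(**args) if decision == "permit" else None


def example_policy(tool: str, args: dict) -> str:
    """Hypothetical rule: large payments are escalated to a human."""
    if tool == "send_payment" and args["amount"] > 1000:
        return "escalate"
    return "permit"
```

A chain like this exposes in-place edits only if the latest hash is also held by someone outside the operator's control. Without that anchor, a party who can rewrite the log can recompute every hash, which is the gap signed receipts are meant to close.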

This is the layer governance tools assume exists but rarely produce. They define the policy. Something still has to enforce it at the tool call and prove the enforcement happened.

Signed receipts: proof the agent can't forge

This is where OVERT comes in. OVERT is GLACIS's open standard, released royalty-free under a patent covenant, for independent, tamper-evident proof that runtime controls executed. That proof is produced without protected content ever leaving the operator's environment. Version 1.1.0 shipped on 11 June 2026 as a backward-compatible, additive minor release over 1.0.

OVERT is not a better log. It is evidence, built on four commitments:

Evidence, not assertion. The artifact proves what ran; it does not narrate what was intended.
Containment by construction. Only cryptographic fingerprints and signatures cross the trust boundary. The content — prompts, data, the body of the tool call — stays home.
Independence by structure. The party that attests is separate from the party being governed. A system does not vouch for itself.
Measurement, not adjectives. Coverage is reported as intervals and sample sizes an auditor can reproduce, not as the word “comprehensive.”
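
To make the last commitment concrete, here is one way a coverage figure can be reported as an interval with its sample size. The Wilson score interval is an assumption chosen for illustration; the text above does not say which interval method OVERT prescribes, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(governed: int, sampled: int, z: float = 1.96):
    """Wilson score interval for the share of sampled actions that were governed.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if sampled <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= governed <= sampled:
        raise ValueError("governed must be between 0 and sampled")
    p = governed / sampled
    denom = 1 + z**2 / sampled
    centre = (p + z**2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z**2 / (4 * sampled**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Report the denominator alongside the interval so an auditor can reproduce it.
low, high = wilson_interval(governed=940, sampled=1000)
print(f"coverage 940/1000, 95% interval [{low:.3f}, {high:.3f}]")
```

Even when every sampled action was governed, the lower bound stays below 100%, which is the point: the claim is bounded by how much was actually sampled.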

In practice, each enforcement decision at the action boundary emits a signed receipt. The receipt records that a control ran and what it decided, in a form an independent verifier can check later — including reconstructing an incident after the fact without routine disclosure of the underlying content. Coverage accounting travels with it: scope, exclusions, and the denominators behind any percentage you claim. That is the difference between “we have controls” and “here is the proof this specific action was governed.”
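
The shape of such a receipt can be sketched as follows. This is not the OVERT receipt format: the field names, `issue_receipt`, and `verify_receipt` are hypothetical, and HMAC-SHA256 from the Python standard library stands in for the signature. HMAC uses a shared key, so it cannot deliver the independence described above; a real deployment would presumably use an asymmetric signature so that a verifier never holds signing material.

```python
import hashlib
import hmac
import json


def canonical(obj: dict) -> bytes:
    """Stable byte encoding so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def issue_receipt(key: bytes, control_id: str, decision: str,
                  payload: bytes, issued_at: str) -> dict:
    """Build a receipt holding a fingerprint of the payload, never the payload."""
    body = {
        "control_id": control_id,  # hypothetical identifier
        "decision": decision,
        "payload_fp": hashlib.sha256(payload).hexdigest(),
        "issued_at": issued_at,
    }
    sig = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {**body, "sig": sig}


def verify_receipt(key: bytes, receipt: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in receipt.items() if k != "sig"}
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("sig", "")))


def matches_payload(receipt: dict, payload: bytes) -> bool:
    """During an incident, check disclosed content against the receipt."""
    return receipt["payload_fp"] == hashlib.sha256(payload).hexdigest()
```

Routine verification needs only the receipt. The payload is compared against its fingerprint only when an investigation calls for disclosure, which is how an incident can be reconstructed without the content having crossed the boundary in the first place.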

What changed in OVERT 1.1

Version 1.1 is additive, and the agent-relevant work is concentrated in a new normative Annex G. G.1 specifies local content-addressed evidence retrieval and retention integrity. G.2 defines the HTTP transport binding for cross-boundary attestation, the case where one agent's action depends on another service's action. G.3 adds an automated auditor-discovery protocol via a well-known endpoint, so a verifier can find what it needs to check. G.4 supplies an informative reference schema for the ControlAction artifact.
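
For readers unfamiliar with the term in G.1, content-addressed storage means a piece of evidence is retrieved by the hash of its own bytes, so retrieval doubles as an integrity check. The sketch below shows only that general idea. The `EvidenceStore` class is hypothetical and does not reproduce the Annex G interface, which is not described here.

```python
import hashlib


class EvidenceStore:
    """In-memory content-addressed store: the key is the SHA-256 of the value."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        """Store content locally and return its address (hex digest)."""
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        return address

    def get(self, address: str) -> bytes:
        """Return content only if it still hashes to the address requested."""
        content = self._blobs[address]
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored evidence does not match its address")
        return content


store = EvidenceStore()
address = store.put(b"tool call body kept inside the operator's environment")
assert store.get(address).startswith(b"tool call body")
```

Because the address is derived from the content, a receipt that carries the address commits to exactly those bytes, and any later change to the retained evidence is detected on retrieval.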

Alongside the annex, regulatory dates were refreshed for Colorado and the EU, framework crosswalks moved to a companion document, and the Attestation Boundary Declaration was renumbered from 29.4 to 22.10. The scanner and the local classifier are now defined as supporting components. A versioning and errata policy lives in Section 22.11, with stable control identifiers such as ATT-3.5 and GOV-5.6 so a citation stays valid across releases. Conformance for Parts 1 through 22 is unchanged from 1.0 — existing obligations still hold.

How this sits with your governance stack

None of this displaces the governance, risk, and compliance tools you run. It sits underneath them. Your GRC platform states what the agent should do. OVERT receipts prove what it did. The relationship is asserted-to-proven: the policy layer declares the control; the proof layer makes that declaration checkable by someone who was not in the room.

For agentic AI security specifically, that pairing is what lets you keep deploying through a moment like June 2026. When the model shifts, when an input slips past, when a regulator or a customer asks what actually happened — you answer with evidence that was generated at the action boundary, signed at the time, and verifiable without exposing a single sensitive payload.

That is the shape of AI agent security worth building: controls that enforce where the agent acts, and receipts that prove it — independent of the agent, silent about what they need not disclose.

If you're deploying agents into regulated or high-stakes environments, get runtime coverage and see what signed enforcement looks like against your own tool calls. You can also verify a receipt yourself, or read the standard at overt.is.
