Agentic AI Security Needs Proof, Not Promises

Joe Braidwood, Co-founder & CEO

June 2026 · 6 min read

Agentic AI security has a structural problem most teams discover only after something goes wrong. As an agent plans, calls tools, and acts across systems with limited human review, the only account of what it did is the account the system gives of itself. Self-reported logs are most trustworthy when nothing is at stake — and least trustworthy at exactly the moment they matter most, when you need to reconstruct an incident and the system that misbehaved is also the system writing the record. The bar for agentic AI security cannot be a vendor's assurance that controls were in place. It has to be proof.

That distinction stopped being academic in June 2026. According to reporting from Bloomberg and CNBC, the US Commerce Department ordered Anthropic to suspend access to its most capable models for all foreign nationals. The reported trigger was a discovered method of jailbreaking one of those models, a cybersecurity vulnerability. A single jailbreak finding was enough to cut off access to a frontier model for a whole class of users. Capability and security are now inseparable in the national conversation, and the question that follows lands on everyone downstream: can you prove, independently and after the fact, what your AI actually did and which controls actually held?

Why self-reported logs fail when agents act

Traditional AI agent security inherits an assumption from a slower era — that the system under scrutiny can be trusted to narrate its own behavior. For a static model behind an API, that assumption is merely optimistic. For an autonomous agent, it breaks.

An agent chains decisions. It reads a document, decides a tool call is warranted, retrieves more context, and acts — often several hops deep before a human sees anything. Each hop is a place where a control should fire: an input filter, a permission check, an escalation gate. A conventional log records that these ran. It does not prove it. The log is produced by the same operator-controlled environment as the action, so it can be incomplete, reordered, or rewritten — not necessarily through bad faith, but because nothing structural prevents it. When a regulator, an insurer, or your own incident team asks what happened, "here is our log" is a recollection, not evidence.

This is the verification gap that OVERT, the open standard GLACIS maintains, was written to close. Its framing is blunt: governance has always been able to say what ought to be done, and has rarely been able to prove what was. Policies and self-reported logs record intentions and recollections; they are not evidence. Agentic systems make that gap operational, because the distance between intent and action is now traversed by software, fast, and mostly unwatched.

Agentic AI security has to be witnessed

If a system cannot be trusted to grade its own homework, the grader has to sit outside the system. That is the core move for agentic AI security: the party that attests must be separate from the party being governed. Self-attestation is not independent attestation. A signed claim an operator generates about its own behavior carries exactly the weight of the operator's word — useful internally, inert in front of anyone who needs assurance they did not write themselves.

OVERT calls this independence by structure. The point is not that operators are dishonest; it is that "trust me" cannot be the foundation of safety for systems that act autonomously at machine speed. Independence is a property of the architecture, not a quality of the people running it. Whoever witnesses enforcement is structurally distinct from whoever is enforced upon, so the proof survives a change of staff, a change of incentives, and the worst possible day.

What gets witnessed is the enforcement event itself — the permit, the deny, the override, the escalation, the response. For an agent, those are the moments that decide whether autonomy stayed inside its lane. Independent verification of those events is what turns "the agent behaved" from a hope into a checkable fact.

Trusted execution evidence: which control, which configuration

Knowing a control existed is not the same as knowing it was active when a specific action occurred. A policy can be enabled in the dashboard and bypassed at the boundary; a guardrail can be configured and then silently disabled in the deployment that actually shipped.

Trusted execution evidence answers the precise question: which enforcing component, in which configuration, was active at the moment a governed action took place. For agentic systems that is the difference between "we have an input filter" and "this filter, with this ruleset, evaluated this tool call at this point in the chain, and here is the signed record." It is the part of agentic AI security that a jailbreak makes urgent — because the live question after any bypass is not whether a defense was purchased, but whether it was running and what it did when the adversarial input arrived.
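To make that concrete, here is a minimal sketch of what such a record could bind together. It is not the OVERT schema: every field name, the `evidence_record` and `verify_record` helpers, and the demo key are hypothetical, and the HMAC is only a standard-library stand-in for the asymmetric signature an independent witness would use in practice.

```python
import hashlib
import hmac
import json


def fingerprint(obj) -> str:
    """Return a SHA-256 hex digest over a canonical JSON encoding of obj."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def evidence_record(component, ruleset, tool_call, step, decision, key: bytes) -> dict:
    """Bind which control, in which configuration, evaluated which call, and where."""
    body = {
        "component": component,                # hypothetical field names throughout
        "config_hash": fingerprint(ruleset),   # the exact ruleset that was active
        "action_hash": fingerprint(tool_call), # the call that was evaluated
        "step": step,                          # position in the agent's chain
        "decision": decision,                  # permit, deny, escalate, ...
    }
    sig = hmac.new(key, fingerprint(body).encode(), hashlib.sha256).hexdigest()
    return {**body, "sig": sig}


def verify_record(record: dict, key: bytes) -> bool:
    """Recompute the tag over every field except the signature and compare."""
    body = {k: v for k, v in record.items() if k != "sig"}
    expected = hmac.new(key, fingerprint(body).encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])


# Illustrative values only.
WITNESS_KEY = b"demo-key-held-by-the-witness"
ruleset_v7 = {"version": 7, "blocked_tools": ["shell.exec"]}
call = {"tool": "shell.exec", "args": ["rm", "-rf", "/tmp/x"]}
record = evidence_record("input-filter", ruleset_v7, call, step=3,
                         decision="deny", key=WITNESS_KEY)
print(verify_record(record, WITNESS_KEY))
```

Because the ruleset is hashed into the record, a filter that was "enabled" with a different or empty ruleset produces a different `config_hash`, and editing the decision after the fact invalidates the signature.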

Proof without turning your data into an exhaust pipe

The obvious objection to "prove everything" is that proof usually means disclosure, and disclosure of agent traffic — prompts, retrieved records, tool payloads — is its own breach waiting to happen. Regulated operators in healthcare, financial services, and defense cannot make attestation a new channel through which sensitive content leaks.

OVERT resolves this with containment by construction. Only cryptographic fingerprints and signatures cross the boundary; the protected content stays in the operator's environment. A governed action produces, as a by-product of doing its work, a signed record an outside party can verify — without that party ever seeing the underlying data. The receipt is tamper-evident and independently checkable, and it is silent about everything it need not disclose. You get proof an enforcement event happened and held, and the content that event touched never leaves home.
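A rough sketch of the idea, under stated assumptions: the receipt carries a salted fingerprint of the content rather than the content itself, so an outside party can check the signature without ever holding the data. The function names, the receipt fields, and the use of a salt are illustrative choices, not taken from OVERT, and HMAC again stands in for a real public-key signature.

```python
import hashlib
import hmac
import json
import os


def commit(content: bytes, salt: bytes) -> str:
    """Salted SHA-256 fingerprint; the salt never leaves the operator."""
    return hashlib.sha256(salt + content).hexdigest()


def make_receipt(event: str, content: bytes, salt: bytes, witness_key: bytes) -> dict:
    """Build a receipt holding only the event type and a fingerprint."""
    body = {"event": event, "content_fp": commit(content, salt)}
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(witness_key, payload, hashlib.sha256).hexdigest()
    return body


def verify_receipt(receipt: dict, witness_key: bytes) -> bool:
    """Outside check: needs the receipt and the key, never the content."""
    body = {k: v for k, v in receipt.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(witness_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["sig"])


def matches_content(receipt: dict, content: bytes, salt: bytes) -> bool:
    """Inside check: the operator can show the receipt refers to this content."""
    return hmac.compare_digest(receipt["content_fp"], commit(content, salt))


# Illustrative values only.
key = b"demo-witness-key"
prompt = b"patient 4411 discharge summary"
salt = os.urandom(16)
receipt = make_receipt("deny", prompt, salt, key)
wire = json.dumps(receipt)  # this string is all that crosses the boundary
```

The salt is there because a bare hash of low-entropy content can be guessed by brute force. With it, the fingerprint reveals nothing useful to a party that holds only `wire`.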

That property is what makes post-incident reconstruction safe. After an agent does something unexpected, you need to verify the event history — which controls fired, in what order, with what outcome. Containment by construction lets you do that reconstruction without turning the audit into a fresh egress of protected data. The proof travels; the secrets do not.
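One common way to make an event history checkable for order and completeness is a hash chain, sketched below. This is a generic technique offered as an assumption about how such reconstruction could work, not a description of OVERT's receipt format. The event fields are hypothetical and contain fingerprints only.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor for the first entry


def link(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    data = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + data).hexdigest()


def append(chain: list, event: dict) -> None:
    """Add an event, chaining it to the current head."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": link(prev, event)})


def verify_chain(chain: list) -> bool:
    """Walk the chain and confirm each entry follows from its predecessor."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != link(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


# Illustrative history: which controls fired, in what order, with what outcome.
history = []
append(history, {"control": "input-filter", "outcome": "permit", "action_fp": "a1"})
append(history, {"control": "permission-check", "outcome": "deny", "action_fp": "b2"})
append(history, {"control": "escalation-gate", "outcome": "escalate", "action_fp": "c3"})
print(verify_chain(history))
```

Reordering, dropping, or editing an entry breaks verification. The chain is only as strong as the custody of its latest hash, though: if the operator alone holds it, the operator can rebuild the whole chain, which is why the head has to be held by a party outside the operator's environment.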

Runtime coverage is the metric, not coverage on a slide

A control that protects some agent actions and quietly misses others is not really protecting anything you can stand behind. This is why agentic AI security has to be measured as runtime coverage — and why the denominators matter as much as the events.

Reliable coverage accounting means stating what was in scope, what was excluded, and how the denominators were derived. An agent that makes a thousand tool calls and routes nine hundred through enforcement has 90 percent coverage and a hundred-call gap, and a vague "we cover tool calls" hides exactly the hundred you would lose sleep over. OVERT's discipline here is measurement, not adjective: safety expressed in intervals and sample sizes an auditor can reproduce, not in words like "robust" or "comprehensive." For an autonomous system, runtime coverage stated honestly is the difference between a security posture and a marketing claim.
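As a worked version of the thousand-call example, the sketch below reports coverage with its denominator and a 95 percent Wilson score interval. The choice of the Wilson interval is an assumption made for illustration, and it only applies if the observed calls are treated as a sample of the agent's behaviour. The source does not say which interval method OVERT uses, and the report fields are hypothetical.

```python
import math


def wilson_interval(covered: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (z=1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = covered / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def coverage_report(total_calls: int, excluded: int, enforced: int) -> dict:
    """State the scope, the exclusions, and the denominator explicitly."""
    in_scope = total_calls - excluded
    if not 0 <= enforced <= in_scope:
        raise ValueError("enforced calls must fit within the in-scope calls")
    low, high = wilson_interval(enforced, in_scope)
    return {
        "total_calls": total_calls,
        "excluded": excluded,
        "in_scope": in_scope,            # the denominator
        "enforced": enforced,
        "uncovered": in_scope - enforced,
        "coverage": enforced / in_scope,
        "interval_95": (low, high),
    }


# The example from the text: 1,000 tool calls, 900 routed through enforcement.
report = coverage_report(total_calls=1000, excluded=0, enforced=900)
print(report["coverage"], report["uncovered"], report["interval_95"])
```

For these numbers the interval runs from roughly 0.88 to 0.92, and the report names the hundred uncovered calls outright instead of folding them into an adjective.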

The bar has moved — meet it with evidence

The frontier-model suspension was a signal, not an anomaly. When a single security finding can halt a model's deployment, "we have controls" stops being a sufficient answer for anyone operating AI that acts — to regulators, to insurers, to your own board. The defensible position is the one you can demonstrate after the fact, with proof you did not author: independent attestation, tamper-evident receipts, runtime coverage you can put a number on.

GLACIS is the runtime proof layer beneath agentic systems — controls that enforce at the inference, tool-call, and agent boundary, and signed OVERT receipts that prove what ran, with no sensitive data crossing the line. If your agents are starting to act on their own, the time to make them witnessed is before the incident, not during it.

Get runtime coverage — or verify a receipt to see what independent proof looks like.
