Your AI Governance Documentation Isn’t Evidence

Joe Braidwood, Co-founder & CEO

June 2026 · 6 min read

Governance has always been able to say what ought to be done. It has rarely been able to prove what was. That gap is the quiet problem sitting underneath most AI governance documentation today — the policies, control narratives, and self-reported logs that describe how a system is supposed to behave, without ever showing what it actually did.

If you own that documentation, none of this is a failing on your part. You built the right artifacts. The standards asked for policies and you wrote them; the auditors asked for narratives and you produced them. The trouble is structural, not personal: a document records an intention or a recollection. It is not a record an outside party can independently check. And as AI systems begin to act (calling tools, making decisions, taking irreversible steps at runtime), the distance between “we have a control” and “here is proof the control executed” becomes the thing that matters most.

This piece is about closing that distance. Not by writing more documentation, but by producing evidence as a by-product of the controls already doing their work.

What AI governance documentation actually captures

Walk through a typical AI governance binder and you will find three kinds of artifact, each useful, none of them evidence in the strict sense.

Policies state intent. “Our models will not process protected data without authorisation.” “High-risk outputs require human review.” These are commitments, and necessary ones, but a commitment is a statement about the future, not a record of the past.

Audit narratives describe process. They explain how a control is meant to operate and how the organisation believes it operated over a period. They are written after the fact, by the party being assessed, and they summarise rather than show. An auditor reading a narrative is reading your account of events, then deciding how much of it to trust.

Self-reported logs record activity — but they are produced by, and editable within, the same environment they are meant to vouch for. A log that the operator can write can also be amended, truncated, or selectively retained. That is not an accusation; it is a property of the architecture. Telemetry that reduces to operator-controlled logs cannot, by construction, serve as independent proof.

Each of these answers the question “what was supposed to happen?” None of them answers the harder one: can you prove, independently and after the fact, what your AI actually did and which controls actually held?

Intent versus evidence

The shorthand worth keeping: documentation is intent, evidence is proof.

A policy says the guardrail exists. A narrative says it generally worked. A log says something happened, in a file you control. Evidence is different in kind — it is tamper-evident, it is checkable by someone who does not have to take your word for it, and it is silent about everything it need not disclose. The shift from the first three to the fourth is the whole game.

Why a stronger AI governance framework still leaves the gap open

It is tempting to treat this as a maturity problem — adopt a more rigorous AI governance framework, add more controls, and the gap closes. It does not, because most frameworks specify what to govern and what to document, not how to prove execution to an outsider.

You can map every requirement in a recognised framework, generate immaculate documentation for each, and still be unable to demonstrate, on the day an incident lands, which enforcing component was active, what configuration it ran under, and whether a given action was permitted, denied, overridden, or escalated. The framework told you to have the control. It did not give you an artifact a third party could verify without you in the room narrating.

This is the part that catches careful teams off guard. The documentation is complete. The audit went well. And then something goes wrong at runtime (a jailbreak, an unexpected tool call, an output that should have been blocked) and the question becomes evidentiary, not procedural: “Show me what ran.” At that moment, a binder of intentions and a log you maintain are not enough, through no fault of the people who assembled them.

The five things mature operations actually need

When a security or governance team moves past describing controls toward proving them, a consistent set of needs surfaces. Real evidence has to deliver:

Trusted execution evidence — which enforcing component, in which configuration, was active when a governed action occurred.
Reliable coverage accounting — what was in scope, what was excluded, and how the denominators were derived. Safety expressed as a fraction an auditor can reconstruct, not as an adjective.
Tamper-evident telemetry — records not reducible to operator-controlled logs.
Independent verification of enforcement events — permit, deny, override, escalation, response — checkable by a party separate from the one being governed.
Post-incident reconstruction without routine content disclosure — the ability to verify event history after the fact without turning the evidence trail into a new channel that leaks protected data.

Notice that not one of these is satisfied by a document. They are properties of a runtime artifact, produced when the control fires, structured so an outsider can check it.
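One way to picture a tamper-evident runtime record is a hash chain over enforcement events. The sketch below is illustrative only: the EnforcementEvent fields, the GENESIS value and the function names are hypothetical and are not taken from any published standard. It assumes the final hash (the head) is handed to a party outside the operator's environment, so that later edits or truncation of the local record no longer match it.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # hypothetical placeholder hash for the first link


@dataclass(frozen=True)
class EnforcementEvent:
    # Hypothetical fields mirroring the needs listed above.
    component: str    # which enforcing component was active
    config_hash: str  # fingerprint of the configuration it ran under
    decision: str     # "permit", "deny", "override" or "escalate"


def link_hash(prev_hash: str, event: EnforcementEvent) -> str:
    """Hash the event together with the previous link's hash."""
    payload = json.dumps(asdict(event), sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(events: list[EnforcementEvent]) -> list[str]:
    """Return one hash per event; each commits to every event before it."""
    hashes, prev = [], GENESIS
    for event in events:
        prev = link_hash(prev, event)
        hashes.append(prev)
    return hashes


def verify_chain(events: list[EnforcementEvent], head: str) -> bool:
    """Recompute the chain and compare its last hash to an externally held head."""
    hashes = build_chain(events)
    return bool(hashes) and hashes[-1] == head


events = [
    EnforcementEvent("output-filter", "cfg-a1", "permit"),
    EnforcementEvent("output-filter", "cfg-a1", "deny"),
]
chain = build_chain(events)
head = chain[-1]  # assumed to be held by an independent party

# An operator-side edit that flips the second decision from "deny" to "permit".
tampered = [events[0], EnforcementEvent("output-filter", "cfg-a1", "permit")]
```

Here verify_chain(events, head) holds, while verify_chain(tampered, head) and verify_chain(events[:1], head) do not: amending or truncating the record changes the recomputed head. The chain by itself only makes tampering detectable; the independence comes from where the head is kept.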

What evidence looks like: the OVERT receipt

This is the gap OVERT was written to close. OVERT is GLACIS’s open standard, offered under a royalty-free patent covenant, for producing independent, tamper-evident proof that runtime controls executed, without protected content ever leaving the operator’s environment.

The mechanism is deliberately undramatic. A governed action runs. As a by-product of doing its work, the control emits a signed record — a receipt — that an outside party can verify. Only cryptographic fingerprints, signatures, and verification metadata cross the boundary. The underlying content stays home. It is not a better log. It is evidence: tamper-evident, independently checkable, and silent about everything it need not disclose.
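The shape of that mechanism can be sketched in a few lines. This is not the OVERT receipt format: the field names, the emit_receipt and verify_receipt helpers and the demo key are all hypothetical. It also assumes an HMAC with a shared key in place of a real digital signature, purely so the example runs on the standard library; independent verification would call for a public-key signature, so the verifier never holds the signing key.

```python
import hashlib
import hmac
import json

# ASSUMPTION: a shared-key HMAC stands in for a public-key signature.
SIGNING_KEY = b"demo-key-not-for-production"


def fingerprint(content: bytes) -> str:
    """One-way digest of the protected content; the content itself stays local."""
    return hashlib.sha256(content).hexdigest()


def emit_receipt(content: bytes, control: str, decision: str) -> dict:
    """Build a receipt holding only a fingerprint, metadata and a signature."""
    body = {
        "control": control,
        "decision": decision,
        "content_sha256": fingerprint(content),
    }
    message = json.dumps(body, sort_keys=True).encode("utf-8")
    signature = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def verify_receipt(receipt: dict) -> bool:
    """Recompute the signature over every field except the signature itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    message = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("signature", ""))


# The governed action sees the content; the receipt carries only its digest.
prompt = b"patient record 1234: ..."
receipt = emit_receipt(prompt, control="phi-guard", decision="deny")
```

The receipt verifies as issued, fails verification if any field is altered afterwards (for example, changing "deny" to "permit"), and contains no part of the prompt. One caveat on the sketch: a bare hash of short or guessable content can be brute-forced, so a real scheme would need to account for that.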

Four commitments hold the standard together, and each one maps directly onto a weakness in conventional documentation:

Evidence, not assertion. A governed action yields a receipt a third party can check — not a claim it is asked to trust. This is the line between a narrative and a fact.
Containment by construction. Only fingerprints and signatures cross the boundary; the content stays in the operator’s environment. Proof does not require exposure.
Independence by structure. Whoever attests is separate from whoever is governed. Self-attestation is not independent attestation — the same reason a self-reported log can never be the final word.
Measurement, not adjective. Safety stated in intervals and sample sizes an auditor can reproduce, rather than in words like “robust” or “comprehensive”.

That third commitment is where AI attestation stops being a synonym for “we logged it”. Attestation, in the OVERT sense, is the act of an independent party verifying that an enforcement event occurred, not the operator restating it. A receipt you can hand to an auditor, a regulator, or a customer’s security reviewer, who can then check it themselves, is a categorically different object from a record only you can read and only you can vouch for.
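The fourth commitment, measurement rather than adjective, can also be made concrete. As a minimal sketch, suppose a guardrail was tested against n adversarial prompts and blocked some number of them. The Wilson score interval below is one common way to report such a proportion with its uncertainty; the choice of Wilson, the function names and the example counts are assumptions made here for illustration, not something the standard is known to prescribe.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def block_rate_report(blocked: int, attempts: int) -> dict:
    """Report the sample size, point estimate and interval together."""
    low, high = wilson_interval(blocked, attempts)
    return {
        "n": attempts,
        "blocked": blocked,
        "rate": blocked / attempts,
        "interval_95": (low, high),
    }


# Hypothetical test run: 970 of 1000 adversarial prompts were blocked.
report = block_rate_report(blocked=970, attempts=1000)
```

Reporting the counts alongside the interval is what lets an auditor reproduce the figure: anyone with the same 970 and 1000 can rerun the arithmetic and arrive at the same bounds.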

Documentation still has a job

This is not an argument against AI governance documentation. Policies still set the intent that controls enforce. Frameworks still organise the work. Narratives still explain context that a cryptographic receipt, by design, does not. The point is narrower and more durable: documentation describes the system; evidence proves the run. You want both, and you want to stop being asked to substitute the first for the second.

The healthiest posture treats your existing documentation as the map and OVERT receipts as the territory — the map says where the controls should be, the receipts prove they were there and held when it counted.

Where to start

You do not have to rebuild your governance programme to close the evidence gap. The move is additive: keep the documentation, and add a layer that produces verifiable proof at the points where your AI actually acts — the inference call, the tool call, the agent boundary.

Start by asking one question of any control you currently document: “If this were challenged tomorrow, what could I hand an independent party to prove it executed, without disclosing the underlying data?” If the honest answer is “a policy and a log I control”, that is the gap, and it is a solvable one.

The standard is public at overt.is and at /standard; you can verify a receipt yourself to see what tamper-evident proof looks like in practice. When you are ready to turn your documented controls into evidence that holds at runtime, get runtime coverage — and let the proof speak for itself.
