AI Governance Challenges: The Challenge No One Names
Most AI governance challenges are problems of intent. The hardest one is a problem of execution: proving what your AI systems actually did.
Most lists of AI governance challenges read the same way: the technology moves faster than the policy, the scope keeps expanding, and nobody can get every stakeholder to agree on what “good” looks like. All true. All worth solving. But there is a harder, quieter challenge underneath them — one that survives even after you’ve fixed the others. You can write the policy, align the room, and define the scope, and still not be able to prove, independently and after the fact, what your AI systems actually did and which controls actually held. That is the verification gap, and it is the structural problem the rest of the work eventually runs into.
This piece walks through the familiar AI governance challenges quickly, then spends its time on the one that doesn’t get named — and on the capability that closes it.
The AI governance challenges everyone already names
Three obstacles show up in nearly every governance program, and they’re real.
Pace. Models, agents, and integrations change faster than review cycles. A control reviewed in the spring may sit in front of a system that has been retrained, re-prompted, or re-scoped by the summer. Governance written as a static document ages the moment the system underneath it moves.
Scope. “Govern the AI” sounds bounded until you try to draw the line. Is it the model, the prompt, the retrieval layer, the tool calls, the agent that chains them together? Each boundary you don’t account for is a boundary no one is watching.
Stakeholder alignment. AI governance stakeholders — legal, security, data, the business owner, the eventual auditor or regulator — arrive with different definitions of risk and different evidence they’ll accept. Getting them into one framework is genuine work, and it’s the part most program leads spend their energy on.
These are solvable. Teams solve versions of them every quarter. The trouble is that solving all three still leaves you somewhere uncomfortable.
The AI governance challenge no one names: the verification gap
Here is the thing the familiar list quietly assumes. It assumes that once you’ve written the right policy and aligned the right people, the system is governed. But a policy describes what ought to happen. It is not a record of what did.
As the OVERT standard puts it: governance has always been able to say what ought to be done; it has rarely been able to prove what was. Policies, audit narratives, and self-reported logs record intentions and recollections — they are not evidence.
That distinction is the whole game. Most AI governance challenges are challenges of intent: deciding the rules, agreeing on the boundaries, getting the stakeholders to sign. The verification gap is a challenge of execution: when a governed action actually occurred — a model call, a tool invocation, an agent decision to permit or deny — can an outside party confirm which control was in force and that it held? In most programs today, the honest answer is no. The proof on offer is the operator’s own logs and the operator’s own word.
Why self-attestation isn’t attestation
The reason the gap persists is that the prevailing form of proof is self-attestation, and self-attestation has a structural flaw that no amount of effort fixes: the party making the claim is the party being judged.
When a system reports its own compliance, the evidence is only as trustworthy as the entity that produced it — and that entity has every reason to want a clean result. This isn’t an accusation of bad faith. It’s a statement about structure. An auditor reviewing a self-generated log is reviewing a record the audited party controlled end to end: it could have been edited, filtered, or quietly reconstructed after the fact, and nothing in the log itself rules that out. Independence isn’t a tone you adopt. It’s a property you either have or don’t, and you don’t have it when the witness and the governed are the same system.
So the deepest of the AI governance challenges isn’t that intent is hard to set. It’s that intent can’t be audited as if it were execution — and self-reported execution can’t be audited as if it were independent.
What actually closes the gap
Naming the problem is the easy half. The harder claim is that it’s fixable — that you can produce proof of what a runtime control did without inheriting the weaknesses of a self-generated log, and without turning your evidence trail into a new way for sensitive data to leak.
That’s the capability the OVERT standard is built around. The shape of it rests on four commitments.
- Evidence, not assertion. A governed action yields a receipt a third party can check — not a claim it’s asked to trust. The point of proof is that it doesn’t require faith.
- Containment by construction. Only cryptographic fingerprints and signatures cross the boundary; the protected content stays in your environment. Verification that forces you to export the data you’re protecting isn’t a control, it’s a leak with paperwork.
- Independence by structure. Whoever attests is separate from whoever is governed. Self-attestation is not independent attestation — that has to be built into the architecture, not promised in a policy.
- Measurement, not adjective. Safety is stated in confidence intervals and sample sizes an auditor can reproduce, not in reassuring words. “Robust” is an adjective; a denominator is evidence.
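To make the last commitment concrete, here is a minimal sketch of reporting a control's pass rate as an interval rather than an adjective. It uses the Wilson score interval, a standard choice for binomial proportions; the source does not say which interval the OVERT standard prescribes, and the figures below are invented for illustration.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical test run: 9,940 of 10,000 adversarial prompts were blocked.
blocked, n = 9940, 10000
low, high = wilson_interval(blocked, n)
print(f"block rate {blocked / n:.2%} (n={n}), 95% interval [{low:.4f}, {high:.4f}]")
```

The same 99.4% rate measured on 1,000 prompts yields a visibly wider interval, which is the point: the denominator travels with the claim, and an auditor can recompute both bounds from the two counts.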
Put together, these turn the control itself into the source of proof. A runtime control produces, as a by-product of doing its work, a signed record that an outside party can verify. It’s tamper-evident — you can tell if it’s been altered. It’s independently checkable — its trustworthiness doesn’t depend on trusting the operator. And it’s silent about everything it needn’t disclose, so the act of proving compliance doesn’t open a fresh data-egress channel.
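A rough sketch of what such a record could look like, under loud assumptions: the field names, the `issue_receipt` and `verify_receipt` helpers, and the control identifier are hypothetical, not the OVERT receipt format. HMAC is used only to keep the example inside the Python standard library. Because HMAC is symmetric, the verifier must hold the attester's key; a design that is independent by structure would use an asymmetric signature (for example Ed25519) so that anyone can verify without being able to sign.

```python
import hashlib
import hmac
import json

def canonical(body: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

def issue_receipt(attester_key: bytes, action: str, decision: str,
                  control_id: str, content: bytes) -> dict:
    """Build a receipt that carries a fingerprint of the content, never the content."""
    body = {
        "action": action,
        "decision": decision,
        "control_id": control_id,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    signature = hmac.new(attester_key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}

def verify_receipt(attester_key: bytes, receipt: dict) -> bool:
    """Recompute the signature over the body and compare in constant time."""
    expected = hmac.new(attester_key, canonical(receipt["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])

# Hypothetical values throughout.
key = b"demo-key-held-by-the-attester"
prompt = b"customer record: Jane Doe, account 1234"
receipt = issue_receipt(key, "tool_call", "deny", "pii-filter-v3", prompt)

print(verify_receipt(key, receipt))            # True
tampered = {"body": {**receipt["body"], "decision": "permit"},
            "signature": receipt["signature"]}
print(verify_receipt(key, tampered))           # False: the altered decision is detected
print(prompt.decode() in json.dumps(receipt))  # False: the content never left
```

Flipping the recorded decision from deny to permit invalidates the signature, and the serialized receipt contains only a SHA-256 fingerprint of the prompt, so checking it discloses nothing about the protected content.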
Runtime evidence is the missing layer
This is why the verification gap is best understood as a runtime problem, not a documentation one. The questions that actually matter to a security reviewer or regulator are runtime questions:
- Which enforcing component and configuration were active when a governed action occurred?
- What was in scope and what was excluded?
- Is the telemetry reducible to operator-controlled logs, or does it stand on its own?
- Can enforcement events (permit, deny, override, escalation) be independently verified?
- Can you reconstruct an incident afterward without routinely disclosing the underlying content?
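One way to make those questions answerable is to record each enforcement event with the component, a fingerprint of its configuration, the decision, and the hash of the previous event. The sketch below is illustrative only: the `EnforcementEvent` fields and helper names are assumptions, not a schema taken from the OVERT standard. It also assumes the digest of the newest event is held by a party other than the operator, since a chain whose head the operator controls can simply be rebuilt.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

GENESIS = "0" * 64  # placeholder "previous hash" for the first event

@dataclass(frozen=True)
class EnforcementEvent:
    component: str       # which enforcing component acted
    config_sha256: str   # fingerprint of the configuration in force
    decision: str        # permit, deny, override, or escalation
    in_scope: bool       # whether the action fell inside the governed scope
    content_sha256: str  # fingerprint only; the content itself is not recorded
    prev_hash: str       # digest of the preceding event

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

def append(chain: list, **fields) -> None:
    """Link a new event to the digest of the current last event."""
    prev = chain[-1].digest() if chain else GENESIS
    chain.append(EnforcementEvent(prev_hash=prev, **fields))

def verify_chain(chain: list, expected_head: str) -> bool:
    """Check every link, then compare the final digest with the externally held head."""
    prev = GENESIS
    for event in chain:
        if event.prev_hash != prev:
            return False
        prev = event.digest()
    return prev == expected_head

# Hypothetical component name, configuration, and requests.
cfg = hashlib.sha256(b"policy: block-pii=true").hexdigest()
log: list = []
for decision, text in [("permit", b"summarise Q3 report"), ("deny", b"export customer table")]:
    append(log, component="gateway-filter", config_sha256=cfg, decision=decision,
           in_scope=True, content_sha256=hashlib.sha256(text).hexdigest())

head = log[-1].digest()         # assumed to be stored by an independent party
print(verify_chain(log, head))  # True

tampered = [replace(log[0], decision="deny"), log[1]]
print(verify_chain(tampered, head))  # False: the edit breaks the link to the next event
```

An incident reviewer can replay the sequence of decisions and confirm which configuration was active at each step while seeing only hashes of the content involved.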
Runtime evidence answers those. Documentation gestures at them. The difference between the two is the difference between intending a control and proving it ran.
What this changes for governance leads
If you lead an AI governance program, the reframe is small but consequential. The familiar challenges — pace, scope, stakeholder alignment — are worth working, and you should keep working them. But treat the verification gap as the load-bearing one, because it’s the challenge that determines whether everything else holds up under scrutiny.
It also resolves the stakeholder problem from a surprising angle. The reason AI governance stakeholders argue about evidence is that, until now, the only evidence available was contestable — a log someone could dispute, a narrative someone could discount. Independent, tamper-evident, containment-by-construction proof gives every stakeholder the same artifact to point at: legal, security, the business owner, and the eventual auditor can all check the same receipt and get the same answer. Alignment gets easier when the evidence stops being a matter of opinion.
The shift, in one line: stop trying to make your documentation more convincing, and start making your runtime controls produce proof. Intent is necessary. It was never sufficient.
If you want to see what independently checkable proof looks like in practice, verify a sample OVERT receipt — or, when you’re ready to put runtime evidence under your own AI systems, get runtime coverage.