An AI Governance Maturity Model: From Policy to Proof
Most organizations rate their AI governance maturity by how much they have written down. That is the wrong yardstick. An AI governance maturity model that stops at “documented” measures intent, not evidence — and intent is where most programs quietly stall. This guide proposes a different top stage: not policy on a page, but proof at runtime. The reframe matters because governance has always been able to say what ought to be done; it has rarely been able to prove what was. Policies, audit narratives, and self-reported logs record intentions and recollections. They are not evidence.
The gap is widening as AI systems begin to act — calling tools, invoking other agents, and making decisions at the inference boundary. A control you described in a policy and a control that actually executed against live traffic are two very different things. The distance between them is exactly what a proof-first maturity model is built to close.
Why most AI governance maturity self-assessments are inflated
Ask a governance leader to score their program and the number usually comes back high. The reason is structural, not vanity: most AI governance maturity rubrics reward artifacts. A documented model inventory, an approved AI policy, a completed risk assessment, a signed-off architecture diagram — each earns a tick, and the ticks add up to a flattering score.
The problem is that every one of those artifacts is an assertion. It states that something was decided or built. It does not demonstrate that the control fired when a real prompt arrived, that the redaction step ran before data left the environment, or that an override was logged the moment a human took control. A program can be richly documented and operationally blind at the same time.
This is the trap that makes AI assurance maturity look further along than it is. Documentation maturity and proof maturity are different axes. You can max out the first while sitting near zero on the second, and you will not know, because the instruments you used to grade yourself only ever looked at paper.
A maturity model with five stages
A useful AI governance maturity model treats each stage as a strictly higher bar of demonstrability — not a longer binder. Here are five stages, from least to most demonstrable.
Stage 1 — Ad hoc
There is no written policy. AI use is informal and uninventoried. Decisions about acceptable use, data handling, and model selection live in people’s heads. Most organizations have already left this stage, at least on paper.
Stage 2 — Policy
A formal AI policy exists. It names principles, prohibited uses, and accountable owners. This is the floor that frameworks like the NIST AI Risk Management Framework and ISO/IEC 42001 help establish, and it is real progress. But policy describes intent. Nothing here proves that the stated rules touch production behavior.
Stage 3 — Controls defined
Policy is translated into specific, named controls: input filtering, output classification, access scoping, human-in-the-loop checkpoints, escalation paths. Each control has an owner and a description. This is where many mature-looking programs actually sit. The controls are designed — but design is a claim about the future, not a record of the past.
Stage 4 — Controls enforced at runtime
The controls now execute at the inference, tool-call, and agent boundary. A blocked prompt is actually blocked. A redaction step actually runs. An escalation actually pages a human. This is a genuine leap: behavior, not intention. Yet enforcement alone still leans on the operator’s own logs to say what happened — and a system grading its own homework is not independent evidence. An auditor reading those logs is trusting the same party that ran the controls.
Stage 5 — Independently verifiable evidence
The top of the model is not “more enforcement.” It is proof. At this stage, every enforcement event (permit, deny, override, escalation, response) produces tamper-evident telemetry that an outside party can check without trusting the operator’s narrative, and without protected content leaving the operator’s environment. This is responsible AI maturity expressed as evidence rather than as adjective. You no longer say the control held. You can show it held, and someone who does not work for you can confirm it.
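One common way to make an event log tamper-evident is a hash chain, in which each entry commits to the one before it. The sketch below is illustrative only and is not the OVERT format: the field names, the `GENESIS` value, and the helper functions are all assumptions made for this example. It records only a SHA-256 fingerprint of the content, never the content itself.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def fingerprint(content: str) -> str:
    """SHA-256 digest of the content; only this digest is recorded."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, action: str, control: str, content: str) -> None:
    """Append an enforcement event that is chained to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    event = {
        "action": action,
        "control": control,
        "fingerprint": fingerprint(content),
    }
    log.append({"event": event, "hash": entry_hash(prev_hash, event)})


def verify_chain(log: list) -> bool:
    """Recompute every hash from the start; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, "deny", "input-filter", "prompt text that stays in the environment")
append_event(log, "permit", "output-classifier", "model response text")
append_event(log, "override", "human-checkpoint", "reviewer decision note")

print(verify_chain(log))  # True

log[1]["event"]["action"] = "deny"  # simulate an after-the-fact edit
print(verify_chain(log))  # False
```

A chain like this only detects edits if the latest hash is held or countersigned by someone other than the operator. An operator who controls the whole log could otherwise recompute every entry, which is why the independence requirement matters as much as the hashing.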
Most organizations that believe they are at Stage 4 or 5 are, on an honest reading, at Stage 3. They have defined excellent controls and assume design implies execution. The whole point of grading on proof is to make that assumption visible.
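The five stages can be read as a cumulative rubric: a program sits at the highest stage whose bar, and every bar beneath it, is met. Here is a minimal sketch of that reading. The `ProgramEvidence` fields are hypothetical yes/no answers invented for this example, not part of any published rubric.

```python
from dataclasses import dataclass


@dataclass
class ProgramEvidence:
    # Each flag is a hypothetical yes/no answer an assessor would record.
    has_written_policy: bool
    controls_defined: bool
    controls_enforced_at_runtime: bool
    evidence_independently_verifiable: bool


STAGE_NAMES = {
    1: "Ad hoc",
    2: "Policy",
    3: "Controls defined",
    4: "Controls enforced at runtime",
    5: "Independently verifiable evidence",
}


def maturity_stage(e: ProgramEvidence) -> int:
    """Return the highest stage whose bar, and every bar below it, is met."""
    stage = 1
    for met in (
        e.has_written_policy,
        e.controls_defined,
        e.controls_enforced_at_runtime,
        e.evidence_independently_verifiable,
    ):
        if not met:
            break  # a missed bar caps the stage, whatever comes after it
        stage += 1
    return stage


# A program with a policy and well-designed controls, but no runtime enforcement.
program = ProgramEvidence(True, True, False, False)
stage = maturity_stage(program)
print(stage, STAGE_NAMES[stage])  # 3 Controls defined
```

Because the loop stops at the first unmet bar, a program cannot claim Stage 5 on the strength of good documentation alone.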
What the top stage actually requires
The jump from Stage 4 to Stage 5 is the hard one, and it is where an open standard earns its place. OVERT is GLACIS’s open standard (royalty-free, under a patent covenant) for independent, tamper-evident proof that runtime controls executed, produced with no protected content leaving the operator’s environment. It does not replace your policy, your risk framework, or your governance platform. It sits underneath them and turns their assertions into something checkable.
OVERT is organized around four commitments that map almost exactly onto the failure modes of lower maturity stages:
- Evidence, not assertion. The artifact demonstrates what happened; it does not narrate it.
- Containment by construction. Only fingerprints and signatures cross the boundary — the content itself stays home. This is what lets a regulated deployment prove enforcement without exporting sensitive data.
- Independence by structure. The attester is separate from the governed system, so the proof is not the operator vouching for the operator.
- Measurement, not adjective. Coverage is expressed as intervals and sample sizes an auditor can reproduce — not as “robust” or “comprehensive.”
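To make the last commitment concrete: if an auditor samples n requests and finds the control executed on k of them, coverage can be reported as a confidence interval alongside the sample size. The sketch below uses the Wilson score interval, which is a standard statistical choice made for this example and not something the source says OVERT prescribes. The function name and the sample figures are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical audit sample: the control executed on 990 of 1,000 sampled requests.
low, high = wilson_interval(990, 1000)
print(f"coverage 990/1000, 95% interval [{low:.4f}, {high:.4f}]")
```

Reporting the interval together with n lets a reviewer rerun the arithmetic and see how much of the claim rests on sample size. The same observed rate on 100 samples yields a visibly wider interval.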
In practice this means a Stage 5 program can produce signed OVERT receipts: tamper-evident records that show which controls ran, account for what was in and out of scope, and let an independent verifier confirm enforcement events after the fact — including reconstructing an incident without routinely disclosing the underlying content.
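As a rough illustration of what a signed, content-free receipt could carry, the sketch below signs a small record of which controls ran and how many requests were in and out of scope. It is not the OVERT receipt schema: every field name is an assumption. HMAC-SHA256 from the Python standard library stands in for the signature. A genuinely independent attester would more plausibly use an asymmetric signature, so that the verifier needs only a public key.

```python
import hashlib
import hmac
import json


def canonical(body: dict) -> bytes:
    """Deterministic JSON encoding so signer and verifier hash identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_receipt(key: bytes, controls_run: list, in_scope: int,
                  out_of_scope: int, content: str) -> dict:
    """Build a receipt that carries a fingerprint of the content, not the content."""
    body = {
        "controls_run": sorted(controls_run),
        "in_scope": in_scope,
        "out_of_scope": out_of_scope,
        "content_fingerprint": hashlib.sha256(content.encode("utf-8")).hexdigest(),
    }
    signature = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_receipt(key: bytes, receipt: dict) -> bool:
    """Recompute the signature over the body and compare in constant time."""
    expected = hmac.new(key, canonical(receipt["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


key = b"demo-key-for-illustration-only"  # hypothetical; never hard-code real keys
receipt = issue_receipt(
    key,
    controls_run=["redaction", "input-filter"],
    in_scope=980,
    out_of_scope=20,
    content="sensitive prompt that stays inside the environment",
)

print(verify_receipt(key, receipt))  # True

receipt["body"]["out_of_scope"] = 0  # hide the out-of-scope traffic
print(verify_receipt(key, receipt))  # False
```

The second check fails because the signature covers the scope counts as well as the control list, so quietly dropping the out-of-scope figure is detectable.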
The most recent version, OVERT 1.1.0, was released on 11 June 2026 as a backward-compatible, additive update to 1.0. It adds a normative Annex G covering local evidence retrieval and retention integrity, a transport binding for cross-boundary attestation, an automated auditor-discovery endpoint protocol, and a reference schema for the control-action artifact. The conformance requirements that define the standard’s core stayed unchanged from 1.0, so a program built against the standard does not have to relitigate its foundation to adopt the new capabilities.
How to use this model
Score yourself twice. First grade your documentation maturity — policies, defined controls, completed assessments. Then grade your proof maturity — what you could hand an independent auditor that does not rely on your own word. The gap between the two scores is your real backlog.
Then work the gap one control at a time. Pick the controls that matter most if they silently fail — the redaction step before data egress, the human checkpoint on a high-risk action — and move them from “defined” to “enforced” to “independently provable.” Maturity is not the length of your policy library. It is how little an outsider has to take on faith.
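The two-score exercise can be kept per control, so the backlog falls out of the arithmetic. Here is a minimal sketch, assuming a hypothetical 0 to 5 score on each axis and a 1 to 3 impact weight for how much a silent failure would hurt. None of these scales come from a published rubric, and the sample scores are invented.

```python
from dataclasses import dataclass


@dataclass
class ControlScore:
    name: str
    documented: int  # documentation maturity, hypothetical 0-5 scale
    proven: int      # proof maturity, hypothetical 0-5 scale
    impact: int      # cost of a silent failure, hypothetical 1-3 weight


def backlog(scores):
    """Rank controls by impact-weighted gap between documented and proven."""
    gaps = [(s.name, s.impact * max(0, s.documented - s.proven)) for s in scores]
    return sorted((g for g in gaps if g[1] > 0), key=lambda g: g[1], reverse=True)


scores = [
    ControlScore("redaction before egress", documented=5, proven=1, impact=3),
    ControlScore("human checkpoint on high-risk action", documented=4, proven=2, impact=3),
    ControlScore("model inventory", documented=5, proven=4, impact=1),
    ControlScore("output classification", documented=3, proven=3, impact=2),
]

for name, priority in backlog(scores):
    print(f"{priority:>2}  {name}")
# 12  redaction before egress
#  6  human checkpoint on high-risk action
#  1  model inventory
```

Controls with no gap drop off the list, and the weighting keeps a well-documented but unproven high-impact control at the top.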
If you want to see what Stage 5 looks like in the concrete, you can verify a sample OVERT receipt or get runtime coverage for a system you already run — proof you can hand to a reviewer, not another dashboard to read. The standard itself is open and published at overt.is.