Why AI governance assessments need runtime evidence

An AI governance assessment without runtime evidence is just an opinion delivered with confidence. Most produce a maturity score or a tidy checklist of controls—and stop there. They don’t prove that any of those controls actually run when the model takes action.

The assessment market is stuck on policy

Walk through ten AI governance assessments today and you’ll see the same shape. A vendor or consultant interviews the team, reviews the policy binder, maps controls to NIST AI RMF or ISO/IEC 42001, and hands back a maturity rating. Sometimes there’s a heat map. Sometimes a remediation plan. Almost never is there evidence that the controls in the binder actually fired the last time the AI did something consequential.

That worked when AI sat behind a chat window and a human approved every action. It doesn’t hold up when the AI is autonomously routing tickets, scheduling appointments, drafting clinical notes, or executing trades. Once the model acts, “we have a policy that says the model shouldn’t do X” isn’t the same kind of statement as “here is the signed receipt showing the X-blocking control ran on this transaction at this timestamp.”

The next question reviewers are asking

We sit in a lot of enterprise security reviews. The question that has changed in the last twelve months isn’t “do you have an AI policy?” or “are you ISO/IEC 42001 certified?” Those are table stakes now. The new question, asked in plain language by reviewers, regulators, auditors, and increasingly boards, is some version of:

“Show me the evidence the controls fired.”

Not the policy. Not the architecture diagram. The evidence (per transaction, per agent action, per high-risk decision) that the prompt was inspected, the output was checked, the PII was redacted, the human reviewed the recommendation, and the model didn’t exfiltrate data. A maturity score doesn’t answer that question. A signed runtime receipt does.
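To make "signed runtime receipt" less abstract, here is a minimal sketch of what one could look like. It is an illustration, not a description of any particular product: the field names, the `make_receipt` and `verify_receipt` helpers, the sample values, and the key are all hypothetical. It uses HMAC-SHA256 only because that runs on the Python standard library; a real deployment would more likely use an asymmetric signature so a reviewer can verify receipts without holding the signing key.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only; in practice it would live in the
# customer's own key management system.
SIGNING_KEY = b"example-key-held-by-the-customer"


def _canonical(body):
    """Serialize the receipt body deterministically so signatures are stable."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(action_id, timestamp, prompt, output, controls, key=SIGNING_KEY):
    """Record which controls ran on one AI action and sign the record."""
    body = {
        "action_id": action_id,
        "timestamp": timestamp,
        # Store hashes rather than raw text so the receipt does not copy content.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "controls": controls,
    }
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_receipt(receipt, key=SIGNING_KEY):
    """Return True only if the body is unchanged since it was signed."""
    expected = hmac.new(key, _canonical(receipt["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


# Sample data, invented for the example.
receipt = make_receipt(
    action_id="txn-001",
    timestamp="2025-01-15T10:32:00Z",
    prompt="Draft a visit summary for the patient.",
    output="Summary: [REDACTED] was seen for a follow-up.",
    controls=[
        {"name": "prompt_inspection", "ran": True, "result": "pass"},
        {"name": "pii_redaction", "ran": True, "result": "redacted"},
        {"name": "output_check", "ran": True, "result": "pass"},
    ],
)
print(verify_receipt(receipt))
```

The point of the signature is that the record of which controls ran cannot be edited after the fact without detection: changing any field in `body` makes `verify_receipt` return `False`.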

What an evidence-grade assessment produces

The next generation of AI governance assessment has to produce two artifacts, not one:

The first artifact is what most assessments produce today. The second is what closes deals, satisfies regulators, and survives the first incident. Without it, the hardening plan is a promise. With it, the hardening plan is a contract the system enforces on itself every time it acts.

A worked example: the Sprint we run

We built the Glacis Agent Runtime Security & Evidence Sprint to make this concrete on a single workflow rather than across an entire organization. It’s a paid, fixed-scope engagement: one named AI workflow, ten business days. The output is three artifacts the customer can hand to their next enterprise reviewer:

The Sprint runs inside the customer’s infrastructure. There’s zero sensitive-data egress—the local runtime controls and the evidence packs they emit stay where the data lives. We don’t centralize prompts, outputs, or logs. The artifact the customer hands to a reviewer is one they own, signed by their own infrastructure, not by us.

If your next assessment doesn’t end in a receipt

The test is simple. After the assessment is done, can you answer the “show me the evidence the controls fired” question for one specific AI action that happened last week? If the answer is “we have a policy that says we would have caught that,” the assessment hasn’t finished its job—it’s described the world as you’d like it to be, not as your runtime actually behaves.
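That test can be sketched as a lookup: given one action ID from last week, is there a receipt, is its signature intact, and did every required control run? The sketch below assumes receipts are dictionaries with a `body` and an HMAC-SHA256 `signature`; the `evidence_for` function, the field names, the control names, the key, and the sample data are all hypothetical.

```python
import hashlib
import hmac
import json

KEY = b"example-key-held-by-the-customer"  # hypothetical, for illustration only


def sign(body, key=KEY):
    """HMAC-SHA256 over a deterministic JSON encoding of the receipt body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def evidence_for(action_id, receipts, required_controls, key=KEY):
    """Answer 'show me the evidence the controls fired' for one action."""
    for receipt in receipts:
        body = receipt["body"]
        if body["action_id"] != action_id:
            continue
        if not hmac.compare_digest(sign(body, key), receipt["signature"]):
            return "receipt found, signature invalid"
        ran = {c["name"] for c in body["controls"] if c["ran"]}
        missing = sorted(set(required_controls) - ran)
        if missing:
            return "receipt found, controls missing: " + ", ".join(missing)
        return "evidence ok"
    # No receipt at all: the only thing left to point at is the policy.
    return "no receipt: policy only"


# Sample data, invented for the example.
body = {
    "action_id": "txn-001",
    "timestamp": "2025-01-15T10:32:00Z",
    "controls": [
        {"name": "pii_redaction", "ran": True},
        {"name": "output_check", "ran": True},
    ],
}
store = [{"body": body, "signature": sign(body)}]

print(evidence_for("txn-001", store, ["pii_redaction", "output_check"]))
print(evidence_for("txn-002", store, ["pii_redaction", "output_check"]))
```

The last return value is the case the paragraph above describes: an action happened, and the only available answer is that a policy says it would have been caught.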

Maturity scores are useful as a starting point. Hardening plans are useful as a roadmap. Neither is sufficient on its own anymore. The bar that matters is whether the assessment leaves you with a credible path to runtime evidence on the workflows where AI is making decisions that affect people, money, or care.

For healthcare AI vendors: HIPAA’s technical safeguards include audit controls under 45 C.F.R. § 164.312(b). The standard doesn’t describe a maturity rating—it describes records of activity. A governance assessment that maps to HIPAA without producing per-action evidence is mapping to the wrong half of the rule.

Turn your governance plan into runtime evidence

The Agent Runtime Security & Evidence Sprint takes one high-risk AI workflow, hardens the runtime locally, and produces signed evidence receipts for enterprise review—in 10 business days.
