Verifiable AI: When One Jailbreak Pulls a Model

Joe Braidwood, Co-founder & CEO

June 2026 · 8 min read

On 11–13 June 2026, a single security finding moved a frontier model off the market. According to reporting from Bloomberg, CNN, CNBC, Al Jazeera, and Fortune, the US Commerce Department ordered Anthropic to suspend access to its most capable models, Fable 5 and Mythos 5, for all foreign nationals inside or outside the United States, including foreign-national employees, citing national-security concerns. The reported trigger was a newly discovered jailbreak of Fable 5, treated as a cybersecurity vulnerability. Anthropic complied but publicly disputed the directive, warning that applying the same standard across the industry would, in its words, “essentially halt all new frontier model deployments.”

Read past the headline and a harder question surfaces, one that lands on every CTO, security reviewer, regulator, and insurer at once. When a single jailbreak can pull a model, capability and security are no longer separable. So can you prove, independently and after the fact, what your AI actually did and which of your controls actually held? That is the question verifiable AI answers, and the one OVERT 1.1, released the same week, was built to address.

This is not a victory lap. GLACIS is not adversarial to model makers; we are the proof layer underneath everyone — operators, buyers, regulators, and the labs themselves. When the ground shifts this fast, the useful response is shared infrastructure, not commentary.

What the Fable 5 moment actually proves

The detail that matters is the mechanism, not the politics. The reported cause was a jailbreak of Fable 5 — a cybersecurity vulnerability that bent the model outside its guardrails — and that finding alone was enough to trigger a market-access suspension. Capability and security have collapsed into a single axis in the national conversation. A model is now only as deployable as its worst demonstrated failure mode.

For anyone running AI in production, the lesson generalises past one lab. Your guardrails will be tested: by red teams, by adversaries, by an auditor a year from now reconstructing an incident. The attack at issue here, a jailbreak, belongs to the same family as prompt injection, and both are exactly what production controls exist to catch. The question is no longer whether your controls are good in principle. It is whether you can show, to an outside party, that a specific control was active at a specific moment, and what it did when the boundary was pushed.

Most organisations cannot show that today. They can produce policies, audit narratives, and self-reported logs. As the OVERT standard puts it: governance has always been able to say what ought to be done; it has rarely been able to prove what was. Policies and recollections record intentions. They are not evidence. The Fable 5 week made the cost of that gap concrete — and public.

Documentation describes intent. Evidence proves execution.

There is a comfortable instinct, when the regulatory temperature rises, to reach for more documentation: another policy, a longer questionnaire, a fuller dashboard. None of it is wrong. All of it describes what a system was meant to do. None of it proves what the system did at runtime, under load, the moment a jailbreak slipped through.

This is the reframe at the centre of verifiable AI. A control narrative is an assertion. A signed record that a control executed is evidence. The difference is not rhetorical — it is the difference between asking an auditor, a regulator, or an insurer to trust a claim and handing them something they can check for themselves.

OVERT closes that verification-and-security gap directly. A runtime control produces, as a by-product of doing its work, a signed record an outside party can verify — without the protected data ever leaving the operator’s environment. It is not a better log. It is evidence: tamper-evident, independently checkable, and silent about everything it need not disclose.

That last property matters more in the wake of this week, not less. The instinct to prove more must not become a new way to leak more. Verifiable AI has to demonstrate that controls held without turning every audit into a fresh disclosure of sensitive content.
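To make the containment idea concrete, here is a minimal Python sketch of a receipt that carries only a SHA-256 fingerprint of the protected content plus a signature over the record. It is not the OVERT wire format: the field names are hypothetical, and HMAC-SHA256 stands in for a public-key signature only so the example runs on the standard library. With a shared secret the verifier is not structurally independent, so a real deployment would sign with an asymmetric key and give the verifier only the public half.

```python
import hashlib
import hmac
import json

# ASSUMPTION: HMAC-SHA256 stands in for a public-key signature (e.g. Ed25519)
# purely so this sketch runs on the standard library.
SIGNING_KEY = b"demo-key-not-for-production"


def _canonical(body: dict) -> bytes:
    # Sorted keys and compact separators give a deterministic byte string.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def make_receipt(control_id: str, action: str, content: bytes) -> dict:
    """Build a receipt holding a fingerprint of the content, never the content."""
    body = {
        "control_id": control_id,  # hypothetical field name
        "action": action,          # e.g. "permit" or "deny"
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }
    body["signature"] = hmac.new(SIGNING_KEY, _canonical(body), hashlib.sha256).hexdigest()
    return body


def verify_receipt(receipt: dict) -> bool:
    """Recompute the signature over every field except the signature itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(SIGNING_KEY, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("signature", ""))
```

A verifier who is later shown the original content can recompute its hash and compare it with `content_sha256`; one who is never shown it can still confirm that the record itself was not altered.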

Four commitments behind verifiable AI

OVERT is GLACIS’s open standard for AI attestation, published under a royalty-free patent covenant at overt.is. Version 1.1.0 was released on 11 June 2026 as an additive, backward-compatible minor release: an implementation conformant to 1.0 stays conformant to 1.1 unmodified. The standard rests on four commitments worth stating plainly, because each one answers a failure mode the Fable 5 moment exposed.

Evidence, not assertion. A governed action yields a receipt a third party can check — not a claim it is asked to trust.
Containment by construction. Only cryptographic fingerprints and signatures cross the boundary. The content stays home.
Independence by structure. Whoever attests is separate from whoever is governed. Self-attestation is not independent attestation.
Measurement, not adjective. Safety is stated in intervals and sample sizes an auditor can reproduce — not in words.

The third commitment is the one this week underlined hardest. When a model can be pulled on the strength of an external finding, a vendor’s own assurances about its own controls carry less weight, not more. Independence by structure is what makes a receipt mean something to a regulator who was not in the room.
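The fourth commitment, measurement rather than adjective, is the easiest to make concrete. The sketch below turns a raw count, such as a hypothetical 480 blocked attempts out of 500 red-team jailbreaks, into a Wilson score interval that an auditor can recompute from the same two numbers. The choice of the Wilson interval and the 95% level are assumptions made for illustration; the standard may prescribe a different method.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half
```

With those illustrative counts, `wilson_interval(480, 500)` returns roughly (0.939, 0.974), a range anyone holding the same counts can reproduce exactly. Stating n matters: 48 blocked out of 50 has the same point estimate but a much wider interval.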

The five properties mature security operations need

A standard earns its keep by mapping to what security teams already know they are missing. OVERT specifies five properties that mature operations need and that intent-based governance cannot supply:

Trusted execution evidence — which enforcing component, in which configuration, was active when a governed action occurred.
Reliable runtime coverage accounting — what was in scope, what was excluded, and how the denominators were derived.
Tamper-evident telemetry — records not reducible to operator-controlled logs.
Independent verification of enforcement events — permit, deny, override, escalation, response.
Post-incident reconstruction without routine content disclosure — verify the event history without turning attestation into a new protected-data egress channel.

Hold the Fable 5 jailbreak against that fifth property. A jailbreak is, in the end, an enforcement event that went the wrong way. The organisations that will weather the next one are the ones that can reconstruct exactly what their controls saw and did — permit, deny, override — without re-exposing the very content the attack tried to extract. That is runtime proof: not a description of the defence, but a verifiable record of it firing.
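The third property, tamper-evident telemetry, can be illustrated with a hash chain: each enforcement event records the hash of the one before it, so quietly editing history breaks the chain. This is a generic Python sketch, not the mechanism OVERT specifies; the record layout is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    data = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode()).hexdigest()


def append(chain: list[dict], event: dict) -> None:
    """Append an enforcement event, linking it to the previous record's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link. An in-place edit of any record, or reordering
    or deleting any record other than the last, makes a link fail."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["event"]):
            return False
        prev = record["hash"]
    return True
```

A chain alone only proves internal consistency: whoever controls the log could rebuild it from the point of an edit, or drop the newest records. Periodically handing the latest hash to an independent party closes that gap, which is where independence by structure comes back in.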

Annex G: making cross-boundary attestation real

OVERT 1.0 named the destination. OVERT 1.1’s new normative Annex G — Supplementary Requirements — builds the road to it. Three pieces of Annex G matter most for anyone trying to operationalise verifiable AI right now.

A transport binding for cross-boundary attestation

Annex G.2 defines an HTTP transport binding for cross-boundary attestation: a specified way for a signed record to travel from the operator’s environment to an independent verifier. This is the unglamorous plumbing that makes the third commitment, independence by structure, actually executable across an organisational boundary rather than merely aspirational. An auditor outside your walls can receive and check enforcement evidence over a defined protocol, while the protected content stays inside them.

For a regulated deployer, this is the difference between an attestation model that works on a whiteboard and one that works between two companies that do not share a database.
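As a rough illustration of what a transport binding pins down, the Python sketch below packages a receipt as a JSON `POST` using only the standard library, and refuses to send any field outside an allow-list, so containment is enforced at the point where data leaves the operator. The URL, field names, and allow-list are hypothetical; Annex G.2 defines the actual binding.

```python
import json
import urllib.request

# HYPOTHETICAL field names; the real binding is defined in Annex G.2.
ALLOWED_FIELDS = {"control_id", "action", "content_sha256", "signature"}


def build_attestation_request(verifier_url: str, receipt: dict) -> urllib.request.Request:
    """Package a receipt (fingerprints and signature only) as an HTTP POST."""
    extra = set(receipt) - ALLOWED_FIELDS
    if extra:
        # Refuse to transmit anything outside the allow-list, e.g. raw content.
        raise ValueError(f"refusing to transmit fields: {sorted(extra)}")
    body = json.dumps(receipt, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        verifier_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a single `urllib.request.urlopen(req)` call against whatever verifier endpoint the operator has been given.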

Automated auditor discovery

Annex G.3 specifies an automated auditor-discovery protocol built on a well-known endpoint: a standard place and method for a system to find and reach the party that will verify its enforcement events. Manual integration between every operator and every auditor does not scale to the pace of agentic deployment. A discovery protocol does. It turns independent verification from a bespoke project into a property of the infrastructure.
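Well-known endpoints are an established web convention: RFC 8615 reserves the `/.well-known/` path prefix for site-wide metadata of exactly this kind. Assuming Annex G.3 works along similar lines, discovery reduces to deriving a fixed URL from the operator’s origin and reading a small JSON document. In the Python sketch below, the suffix `overt-auditor` and the `attestation_endpoint` field are hypothetical placeholders, not names taken from the standard.

```python
import json
from urllib.parse import urlsplit, urlunsplit

# HYPOTHETICAL path; Annex G.3 defines the real well-known location.
WELL_KNOWN_PATH = "/.well-known/overt-auditor"


def discovery_url(origin: str) -> str:
    """Derive the discovery URL from any URL on the operator's origin."""
    parts = urlsplit(origin)
    if parts.scheme != "https":
        raise ValueError("discovery should only be attempted over HTTPS")
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


def parse_discovery(document: str) -> str:
    """Extract the verifier endpoint from a discovery document."""
    doc = json.loads(document)
    endpoint = doc["attestation_endpoint"]  # hypothetical field name
    if not endpoint.startswith("https://"):
        raise ValueError("verifier endpoint must use HTTPS")
    return endpoint
```

Because the location is fixed by convention, neither side needs a bespoke integration to find the other; only the document’s contents vary.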

A schema for the ControlAction artifact

Annex G.4 adds an informative reference schema for the ControlAction artifact — the evidence object already mandated by Section 10 of the standard. The ControlAction is the unit of runtime proof: the structured, signed record that a specific control took a specific action. A shared reference schema means a permit or a deny generated in one environment is legible to a verifier in another. Evidence only travels if both ends agree on its shape.
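Agreeing on shape extends all the way down to bytes: a signature only verifies if signer and verifier serialise the record identically. The Python sketch below models a ControlAction with a hypothetical field set (the G.4 reference schema is authoritative) and a simple canonical encoding with sorted keys and no insignificant whitespace. The action vocabulary comes from the enforcement events listed among the five properties; a production system might adopt a full canonicalisation scheme such as RFC 8785 instead.

```python
import json
from dataclasses import asdict, dataclass

# Enforcement event types named in the article's fourth property.
ACTIONS = {"permit", "deny", "override", "escalation", "response"}
HEX = set("0123456789abcdef")


@dataclass(frozen=True)
class ControlAction:
    """HYPOTHETICAL field set; the Annex G.4 reference schema is authoritative."""
    control_id: str
    action: str
    occurred_at: str      # assumed to be an RFC 3339 timestamp
    content_sha256: str   # fingerprint only; the content never leaves the operator

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if len(self.content_sha256) != 64 or not set(self.content_sha256) <= HEX:
            raise ValueError("content_sha256 must be a lowercase hex SHA-256 digest")

    def canonical_bytes(self) -> bytes:
        """Deterministic encoding so both parties sign and verify identical bytes."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode()

    @classmethod
    def from_json(cls, text: str) -> "ControlAction":
        return cls(**json.loads(text))
```

Round-tripping a record through `canonical_bytes` and `from_json` yields an equal object with identical bytes, which is the property a cross-environment verifier depends on.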

Annex G also adds G.1 — local content-addressed storage for evidence retrieval and retention integrity — so that the records remain checkable over the retention windows regulators and insurers actually care about.
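Content addressing is what makes retention integrity cheap to check: if evidence is filed under the hash of its own bytes, any later corruption or substitution shows up as a mismatch on read. The Python sketch below is a generic local store with a two-character directory fan-out similar to Git’s object database; it illustrates the technique and is not the layout Annex G.1 specifies.

```python
import hashlib
from pathlib import Path


class ContentAddressedStore:
    """Minimal local store: each blob is filed under the SHA-256 of its bytes."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / key[:2] / key

    def put(self, blob: bytes) -> str:
        key = hashlib.sha256(blob).hexdigest()
        path = self._path(key)
        path.parent.mkdir(exist_ok=True)
        if not path.exists():  # identical evidence is stored only once
            path.write_bytes(blob)
        return key

    def get(self, key: str) -> bytes:
        blob = self._path(key).read_bytes()
        # Retention integrity: the address itself doubles as the checksum.
        if hashlib.sha256(blob).hexdigest() != key:
            raise ValueError(f"integrity check failed for {key}")
        return blob
```

Because the key is derived from the content, a verifier holding only the hash from a receipt can ask for the matching record and check it without trusting the store.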

What else 1.1 settled — and what it deliberately left alone

The rest of the changelog is the sound of a standard maturing carefully rather than expanding loudly. Regulatory dates were refreshed for Colorado and the EU. Crosswalks moved to an informative companion document, and the normative Attestation Boundary Declaration was renumbered from 29.4 to 22.10. The scanner and the local classifier are now defined as supporting components. Governance language was calibrated, and a non-normative post-quantum note was added.

Two points deserve a security reviewer’s attention. First, a new versioning and errata policy (Section 22.11): MAJOR.MINOR.PATCH, with control identifiers (handles such as ATT-3.5 or GOV-5.6) held stable throughout a major version rather than renumbered underneath you. For anyone building an AI security standard into a long-lived compliance programme, stable identifiers are not a footnote; they are what lets this year’s evidence still resolve next year. Second, apart from these additions, the conformance levels and existing obligations in Parts 1–22 are unchanged from 1.0. An additive release that does not move the ground under existing implementers is itself a kind of proof that the standard is built to be depended on.
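The compatibility rule and the stable-identifier rule are both mechanical enough to encode. The Python sketch below assumes plain numeric MAJOR.MINOR.PATCH strings (no pre-release tags) and keys evidence by major version plus control identifier, so a record filed against ATT-3.5 under 1.0 is found again under 1.1. The function names are hypothetical, and extending the 1.0-to-1.1 guarantee to every later minor release is an assumption.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def carries_forward(implemented: str, published: str) -> bool:
    """Under an additive MINOR/PATCH policy, an implementation conformant to an
    earlier version remains conformant to a later one with the same MAJOR."""
    impl, pub = Version.parse(implemented), Version.parse(published)
    return impl.major == pub.major and impl <= pub


def evidence_key(spec_version: str, control_id: str) -> str:
    """Index evidence by MAJOR plus control identifier, which the policy holds stable."""
    return f"v{Version.parse(spec_version).major}:{control_id}"
```

Keying on the major version alone is the design choice that matters: a minor release cannot orphan stored evidence, while a major release is the one place a deliberate re-mapping would be expected.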

Why this is shared infrastructure, not a pitch

It would be easy to read the Fable 5 suspension as a story about one lab and one bad week. It is better read as a preview. The forces that pulled a model this week — a single security finding, a regulator with the leverage to act on it, and no neutral way to demonstrate what controls were in place — are not unique to frontier labs. They are converging on every organisation that lets AI take consequential action.

GLACIS exists to make that demonstration possible. The approach is runtime coverage: controls enforce at the inference, tool-call, and agent boundary; signed OVERT receipts prove what ran; and only hashes, signatures, and verification metadata cross the boundary. We are the proof, not another dashboard describing intent. And OVERT is an open standard precisely because proof only works when everyone can check it, which means it cannot belong to any one vendor, including us.

The honest framing of this moment is not “we told you so.” It is “here is the road we have all been needing.” When capability and security become one axis, the deciding question for operators, buyers, regulators, and insurers alike is the same: can you prove what your AI did, and which controls held? Verifiable AI is how you answer it before someone else asks.

The takeaway

A jailbreak pulled a model this week. The next finding — yours, or a vendor’s, or one nobody has named yet — will arrive with the same question attached. Documentation will describe your good intentions. Receipts will prove your controls fired. The gap between those two is exactly the gap OVERT 1.1, and Annex G in particular, was built to close.

If you run AI that acts, the practical next step is to put runtime proof underneath it before you need it. Get runtime coverage — and see what independent, after-the-fact evidence actually looks like.

