AI Safety

Voluntary AI Safety Just Died. Here’s What Replaces It.

Joe Braidwood
Co-founder & CEO
· February 2026 · 8 min read

The signal event: Anthropic did not scrap its Responsible Scaling Policy outright. It rewrote it into a more flexible framework. That matters because even the most prominent voluntary AI safety regime can still be adjusted internally, on the company’s timetable, without any independent enforcement mechanism.

Why Voluntary Safety Keeps Moving

Anthropic’s RSP remains one of the clearest public examples of frontier-model self-governance: published thresholds, escalation language, and periodic updates. But those updates also illustrate the core limitation of voluntary commitments. The framework evolves when the company decides it should evolve.

The issue is not whether Anthropic kept the acronym alive. It is that the safeguards remain self-policed and internally adjustable.

That is the broader pattern worth paying attention to. Voluntary AI commitments are authored by the same organizations whose products, timelines, and commercial incentives they are meant to constrain. They can still be thoughtful and useful. They are just not the same thing as independently verifiable proof.

When a framework can be updated, narrowed, or reinterpreted inside the same organization, buyers and regulators still need a separate way to verify what controls actually ran in production.

The Failure Is Architectural, Not Moral

The instinct is to blame the companies. To say they lacked courage, or conviction, or integrity. But that misses the structural point entirely.

Every voluntary AI safety framework shares the same structural weakness: the organization making the promise is usually the organization interpreting whether it kept the promise. Unless there is independent auditability, third-party review, or verifiable operational evidence, the framework remains largely self-attested.

“You can’t verify a promise. You can only verify evidence. That’s the architectural flaw in every voluntary AI safety framework ever written—they produce promises, not proof.”

This isn’t unique to AI. It’s the same reason financial auditing exists. The same reason pharmaceutical trials require independent oversight. The same reason building inspectors don’t work for the construction company. Any system where the regulated entity is also the regulator will eventually optimize for the entity’s commercial interests. It’s not corruption. It’s architecture.

That is why regulated adopters should treat voluntary frameworks as inputs into diligence, not as the end of diligence.

What Replaces Voluntary Safety: The Evidence Standard

If voluntary commitments are architecturally incapable of surviving commercial pressure, the replacement must be architecturally different. Not “stronger promises” or “better-intentioned companies”—but a fundamentally different mechanism for establishing that AI systems operate within defined safety boundaries.

That mechanism is verifiable evidence.

The distinction matters:

Voluntary Promises

  • Self-reported compliance status
  • Policies describing intended behavior
  • Reputation as the enforcement mechanism
  • Revocable at the company’s discretion

Verifiable Evidence

  • Cryptographic proof that controls executed
  • Third-party witnessed, tamper-evident records
  • Independent auditability without vendor access
  • Continuous, per-inference, non-repudiable

Verifiable evidence means that when an organization claims its AI system ran a safety check, there exists a signed, timestamped, independently verifiable record proving it. Not a policy document. Not a vendor attestation letter. Not a checkbox on a questionnaire. An actual cryptographic receipt that a third party—a regulator, an auditor, a court—can validate without trusting the organization that produced it.
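To make that concrete, here is a minimal sketch of what such a receipt could look like: a signed, timestamped record that anyone holding the public key can verify. It is illustrative only, written in Python against the widely used `cryptography` package; the field names and control identifiers are assumptions, not any vendor’s actual schema.

```python
# Illustrative sketch only: a signed, timestamped "control execution receipt."
# Field names and control IDs are hypothetical, not a real product schema.
# Requires the `cryptography` package (pip install cryptography).
import json
from datetime import datetime, timezone

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def issue_receipt(signing_key: Ed25519PrivateKey, control_id: str,
                  output_hash: str, result: str) -> dict:
    """Produce a signed, timestamped record that a safety control executed."""
    record = {
        "control_id": control_id,    # e.g. "toxicity-filter-v2" (hypothetical)
        "output_hash": output_hash,  # hash of the AI output the control ran on
        "result": result,            # "pass" or "fail"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = signing_key.sign(payload).hex()
    return record


def verify_receipt(public_key: Ed25519PublicKey, record: dict) -> bool:
    """An auditor checks the receipt with only the public key and the record."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["signature"]), payload)
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    receipt = issue_receipt(key, "toxicity-filter-v2", "sha256:ab12...", "pass")
    print(verify_receipt(key.public_key(), receipt))  # True
    receipt["result"] = "fail"                        # tampering with the record...
    print(verify_receipt(key.public_key(), receipt))  # ...is detectable: False
```

The detail that matters in the sketch is the verification path: the auditor needs the public key and the record, and nothing else from the organization that produced it.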

This is the standard that replaces voluntary safety. Not because it’s stricter. Because it’s architecturally sound. It removes the fatal dependency on the good intentions of the entity being governed.

Safe Harbor Is Earned Through Proof, Not Pledges

Regulators have already begun encoding this shift into law. Colorado’s AI Act, the first comprehensive AI law enacted by a U.S. state, creates an explicit safe harbor for organizations that can demonstrate adherence to the NIST AI Risk Management Framework. Not claim adherence. Not document adherence. Demonstrate it, with evidence.

The EU AI Act’s Article 12 requires automatic recording of events for high-risk AI systems to support traceability and post-market monitoring. ISO/IEC 42001 is a management-system standard, which means certification turns on evidence that governance processes are actually operating, not on marketing claims. NIST’s AI RMF likewise centers on governing, mapping, measuring, and managing risk over time.

The regulatory direction is clear enough: self-attested promises are weaker than records that can be reviewed later.

The timing matters: Colorado’s AI Act takes effect June 30, 2026. Under the EU AI Act, many Annex III high-risk obligations begin August 2, 2026, while many AI systems tied to MDR/IVDR-regulated products follow on August 2, 2027. The closer these dates get, the less persuasive unsupported safety promises become.

The Verifiable Era Has Already Begun

Revisions to company-authored safety frameworks do not mean AI safety work is pointless. They do show why governance that depends entirely on vendor promises is unstable.

What replaces it is already taking shape. It isn’t built on trust. It’s built on evidence. Cryptographic evidence that safety controls ran. Third-party witnesses that make tampering detectable. Per-inference records that connect specific AI outputs to specific control executions. Framework mappings that translate operational evidence into regulatory compliance across ISO 42001, NIST AI RMF, EU AI Act, and Colorado’s safe harbor provisions.
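The tamper-evidence piece is simple to illustrate. In a hash-chained log, every per-inference record commits to the digest of the record before it, so a retroactive edit or deletion breaks the chain. The sketch below is a simplified illustration with hypothetical field names; a production system would also sign each entry and anchor the chain with an external witness, as in the receipt example above.

```python
# Illustrative sketch only: a hash-chained, per-inference evidence log.
# Field names are assumptions; real systems would also sign entries and
# externally witness the chain head so full rewrites are detectable too.
import hashlib
import json


def entry_digest(entry: dict) -> str:
    """Deterministic SHA-256 digest of a record's contents."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append_record(log: list[dict], inference_id: str, control_id: str,
                  result: str) -> None:
    """Each record commits to the previous one, making the log tamper-evident."""
    prev = log[-1]["digest"] if log else "genesis"
    entry = {
        "inference_id": inference_id,
        "control_id": control_id,
        "result": result,
        "prev_digest": prev,
    }
    entry["digest"] = entry_digest(entry)
    log.append(entry)


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited or deleted record breaks the chain."""
    prev = "genesis"
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "digest"}
        if entry["prev_digest"] != prev or entry_digest(body) != entry["digest"]:
            return False
        prev = entry["digest"]
    return True


if __name__ == "__main__":
    log: list[dict] = []
    append_record(log, "inf-001", "guardrail-check", "pass")
    append_record(log, "inf-002", "guardrail-check", "pass")
    print(verify_chain(log))   # True
    log[0]["result"] = "fail"  # a retroactive edit...
    print(verify_chain(log))   # ...breaks the chain: False
```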

The organizations deploying AI in healthcare, financial services, and other high-stakes environments don’t need their model providers to be virtuous. They need their own governance infrastructure to be verifiable. They need evidence that their guardrails executed—evidence that survives regardless of what their upstream providers decide to do with their voluntary commitments next quarter.

The voluntary era produced useful ideas and important vocabulary. But it also produced a governance style that often stops at intention rather than independently reviewable execution.

The stronger posture is to pair voluntary commitments with records, reviews, and operational evidence that survive policy revisions and marketing cycles.


Find Out Where You Stand

Take a free 5-minute AI governance assessment. See how your organization’s AI controls measure against ISO 42001, NIST AI RMF, the EU AI Act, and Colorado’s safe harbor requirements—and where verifiable evidence could close your gaps.

Start Your Free Assessment