GLACIS
Platform
Agentic AI Security Harden one high-risk agent workflow with local controls Regulated Clinical AI Signed runtime evidence for clinical AI review Ambient Clinical Scribes Prove PHI controls ran at the model egress boundary Hiring & Recruitment AI Screening decisions with receipts for bias-audit regimes Healthcare AI Vendor Review Require runtime evidence from the vendors you review Runtime Assurance Loop See, control, prove, and improve AI behavior in production
Evidence Packs Regulator, customer, auditor, and internal review artifacts Sample Evidence Pack A signed runtime receipt and assembled pack OVERT Standard Portable receipt format for runtime assurance Verify a Receipt Check a signed receipt yourself, in your browser
EU AI Act High-risk obligations and the Article 12 logging duty Colorado AI Act SB 26-189 transparency duties, compliance from 2027 Texas TRAIGA HB 149, in force since January 2026 New York & NYC LL144 Bias audits for automated hiring decisions
Resources Company Get runtime coverage
GLACIS

Navigate

Home PlatformLocal controls, signed receipts, and operational insight Resources Company

Solutions

Agentic AI SecurityHarden one high-risk agent workflow with local controls Regulated Clinical AISigned runtime evidence for clinical AI review Ambient Clinical ScribesProve PHI controls ran at the model egress boundary Hiring & Recruitment AIScreening decisions with receipts for bias-audit regimes Healthcare AI Vendor ReviewRequire runtime evidence from the vendors you review Runtime Assurance LoopSee, control, prove, and improve AI behavior in production

Evidence

Evidence PacksArtifacts assembled from signed runtime receipts Sample Evidence PackSee runtime proof become an evidence pack OVERT StandardWhy receipt proof can travel Verify a ReceiptCheck a signed receipt yourself, in your browser

Regulations

EU AI ActHigh-risk obligations and the Article 12 logging duty Colorado AI ActSB 26-189 transparency duties, compliance from 2027 Texas TRAIGAHB 149, in force since January 2026 New York & NYC LL144Bias audits for automated hiring decisions
Get runtime coverage

Runtime proof · OVERT

After a Prompt Injection Attack, Prove What Held

A single prompt injection attack can move markets. Prevention is never perfect — so hold tamper-evident proof of which guardrails fired.

Joe Braidwood
Joe BraidwoodCo-founder & CEO
June 2026 · 5 min read

A single prompt injection attack just demonstrated how much is at stake. According to reporting from Bloomberg and CNBC, the US Commerce Department ordered Anthropic on 11–13 June 2026 to suspend access to its most capable models — Fable 5 and Mythos 5 — for all foreign nationals, citing national-security concerns. The reported trigger was a discovered method of jailbreaking Fable 5: a cybersecurity vulnerability. One injection finding, and a frontier model came off the market for a whole class of users.

For security reviewers and AppSec teams defending LLM applications, the lesson is uncomfortable but clarifying. No system that takes instructions in natural language can be guaranteed injection-proof. The defensible move — and what increasingly decides whether a deployment survives scrutiny — is to hold independent, tamper-evident proof of which controls fired and which boundaries held, verifiable after the fact without exposing the protected content behind them.

What a prompt injection attack actually is

A prompt injection attack manipulates a model into ignoring its operating instructions by smuggling adversarial text into its input — directly in a user message, or indirectly through a document, a webpage, a tool result, or an email the model later reads. The model has no reliable way to separate trusted instructions from untrusted data, because to a language model both are just tokens.

The consequences scale with what the model can reach. A chatbot tricked into rude output is an embarrassment. An agent with tool access — able to query a database, call an API, move money, or touch a patient record — tricked into exfiltrating data or taking an unauthorised action is an incident. This is why ai agent security is the sharp edge of the problem: the blast radius is the union of every tool the agent can invoke. A jailbreak that bypasses a model's safety training and an indirect injection that hijacks an agent's task are different mechanisms with the same downstream question — what did the system do next, and what stopped it.

Why prevention alone is not a defensible position

The honest engineering reality is that no input filter catches every injection. Adversarial phrasing evolves; encodings shift; the attack surface includes content you don't author. Teams layer defences — input sanitisation, output checks, allowlists on tool calls, human-in-the-loop on high-risk actions — and each layer reduces risk without eliminating it. Mature agentic ai security treats prevention as one tier, not the whole strategy.

So when an incident lands, prevention is not the question a reviewer, a regulator, or an insurer asks first. They ask: what happened, and can you show us? And here most teams discover their evidence is thin. Application logs are operator-controlled — the same party under scrutiny wrote them, and could in principle have edited them. A screenshot of a dashboard is an assertion, not proof. Self-reported logs record what a system believed it did; they are recollections, not evidence. As the OVERT standard puts it: governance has always been able to say what ought to be done, and has rarely been able to prove what was.

That gap — between we have controls and here is independently checkable proof a specific control executed at the moment it mattered — is the verification gap. The Fable 5 episode is what it looks like when that gap meets a regulator with the power to act on a single finding.

Runtime evidence: proof that the guardrail fired

The alternative to trusting a log is producing a receipt. A runtime control — the component that inspects a prompt, screens a tool call, denies an action, or escalates to a human — can emit, as a by-product of doing its work, a signed record that an outside party can verify. Not a richer log: evidence. Tamper-evident, independently checkable, and silent about everything it need not disclose.

This is the motion GLACIS calls runtime coverage. Controls enforce at the inference, tool-call, and agent boundary. Each enforcement event — permit, deny, override, escalation, response — produces a signed receipt. Crucially, only cryptographic fingerprints, signatures, and verification metadata cross the trust boundary. The prompt, the document, the patient record, the customer data — the protected content stays inside your environment. The result is proof the guardrail held without turning the evidence trail into a new data-egress channel.

Concretely, mature security operations need five things from this kind of runtime evidence, and the open OVERT standard is built to provide them:

  • Trusted execution evidence — which enforcing component, in which configuration, was active when a governed action occurred.
  • Reliable coverage accounting — what was in scope, what was excluded, and how the denominators were derived, so "we screen tool calls" comes with a measured rate rather than an adjective.
  • Tamper-evident telemetry — records not reducible to operator-controlled logs.
  • Independent verification of enforcement events — permits, denials, overrides, and escalations an outside party can check.
  • Post-incident reconstruction without routine content disclosure — replay the event history of an injection attempt without re-exposing the sensitive data involved.

After a prompt injection attack, that last property is the difference between a defensible afternoon and a bad one. The record can show, cryptographically, that the malicious instruction hit the tool-call boundary and was denied — or, if a control failed, exactly which one and when — without handing investigators the very data the attacker was after.

Independence is what makes the proof worth anything

Self-attestation is not independent attestation. A receipt the governed party could have written, and could have altered, proves little under adversarial scrutiny. OVERT's design separates the two roles by structure: whoever attests is distinct from whoever is governed, and the verifier checks signatures rather than taking the operator's word. That structural independence is what makes OVERT an ai security standard rather than another self-report format.

The 1.1.0 release of OVERT, published 11 June 2026, hardens exactly the machinery that makes this work across an organisational boundary. It is an additive, backward-compatible minor release — an implementation conformant to 1.0 stays conformant to 1.1 unmodified — and its new normative Annex G adds the cross-boundary plumbing: a local content-addressed storage model for evidence retrieval and retention integrity, an HTTP transport binding so attestation can travel between an operator and an external verifier, an automated auditor-discovery protocol via a well-known endpoint, and a reference schema for the ControlAction artifact that records each enforcement event. The scanner and a local classifier are defined as supporting components. In plain terms: it makes "prove the guardrail held" a thing an outside auditor can request and verify over the wire, on demand, without your content ever leaving home.

What to do before the next finding

The Fable 5 jailbreak is a reminder that the question is no longer hypothetical and no longer slow. For teams defending LLM applications, the practical posture is to assume some injection attempt will land, and to make sure that when one does, the answer is proof rather than a hastily assembled narrative.

That means enforcing at the boundaries where agents can act, and emitting independent, tamper-evident receipts for every permit, deny, and escalation — so coverage is a measured number, incidents are reconstructable, and the proof survives a hostile read. Documentation describes intent. Receipts prove execution. When a single finding can move a market, the operators who can answer prove it are the ones who keep their deployments. The same posture is what mature ai security solutions are converging on: verifiable enforcement, not asserted compliance.

If you want to see what a verifiable enforcement record looks like, you can verify a receipt yourself, or get runtime coverage for the boundaries your agents act on. The open standard lives at overt.is and /standard.

GLACIS logo GLACIS

Runtime assurance infrastructure for AI systems that act. Local controls, signed receipts, and evidence packs without sensitive-data egress.

Solutions

  • Agentic AI security
  • Regulated clinical AI
  • Ambient clinical scribes
  • Hiring & recruitment AI
  • Healthcare AI vendor review
  • Runtime assurance loop
  • Get runtime coverage

Regulations

  • EU AI Act
  • Colorado AI Act
  • Texas TRAIGA
  • New York & NYC LL144
  • State AI laws
  • Vendor evidence checklist

Security

  • AI runtime security
  • AI penetration testing
  • Agentic AI security
  • OWASP LLM Top 10
  • Prompt injection
  • Agent runtime assessment

Evidence

  • Evidence packs
  • Sample evidence pack
  • OVERT standard
  • Verify a receipt
  • Resources
  • Trust Center

Company

  • About
  • What we believe
  • Blog
  • White papers
  • Careers
  • Contact

Developers

  • Documentation
  • Python SDK
  • PyPI
  • Quickstart
  • OVERT standard
  • Security

© 2026 Glacis Technologies, Inc.

Terms Privacy Cookies Do Not Sell or Share Trust Center · SOC 2 Type II

We use cookies for analytics and marketing. Details