We Couldn’t Ship Our Own AI. So We Built the Infrastructure to Fix That.

Update, June 2026: OVERT 1.1 is now the current version. See overt.is for the latest specification.

Today we launched two things: auto-redteam, an open-source adversarial evaluation tool for AI systems, and OVERT 1.0, an open standard for cryptographic AI runtime trust. This is the story behind why they exist.

The app i couldn’t ship

Last year I built a therapy app called Yara. LLM-powered, multi-turn conversations, designed to support people between sessions with their human therapist. It worked beautifully in demos. Empathetic, thoughtful, appropriately challenging. Five turns, ten turns — great.

Then I ran it for fifty turns. And something happened that I didn’t expect.

Over extended conversations — twenty, thirty, fifty turns — the model became agreeable. A yes-machine. It stopped challenging distorted thinking, stopped pushing back on catastrophising. The exact thing a therapist should never do. It wasn’t hallucinating. It wasn’t going off-topic. It was doing something subtler and more dangerous: it was thinning. Losing its character, its clinical backbone, its willingness to hold a difficult space.

I couldn’t prove it was safe on turn 47. So I couldn’t ship it.

That experience broke something open for me. The gap between “works in a demo” and “safe in production” is enormous. And nobody had good tooling for it. Not because nobody cared, but because the problem is genuinely hard — and the tools that do exist are expensive, proprietary, or both.

Three things every high-stakes AI system needs

I’ve spent the past year talking to teams building AI for healthcare, financial services, research labs, and government. The conversations are remarkably consistent. Everyone is navigating the same fundamental question: How do I get from staging to production without putting someone at risk?

The answer, as far as we can tell, has three parts.

1. Find out what breaks

Not “does it pass a benchmark” but “what happens when someone actively tries to make it fail.” Adversarial evaluation. Red-teaming. The kind of stress testing that reveals whether your system prompt is a security boundary or a polite suggestion.

We ran auto-redteam against a well-crafted biomedical research assistant — good system prompt, seven clear rules, Gemini 3 Flash underneath. 489 probes. 64 bypasses. A ~13% attack success rate. Prompt injection, multilingual attacks, system prompt leakage, authority manipulation. This isn’t a criticism of the model. This is the reality of deploying any LLM in a high-stakes context. System prompts are a necessary starting point, but they aren’t a security boundary.

We open-sourced auto-redteam because we couldn’t find this tool when we needed it. It runs locally — zero sensitive-data egress, your data never leaves your machine. Point it at any endpoint, any agent, any chat interface. Apache 2.0 licence. Five minutes to first results.

2. Watch it continuously

Red-teaming gives you a snapshot. But AI systems are non-deterministic. They drift. Their attack surface changes with every provider update, every prompt tweak, every new tool integration. A point-in-time audit tells you what was true last Thursday, not what’s true right now.

Model updates shift carefully tuned safety behaviour. New attack patterns emerge weekly. Character thinning — the failure mode I discovered with Yara — only shows up in sustained interaction. You need something that watches continuously, not something that checks once.

3. Prove it ran safely

Here’s where it gets interesting. You can find every vulnerability. You can monitor every session. But if your evidence is a CloudWatch log — if logging is on — that you hope nobody modified, you haven’t actually proven anything. Logs are mutable. Self-reported evidence isn’t evidence. When a regulator, an auditor, or a plaintiff’s attorney asks “show me that your controls executed on this interaction, at this time, on this configuration,” you need something better.

We’ve solved this class of problem before. In 2013, TLS certificates had the same structural issue: certificate authorities issued certs, but nobody could verify issuance was legitimate. Rogue CAs went undetected. The solution was Certificate Transparency — an append-only log with cryptographic receipts that any third party could audit. Same structural problem. Same class of solution.

Introducing OVERT 1.0

OVERT — Observable Verification Evidence for Runtime Trust — is an open standard for cryptographic AI safety attestation. We’re publishing it as a free, open standard because this problem is too important for any single vendor to own.

The spec addresses the verification gap that existing AI governance frameworks leave open. Frameworks tell you what controls should exist. They don’t specify how to produce independent, tamper-evident proof that those controls actually executed on a given interaction, under a given configuration, at a given time. OVERT fills that gap.

Tamper-evident attestation. Cryptographic receipts prove which controls executed, when, and on what. Not reducible to operator-controlled logs.
Zero content egress. Receipts contain hashes and metadata, never raw data. Verification without creating a new disclosure channel.
Four assurance levels. AAL-1 (self-attested) through AAL-4 (hardware-rooted). Start simple, scale up as your requirements demand.
Independent verification. Any third party can verify the attestation chain — buyers, auditors, regulators — without trusting the operator.

The full spec, PDF, and machine-readable feed are at overt.is. We want people to read it, break it, improve it, and build competing implementations. If the standard is good enough, it won’t matter who implements it.

The closed loop

These three pieces — finding breaks, watching continuously, proving execution — aren’t independent tools. They’re a closed loop. Auto-redteam discovers bypasses. Those bypasses become training data for a defender model. The defender deploys at runtime. Every decision gets an OVERT attestation receipt. The attack success rate drops. A new baseline gets committed. The loop runs again.

This is what we’re building at GLACIS: the infrastructure layer that makes this loop possible for anyone deploying AI in high-stakes domains. Auto-redteam is the open-source entry point — free forever, OVERT-compatible evidence output. Behind it, we’re building Enforce (continuous evaluation and drift detection) and Notarize (the OVERT reference implementation for cryptographic attestation).

A humble piece of the answer

Nobody has solved AI safety. We don’t claim to. But we believe the path forward involves purpose-built tooling that’s accessible to everyone building in regulated and high-stakes domains — not locked behind enterprise contracts or vendor lock-in.

The teams we talk to aren’t looking for someone to tell them AI is dangerous. They know. They’re looking for practical tools that help them close the gap between what they’ve built and what they can responsibly deploy. Tools that help them answer the question their board, their CISO, their regulator is going to ask: How do you know this is safe?

That’s what auto-redteam and OVERT are for. Not to gatekeep AI deployment, but to make the gap between “works in a demo” and “safe in production” small enough to cross — and to give you the evidence that you crossed it.

Get started

autoredteam.com Install it today. Point it at whatever you’re building.

overt.is The full OVERT spec. Read it. Tell us what’s wrong with it.

glacis.io/assess Free AI runtime security assessment. 2 minutes, no sales pitch.

Primary sources

OVERT Standard (now v1.1) — Full specification, PDF, machine-readable feed
auto-redteam — Open-source adversarial evaluation (Apache 2.0)
RFC 6962 — Certificate Transparency (the precedent for append-only attestation logs)
NIST AI RMF — The governance framework OVERT complements

Navigate

Solutions

Evidence

Regulations

We couldn’t ship our own AI. So we built the infrastructure to fix that.