Current approaches to AI safety share a fatal assumption: that detection and enforcement are sufficient. Guardrails detect threats. GRC platforms document policies. Observability tools record events. None of them prove that safety controls actually executed on a given inference. This paper introduces Attestable Threat Intelligence (ATI) — a cryptographic primitive that transforms AI safety from claims requiring trust into mathematical proof requiring only verification.
ATI addresses three hard constraints that current approaches cannot satisfy simultaneously: evidence must never leave the customer boundary, the execution of safety controls must be verifiable by independent third parties, and verification must scale to production request volumes.
We describe an architecture where evidence never leaves the customer boundary, yet third parties can mathematically verify that safety controls executed correctly. This is not merely observability with better encryption. It is a fundamentally different primitive: proof of execution without evidence egress.
The AI safety field has converged on a three-layer architecture:
| Layer | Function | Examples |
|---|---|---|
| Policy | Define what should happen | Credo AI, ServiceNow, Archer |
| Safety | Prevent bad outcomes | NeMo Guardrails, CalypsoAI, Robust Intelligence |
| Observability | Record what happened | Datadog, Arthur AI, Arize |
This architecture reflects a positivist assumption borrowed from software testing: if we specify rules precisely enough and monitor execution thoroughly enough, safety reduces to rule-following.
The assumption fails for three reasons.
First, multi-turn attacks defeat single-turn classifiers. An October 2025 study from OpenAI, Anthropic, and Google DeepMind examined 12 published defenses against prompt injection and jailbreaking. Using adaptive attacks that iterate multiple times, they achieved attack success rates above 90% for most defenses — despite those defenses originally reporting near-zero attack success rates. Claude 3.5 Sonnet: 78% attack success rate with sufficient attempts. GPT-4o: 89%. Single-turn safety evaluation systematically underestimates real-world vulnerability.
Second, fine-tuning erases safety at negligible cost. Researchers demonstrated that GPT-3.5 Turbo could be jailbroken with only 10 adversarial examples for under $0.20. Fine-tuned variants show 22x higher odds of producing harmful responses. No standard benchmark measures post-fine-tuning safety degradation.
Third, and most fundamentally: detection is not proof. An observability dashboard shows that guardrails claim to have executed. A GRC platform shows that policies exist. Neither provides cryptographic evidence that a specific guardrail executed on a specific inference at a specific time under a specific network configuration. When litigation arrives — and it will, given Character.AI's wrongful death lawsuit, Sharp HealthCare's class action over AI-fabricated consent records, and the coming wave of AI malpractice claims — "our dashboard says it worked" is not a defense. Evidence is a defense.
The distinction between documentation and evidence has legal weight. Under the Federal Rules of Evidence, a record offered as proof must be authenticated as what it claims to be (FRE 901), must have been made at or near the time of the event it records, and must be the product of a regular, unbiased record-keeping practice (FRE 803(6)).
Conventional AI logging fails all three requirements. Logs can be modified. Timestamps can be forged. Selection can be biased. There is no cryptographic binding between a logged event and the actual inference that occurred.
This creates what we term the proof gap: the space between what organizations claim about AI behavior and what they can demonstrate with independently verifiable evidence.
The proof gap has immediate commercial consequences: insurers cannot price AI risk they cannot verify, auditors cannot certify controls they cannot reproduce, and regulators cannot accept claims they cannot check.
Attestable Threat Intelligence (ATI) is a cryptographic primitive with four defining properties:

- **Proof of execution.** Cryptographic evidence that a specific safety control ran on a specific inference.
- **Evidence locality.** Prompts, responses, and policy evaluations never leave the customer boundary.
- **Auditor reproducibility.** Every sampling and attestation decision can be independently recomputed by a third party.
- **Zero-knowledge risk signals.** Risk metrics derived from cryptographic artifacts without access to evidence content.
This is not encryption applied to existing observability. It is a different architecture where the receipt service is structurally incapable of receiving evidence content.
┌─────────────────────────────────────────────────────────────────────┐
│ CUSTOMER BOUNDARY │
│ │
│ ┌─────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Client │───▶│ Sidecar │───▶│ AI Service │ │
│ └─────────┘ └──────┬──────┘ └─────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ L0 Attestation │ │ Local Evidence │ │
│ │ (ALL requests) │ │ Storage (sampled) │ │
│ └────────┬────────┘ └─────────────────────┘ │
│ │ │
│ │ Hash + Signature only │
└────────────┼────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────────┐
│ WITNESS NETWORK │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Receipt Service │ │ Witness Service │ │ Transparency │ │
│ │ (Closed Schema) │ │ (Epoch Tokens) │ │ Log (Merkle) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘
The architecture has three trust boundaries. Evidence (prompts, responses, policy evaluations) never leaves the customer boundary; only hashes and signatures cross to the witness network; and the receipt service enforces a closed schema (`additionalProperties: false`) that structurally rejects any field not in the normative specification — it cannot accept evidence content even if compromised.

Every request generates an L0 attestation. Only sampled requests generate L2 attestations.
L0 Envelope (ALL requests):
| Field | Description |
|---|---|
| `blinded_id` | First 8 bytes of PRF tag — enables linkage without revealing full PRF |
| `request_commitment` | `HMAC-SHA256(policy_key, "reqdig:v1" \|\| request_digest)` |
| `encoder_id` | SHA-256 hash of encoder binary (determinism proof) |
| `metadata_policy_results` | Endpoint, auth scope, region, model ID, deployment |
| `mono_counter` | Strictly monotonic per sidecar (replay prevention) |
| `wall_time_ns` | Nanoseconds since Unix epoch |
| `kid` | Key identifier |
| `sid` | Sidecar identifier |
| `signature` | Ed25519 signature over fields 0-7 |
L2 Envelope (SAMPLED requests):
| Field | Description |
|---|---|
| `l0_reference` | Same as L0.blinded_id |
| `request_commitment` | Must match L0 |
| `evidence_commitment` | `HMAC(storage_key, "evid:v1" \|\| SHA-256(evidence))` |
| `prf_tag` | Full 256-bit HMAC output (enables auditor verification) |
| `policy_scores` | Toxicity, PII detection, bias scores, etc. |
| `signature` | Ed25519 signature |
The critical innovation: L2 envelopes include the complete PRF tag, allowing any auditor with the policy key to verify that the sampling decision was correct without accessing the underlying evidence.
The sampling function is deterministic and auditor-reproducible:
policy_key = HKDF-SHA256(root_secret, salt="glacis-v1-policy", info=policy_id)
prf_tag = HMAC-SHA256(policy_key, "prf:v1" || policy_id || request_digest)
threshold = floor(sampling_probability × (2^64 - 1))
sample iff BE_uint64(prf_tag[0:8]) ≤ threshold
For 5% sampling: threshold = 0x0ccccccccccccccc
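As a concrete illustration, the full derivation and sampling decision can be sketched in Python. The HKDF here is a minimal RFC 5869 implementation rather than the production code, and the `root_secret`, `policy_id`, and request values are illustrative, not normative.

```python
import hmac
import hashlib

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def prf_tag(policy_key: bytes, policy_id: bytes, request_digest: bytes) -> bytes:
    """Domain-separated PRF tag over the request digest."""
    return hmac.new(policy_key, b"prf:v1" + policy_id + request_digest, hashlib.sha256).digest()

def sampling_threshold(p_num: int, p_den: int) -> int:
    """floor(p * (2^64 - 1)), in exact integer arithmetic to avoid float rounding."""
    return (p_num * (2**64 - 1)) // p_den

def should_sample(tag: bytes, threshold: int) -> bool:
    """Sample iff the big-endian first 8 bytes of the tag are at or below the threshold."""
    return int.from_bytes(tag[:8], "big") <= threshold

# Illustrative values (not normative):
policy_id = b"policy-042"
policy_key = hkdf_sha256(b"root_secret", salt=b"glacis-v1-policy", info=policy_id)
tag = prf_tag(policy_key, policy_id, hashlib.sha256(b"request body").digest())
decision = should_sample(tag, sampling_threshold(5, 100))  # 5% sampling
```

Computing the threshold as an exact integer fraction reproduces the 5% value above (`0x0ccccccccccccccc`); a naive floating-point `floor(0.05 × (2^64 − 1))` can land one unit off due to rounding.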
This is not OpenTelemetry's consistent probability sampling. OpenTelemetry samples based on trace IDs using a shared random seed — any node can independently make the same sampling decision for coordination-free distributed tracing. ATI's PRF uses secret keys per policy with the complete 256-bit PRF value embedded in L2 attestations, enabling third-party reproduction of every sampling decision.
Domain separation prevents cross-purpose collisions:
| Prefix | Purpose |
|---|---|
prf:v1 | Pseudorandom function |
reqdig:v1 | Request digest commitment |
evid:v1 | Evidence commitment |
bearer:v1 | Heartbeat bearer token |
s3p:sampler:v1 | S3P random sampling |
Attestations are bound to the runtime state at the moment of execution through a multi-layer network isolation proof (NETATT):
| Layer | Component | Hash Computation |
|---|---|---|
| 1 | Policy | SHA-256(canonical JSON egress policy, RFC 8785) |
| 2 | iptables | SHA-256(normalized dump: tables ordered raw/mangle/nat/filter) |
| 3 | eBPF | SHA-256(programs sorted by attach point, ELF section order) |
| 4 | CNI | SHA-256(RFC 8785 canonical JSON of CNI config) |
| 5 | SPKI Pins | Array of SHA-256(DER_SPKI), lexicographically sorted |
Final: network_state_hash = SHA-256(CBOR_encode(NETATT))
The receipt service requires co-epoch NETATT for every receipt. A sidecar cannot claim attestations were made during an epoch unless it registered its binary hash and network state during that epoch. This prevents backdated attestations and attestations made under a binary or network configuration that was never registered for the epoch in question.
The receipt service API enforces additionalProperties: false at every level:
{
"type": "object",
"properties": {
"attestation_hash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
"heartbeat_epoch": { "type": "integer", "minimum": 0 }
},
"required": ["attestation_hash", "heartbeat_epoch"],
"additionalProperties": false
}
Any request containing fields not in this schema receives ERR_SCHEMA_UNKNOWN_FIELD. This is not a policy decision — it is a structural guarantee. The receipt service cannot receive evidence content because there is no field in the schema to put it.
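A minimal sketch of the closed-schema check, assuming a hand-rolled validator rather than the production service (a real deployment would typically use a full JSON Schema implementation):

```python
import re

RECEIPT_SCHEMA_KEYS = {"attestation_hash", "heartbeat_epoch"}

class SchemaError(ValueError):
    pass

def validate_receipt_request(payload: dict) -> None:
    """Reject any payload that does not exactly match the closed receipt schema."""
    unknown = set(payload) - RECEIPT_SCHEMA_KEYS
    if unknown:
        # additionalProperties: false — there is no field that can carry evidence content
        raise SchemaError(f"ERR_SCHEMA_UNKNOWN_FIELD: {sorted(unknown)}")
    missing = RECEIPT_SCHEMA_KEYS - set(payload)
    if missing:
        raise SchemaError(f"missing required fields: {sorted(missing)}")
    if not re.fullmatch(r"[a-f0-9]{64}", payload["attestation_hash"]):
        raise SchemaError("attestation_hash must be 64 lowercase hex characters")
    epoch = payload["heartbeat_epoch"]
    if not isinstance(epoch, int) or isinstance(epoch, bool) or epoch < 0:
        raise SchemaError("heartbeat_epoch must be a non-negative integer")
```

A payload carrying an extra `evidence` field is rejected before any processing occurs, which is the structural guarantee described above.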
At scale (10⁹ requests/day), deterministic 5-15% sampling remains expensive. The Statistical Safety Signal Protocol (S3P) reduces inspection to 0.1-1% while preserving actuarial rigor through exact binomial confidence bounds and cryptographically auditable random selection.
At epoch start, the witness generates a 256-bit epoch_nonce via CSPRNG, kept secret until epoch close.
s_tag = HMAC-SHA256(epoch_nonce, "s3p:sampler:v1" || request_digest)
threshold = floor(p × (2^256 - 1))
sample iff BE_uint256(s_tag) ≤ threshold
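The commit-reveal lifecycle around the epoch nonce can be sketched as follows; function names are illustrative, not part of the specification.

```python
import hmac
import hashlib
import secrets

def open_epoch() -> tuple:
    """Witness draws a secret 256-bit nonce; only its SHA-256 commitment is published."""
    epoch_nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(epoch_nonce).digest()
    return epoch_nonce, commitment

def s_tag(epoch_nonce: bytes, request_digest: bytes) -> bytes:
    """Domain-separated S3P sampling tag keyed by the secret epoch nonce."""
    return hmac.new(epoch_nonce, b"s3p:sampler:v1" + request_digest, hashlib.sha256).digest()

def s3p_sample(tag: bytes, p_num: int, p_den: int) -> bool:
    """Sample iff the full 256-bit tag is at or below floor(p * (2^256 - 1))."""
    threshold = (p_num * (2**256 - 1)) // p_den
    return int.from_bytes(tag, "big") <= threshold

def verify_reveal(commitment: bytes, revealed_nonce: bytes) -> bool:
    """After epoch close, anyone checks the revealed nonce against the commitment."""
    return hashlib.sha256(revealed_nonce).digest() == commitment
```

Because the nonce stays secret until epoch close, no party can predict which requests will be sampled; because the reveal is checked against the prior commitment, no party can retroactively choose a nonce that avoids inconvenient requests.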
After epoch close:
- The witness publishes `{epoch, epoch_nonce, signature}`
- Anyone can verify the reveal against the commitment: `epoch_nonce_commitment = SHA-256(epoch_nonce)`
- Auditors recompute `s_tag` for every request in the Digest Publication Ledger

S3P uses Clopper-Pearson intervals rather than normal approximations:
CI_lower = 0 if k=0 else Beta^{-1}(α/2; k, n-k+1)
CI_upper = 1 if k=n else Beta^{-1}(1-α/2; k+1, n-k)
Where k = violations observed, n = samples taken, α = significance level.
This provides exact coverage probability regardless of the underlying violation rate, eliminating the normal approximation assumptions that fail at low sample sizes or extreme rates.
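A stdlib-only sketch of these bounds, solving the equivalent exact binomial tail conditions by bisection instead of calling an inverse Beta function:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect_root(f, lo=0.0, hi=1.0, iters=200):
    """Root of a monotonically decreasing f on [0, 1] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact (1 - alpha) Clopper-Pearson interval for k violations in n samples."""
    # Lower bound: smallest p with P(X >= k; p) = alpha/2 (0 when k = 0)
    lower = 0.0 if k == 0 else _bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: largest p with P(X <= k; p) = alpha/2 (1 when k = n)
    upper = 1.0 if k == n else _bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper
```

For zero observed violations in 100 samples at 95% confidence, the upper bound is `1 - 0.025^(1/100) ≈ 0.0362`: even a clean epoch carries a non-trivial bound until the sample count grows, which is exactly the conservatism an underwriter wants.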
From n samples with k violations at 95% confidence, compute the Clopper-Pearson upper bound on the violation rate. If upper_bound ≤ τ (e.g., 1%):
Issue ConformanceCertificate {
epoch,
sample_count: n,
violation_count: k,
upper_bound,
confidence: 0.95,
signature
}
This certificate is a statistically sound claim: "With 95% confidence, the violation rate does not exceed τ." Insurance carriers can price risk against upper_bound, not self-reported violation rates.
Attestation alone is passive observation. The breakthrough that transforms monitoring into guarantee is the cryptographic control loop — formally verified feedback where parameter changes are gated by cryptographic proof of co-epoch state.
Traditional safety monitoring:
Monitor → Attest → Report → (Human intervenes)
Cryptographic control loop:
Monitor → Attest → Cryptographically-verified automatic adjustment → Self-stabilization
This is not merely automation. It is provable stability — the system mathematically cannot destabilize within defined bounds.
Every ControlAction must pass all five gates before it takes effect, including:

- Co-epoch binding: `ControlAction.epoch == current_epoch`
- Parameter bounds: `p_min ≤ sampling_prob ≤ p_max`

ALL GATES PASSED: apply `params_after` and log the action.
ANY GATE FAILED: reject and maintain `params_before`.
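A sketch of the gate check, showing only the two gates quoted above and leaving the remaining gates as a hypothetical hook (`extra_gates`); the bounds defaults are illustrative, not normative.

```python
from dataclasses import dataclass, field

@dataclass
class ControlAction:
    epoch: int
    sampling_prob: float
    params_before: dict = field(default_factory=dict)
    params_after: dict = field(default_factory=dict)

def apply_control_action(action, current_epoch, p_min=0.001, p_max=0.20, extra_gates=()):
    """Return the parameter set to run with: params_after iff every gate passes."""
    gates = [
        action.epoch == current_epoch,           # co-epoch binding
        p_min <= action.sampling_prob <= p_max,  # bounded parameter change
    ]
    gates += [gate(action) for gate in extra_gates]  # remaining gates (signature, NETATT, ...)
    if all(gates):
        return action.params_after  # all gates passed: apply and log
    return action.params_before    # any gate failed: reject, keep prior params
```

The key design property is that failure is never destructive: a rejected action leaves the running parameters exactly as they were.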
The controller uses a Lyapunov function approach:
V(state) = sampling_rate² + backpressure²/MAX_BP²
Prove: dV/dt ≤ 0 for all valid inputs
Result: System converges to stable point
This proof is implemented in TLA+ and compiled to Rust/WASM. Any third party can verify that the controller satisfies its invariants.
When violation rates exceed thresholds:
| Violation Level | Control Response |
|---|---|
| Low | Increase sampling rate (1% → 5%) |
| Medium | Increase sampling rate (5% → 20%) + apply backpressure |
| High | Switch model to safe mode with restricted capabilities |
| Critical | Open circuit breakers, require human approval for continuation |
Every control action is itself recorded as evidence: the action, its epoch, and the parameters before and after are signed and logged.
No prior art combines these elements: formally verified control theory (TLA+ proofs), cryptographic attestation of execution, and automatic, gate-checked safety response.
Building this requires the intersection of control theory expertise (rare), formal verification expertise (rarer), and cryptographic engineering (rare). All three together is unprecedented.
ATI produces four risk signals that insurers, auditors, and regulators can assess without accessing evidence content:
Signal: Verified monitoring proportion via PRF recomputation
Given the Digest Publication Ledger, policy identifier, policy key, and canonicalization rules, an auditor can recompute the PRF tag for every request and verify that every request that should have been sampled has a corresponding L2 envelope.
Signal: Unmonitored intervals from epoch/NETATT gaps
If a sidecar fails to submit heartbeats for an epoch, or submits heartbeats without corresponding NETATT, the exposure window is precisely bounded: [epoch_start, epoch_end] had no cryptographic coverage.
Signal: Time from violation attestation to applied ControlAction
If a violation is attested at t₁ and the corresponding ControlAction is applied at t₂, the response latency (t₂ - t₁) is cryptographically verifiable. SLA violations are objective facts, not operator claims.
Signal: Evidence commitment sets compared to CAS proofs of possession
The evidence_commitment in L2 envelopes can be verified against content-addressable storage proofs without accessing the evidence itself.
These signals enable parametric triggers — automatic insurance events based on cryptographically verifiable conditions:
Trigger: CI_upper > 1% for 3 consecutive epochs
Action: Automatic premium adjustment or coverage limitation
This is not a claim requiring adjustment. It is a mathematical fact verified from the attestation chain.
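A trigger of this shape is a pure function over the per-epoch upper bounds; a minimal sketch with illustrative parameter names:

```python
def parametric_trigger(epoch_upper_bounds, tau=0.01, window=3):
    """Fire iff the Clopper-Pearson upper bound exceeds tau for `window` consecutive epochs."""
    run = 0
    for ub in epoch_upper_bounds:  # bounds in epoch order
        run = run + 1 if ub > tau else 0
        if run >= window:
            return True
    return False
```

Because each upper bound is derived from signed attestations, both parties to the policy evaluate the same function over the same inputs and necessarily reach the same answer.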
| Deadline | Regulation | Requirement |
|---|---|---|
| August 2, 2026 | EU AI Act Article 12 | Automatic logging for high-risk AI with risk detection, post-market monitoring, and operational oversight capabilities |
| June 30, 2026 | Colorado AI Act | Comprehensive governance requirements for high-risk AI |
| January 1, 2026 | California SB 1120 | Healthcare AI disclosure requirements |
| Effective now | Verisk ISO exclusions | Mainstream insurers exiting undocumented AI risk |
The European Commission's Digital Omnibus proposes conditional delays tied to infrastructure availability — but organizations starting today barely have enough time for August 2026, as conformity assessment alone takes 6-12 months.
Munich Re's aiSure program represents the emerging model: attestation-based underwriting where the act of underwriting serves as an independent seal of approval.
From Munich Re's AI specialist: "AI models are inherently probabilistic and will make mistakes... we're comfortable covering a broad range of error rates, from very low to high — they will be reflected in the premium."
Armilla's AI Liability Insurance (Lloyd's, April 2025) provides "explicit, affirmative coverage for AI-related exposures that traditional insurance often fails to address" — but requires verifiable governance controls.
The projected market: $1.2B (2025) → $15-25B (2030) → $60-100B (2035).
MLCommons AILuminate v1.0 represents the first industry-standard benchmark with letter grades (A-F) across 13 hazard categories. But a UK AI Safety Institute review of 440 benchmarks found ~50% aim to measure abstract concepts like "harmlessness" without clearly defining them, while only 16% use statistical methods when comparing results.
More critically: all existing benchmarks evaluate models in isolation. None evaluate the combined stack of model + guardrails + infrastructure in production deployment.
AWS, Azure, and GCP face irreconcilable conflicts when attempting to attest their own AI services: they would be simultaneously the vendor, the operator, and the auditor of the systems under attestation.
This is the Arthur Andersen problem: independent third-party attestation is architecturally necessary because hyperscalers cannot credibly audit their own commercial products.
Datadog, Arthur AI, and Arize record events. They cannot prove events were not modified. They cannot prove sampling was unbiased. They cannot produce cryptographic evidence admissible in court.
More fundamentally: observability requires evidence egress. Every prompt and response must be transmitted to the monitoring service. This conflicts with data residency requirements, privacy regulation, and confidentiality obligations around proprietary prompts and outputs.
Zero-egress is not a feature. It is an architectural requirement.
NeMo Guardrails, Guardrails AI, and CalypsoAI execute safety controls. They cannot prove they executed. They operate as detection-plane components — observing traffic and potentially blocking it. They do not generate cryptographic attestations of what they did.
The October 2025 research showing 90%+ attack success rates against published defenses demonstrates the fundamental weakness: guardrails are classifiers, and classifiers can be fooled.
ATI does not replace guardrails. It proves they executed — and provides the feedback loop to improve them when they fail.
As of January 2026, the GLACIS implementation provides:
| Component | Status |
|---|---|
| Hierarchical key derivation (HKDF-SHA256) | Fully implemented (25 tests) |
| Evidence commitments with domain separation | Fully implemented (16 tests) |
| Federated witness services (3 cloud providers) | Fully implemented (103 tests) |
| Auditor PRF recomputation tools | Fully implemented (CLI + Rust verifier) |
| Cryptographic control loop | Fully implemented (28 tests, TLA+ specification) |
| S3P random sampling | Specified (RFC test vectors implemented) |
| Digest Publication Ledger | Specified (types defined, implementation pending) |
| Insurance parametric API | Foundation implemented (control loop provides signals) |
The open-source sidecar runs on eBPF-capable Linux kernels. The witness network operates across Cloudflare Workers, AWS Lambda, and GCP Cloud Run with 99.9% availability.
Attestable Threat Intelligence is not better logging. It is not encrypted observability. It is not compliance automation.
ATI is a new primitive — cryptographic proof of safety control execution with evidence locality, auditor reproducibility, and zero-knowledge risk signals.
The category emerges from the recognition that detection is not proof, documentation is not evidence, and trust is not verification. As AI systems assume responsibility for consequential decisions — medical diagnoses, credit determinations, hiring recommendations — the question shifts from "Did it work?" to "Can you prove it worked?"
The answer requires proof of execution, evidence locality, auditor reproducibility, and zero-knowledge risk signals: the properties that define ATI.
This is the infrastructure that makes AI governance verifiable rather than claimed.
Attestable Threat Intelligence (ATI): Cryptographic primitive providing proof of safety control execution without evidence egress.
Co-epoch binding: Cryptographic linkage of attestations to binary hash and network isolation state within a temporal window.
Evidence locality: Architectural guarantee that prompts, responses, and policy evaluations never leave the customer boundary.
L0 attestation: Metadata-only attestation generated for every request.
L2 attestation: Full attestation with evidence commitment generated for sampled requests.
NETATT: Network-isolation attestation capturing iptables, eBPF, CNI, and SPKI state.
Policy-scoped PRF: HMAC-based pseudorandom function keyed per policy enabling deterministic, auditor-reproducible sampling.
S3P (Statistical Safety Signal Protocol): Ultra-low-rate (0.1-1%) sampling with exact binomial confidence bounds and post-epoch nonce reveal.
Zero-knowledge risk signal: Verifiable risk metric derived from cryptographic artifacts without evidence content access.
This paper describes technology covered by pending patent applications. GLACIS, the GLACIS logo, and Attestable Threat Intelligence are trademarks of GLACIS, Inc.
Contact: [email protected]