Attestable Threat Intelligence

A New Primitive for AI Governance

GLACIS Technical Whitepaper

January 2026

Abstract

Current approaches to AI safety share a fatal assumption: that detection and enforcement are sufficient. Guardrails detect threats. GRC platforms document policies. Observability tools record events. None of them prove that safety controls actually executed on a given inference. This paper introduces Attestable Threat Intelligence (ATI) — a cryptographic primitive that transforms AI safety from claims requiring trust into mathematical proof requiring only verification.

ATI addresses three hard constraints that current approaches cannot solve simultaneously:

  1. Selection bias — Auditors cannot trust provider-selected samples to be representative
  2. Data residency — Organizations cannot export prompts and responses to third-party monitors without triggering privacy violations
  3. Evidentiary value — Conventional logs are easy to forge and carry no cryptographic guarantees

We describe an architecture where evidence never leaves the customer boundary, yet third parties can mathematically verify that safety controls executed correctly. This is not merely observability with better encryption. It is a fundamentally different primitive: proof of execution without evidence egress.

1. The Problem: Detection is Not Proof

1.1 The Current Paradigm

The AI safety field has converged on a three-layer architecture:

Layer          Function                   Examples
Policy         Define what should happen  Credo AI, ServiceNow, Archer
Safety         Prevent bad outcomes       NeMo Guardrails, CalypsoAI, Robust Intelligence
Observability  Record what happened       Datadog, Arthur AI, Arize

This architecture reflects a positivist assumption borrowed from software testing: if we specify rules precisely enough and monitor execution thoroughly enough, safety reduces to rule-following.

The assumption fails for three reasons.

First, multi-turn attacks defeat single-turn classifiers. An October 2025 study from OpenAI, Anthropic, and Google DeepMind examined 12 published defenses against prompt injection and jailbreaking. Using adaptive attacks that iterate multiple times, they achieved attack success rates above 90% for most defenses — despite those defenses originally reporting near-zero attack success rates. Claude 3.5 Sonnet: 78% attack success rate with sufficient attempts. GPT-4o: 89%. Single-turn safety evaluation systematically underestimates real-world vulnerability.

Second, fine-tuning erases safety at negligible cost. Researchers demonstrated that GPT-3.5 Turbo could be jailbroken with only 10 adversarial examples for under $0.20. Fine-tuned variants show 22x higher odds of producing harmful responses. No standard benchmark measures post-fine-tuning safety degradation.

Third, and most fundamentally: detection is not proof. An observability dashboard shows that guardrails claim to have executed. A GRC platform shows that policies exist. Neither provides cryptographic evidence that a specific guardrail executed on a specific inference at a specific time under a specific network configuration. When litigation arrives — and it will, given Character.AI's wrongful death lawsuit, Sharp HealthCare's class action over AI-fabricated consent records, and the coming wave of AI malpractice claims — "our dashboard says it worked" is not a defense. Evidence is a defense.

1.2 The Proof Gap

The distinction between documentation and evidence has legal weight. Under the Federal Rules of Evidence, electronic records must be authenticated, shown to be free of tampering, and produced by a process whose reliability can be demonstrated.

Conventional AI logging fails all three requirements. Logs can be modified. Timestamps can be forged. Selection can be biased. There is no cryptographic binding between a logged event and the actual inference that occurred.

This creates what we term the proof gap: the space between what organizations claim about AI behavior and what they can demonstrate with independently verifiable evidence.

The proof gap has immediate commercial consequences: insurers cannot price risk they cannot verify, auditors cannot certify controls they cannot reproduce, and defendants cannot substantiate safety claims their own logs cannot prove.

2. Attestable Threat Intelligence: The Primitive

Attestable Threat Intelligence (ATI) is a cryptographic primitive with four defining properties:

  1. Evidence locality by construction — Prompts, responses, and policy evaluations never leave the customer boundary. Only cryptographic commitments (hashes and signatures) are transmitted.
  2. Auditor-reproducible sampling — Third parties possessing the policy key can recompute every sampling decision and verify that every request that should have been sampled has a corresponding attestation.
  3. Co-epoch binding — Every attestation is cryptographically bound to the binary hash of the executing code and the network isolation state at the moment of execution.
  4. Zero-knowledge risk signals — Insurers, auditors, and regulators can assess compliance using only cryptographic artifacts, without ever seeing the underlying data.

This is not encryption applied to existing observability. It is a different architecture where the receipt service is structurally incapable of receiving evidence content.

2.1 Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        CUSTOMER BOUNDARY                            │
│                                                                     │
│   ┌─────────┐    ┌─────────────┐    ┌─────────────┐                │
│   │  Client │───▶│   Sidecar   │───▶│  AI Service │                │
│   └─────────┘    └──────┬──────┘    └─────────────┘                │
│                         │                                           │
│              ┌──────────┴──────────┐                               │
│              │                     │                               │
│              ▼                     ▼                               │
│   ┌─────────────────┐   ┌─────────────────────┐                   │
│   │  L0 Attestation │   │  Local Evidence     │                   │
│   │  (ALL requests) │   │  Storage (sampled)  │                   │
│   └────────┬────────┘   └─────────────────────┘                   │
│            │                                                        │
│            │ Hash + Signature only                                  │
└────────────┼────────────────────────────────────────────────────────┘
             │
             ▼
┌────────────────────────────────────────────────────────────────────┐
│                      WITNESS NETWORK                                │
│                                                                     │
│   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐  │
│   │ Receipt Service │   │ Witness Service │   │ Transparency    │  │
│   │ (Closed Schema) │   │ (Epoch Tokens)  │   │ Log (Merkle)    │  │
│   └─────────────────┘   └─────────────────┘   └─────────────────┘  │
│                                                                     │
└────────────────────────────────────────────────────────────────────┘

The architecture has three trust boundaries:

  1. Customer Boundary — Where evidence exists. Prompts, responses, and full policy evaluation results are stored here in content-addressable storage. Nothing crosses this boundary except cryptographic commitments.
  2. Sidecar — Open-source component operating as a transparent proxy via eBPF, iptables, or service mesh integration. Performs governed canonicalization (deterministic CBOR encoding), PRF computation for sampling decisions, and signature generation. Egress is restricted to a single pinned TLS destination.
  3. Witness Network — Distributed across multiple cloud providers (Cloudflare Workers, AWS Lambda, GCP Cloud Run). Issues per-epoch bearer tokens, maintains binary allowlists, and operates the transparency log. The receipt service uses a closed JSON schema (additionalProperties: false) that structurally rejects any field not in the normative specification — it cannot accept evidence content even if compromised.

2.2 Two-Tier Attestation (L0/L2)

Every request generates an L0 attestation. Only sampled requests generate L2 attestations.

L0 Envelope (ALL requests):

Field                    Description
blinded_id               First 8 bytes of PRF tag — enables linkage without revealing full PRF
request_commitment       HMAC-SHA256(policy_key, "reqdig:v1" || request_digest)
encoder_id               SHA-256 hash of encoder binary (determinism proof)
metadata_policy_results  Endpoint, auth scope, region, model ID, deployment
mono_counter             Strictly monotonic per sidecar (replay prevention)
wall_time_ns             Nanoseconds since Unix epoch
kid                      Key identifier
sid                      Sidecar identifier
signature                Ed25519 signature over fields 0-7

L2 Envelope (SAMPLED requests):

Field                Description
l0_reference         Same as L0.blinded_id
request_commitment   Must match L0
evidence_commitment  HMAC(storage_key, "evid:v1" || SHA-256(evidence))
prf_tag              Full 256-bit HMAC output (enables auditor verification)
policy_scores        Toxicity, PII detection, bias scores, etc.
signature            Ed25519 signature

The critical innovation: L2 envelopes include the complete PRF tag, allowing any auditor with the policy key to verify that the sampling decision was correct without accessing the underlying evidence.
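The L0/L2 linkage can be illustrated concretely. The following is a minimal Python sketch, not the normative implementation: the key, policy identifier, and dict-based envelopes are placeholders for the specified CBOR structures.

```python
import hmac, hashlib

POLICY_KEY = b"\x11" * 32   # hypothetical policy key
POLICY_ID = b"policy-123"   # hypothetical policy identifier

def prf_tag(request_digest: bytes) -> bytes:
    return hmac.new(POLICY_KEY, b"prf:v1" + POLICY_ID + request_digest,
                    hashlib.sha256).digest()

def req_commitment(request_digest: bytes) -> bytes:
    return hmac.new(POLICY_KEY, b"reqdig:v1" + request_digest,
                    hashlib.sha256).digest()

def make_l0(request_digest: bytes) -> dict:
    tag = prf_tag(request_digest)
    # blinded_id truncates the tag: linkage without revealing the full PRF
    return {"blinded_id": tag[:8],
            "request_commitment": req_commitment(request_digest)}

def make_l2(request_digest: bytes) -> dict:
    tag = prf_tag(request_digest)
    return {"l0_reference": tag[:8],
            "request_commitment": req_commitment(request_digest),
            "prf_tag": tag}  # full 256-bit tag for auditor verification

def l2_matches_l0(l0: dict, l2: dict) -> bool:
    return (l2["l0_reference"] == l0["blinded_id"]
            and l2["request_commitment"] == l0["request_commitment"]
            and l2["prf_tag"][:8] == l0["blinded_id"])

digest = hashlib.sha256(b"example request").digest()
assert l2_matches_l0(make_l0(digest), make_l2(digest))
```

An auditor holding the policy key performs exactly this consistency check, without ever seeing the request content behind the digest.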

2.3 Policy-Scoped Pseudorandom Function

The sampling function is deterministic and auditor-reproducible:

policy_key = HKDF-SHA256(root_secret, salt="glacis-v1-policy", info=policy_id)

prf_tag = HMAC-SHA256(policy_key, "prf:v1" || policy_id || request_digest)

threshold = floor(sampling_probability × (2^64 - 1))

sample iff BE_uint64(prf_tag[0:8]) ≤ threshold

For 5% sampling: threshold = 0x0ccccccccccccccc

This is not OpenTelemetry's consistent probability sampling. OpenTelemetry samples based on trace IDs using a shared random seed — any node can independently make the same sampling decision for coordination-free distributed tracing. ATI's PRF uses secret keys per policy with the complete 256-bit PRF value embedded in L2 attestations, enabling third-party reproduction of every sampling decision.
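The derivation and sampling decision can be sketched directly from the formulas above. This is an illustrative Python sketch under stated assumptions: the root secret and policy identifier are hypothetical, and the single-block HKDF expand suffices only because the output length equals one SHA-256 digest.

```python
import hmac, hashlib

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes) -> bytes:
    # RFC 5869 extract-then-expand; one expand block suffices for a 32-byte key
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

def prf_tag(policy_key: bytes, policy_id: bytes, request_digest: bytes) -> bytes:
    # "prf:v1" domain-separation prefix per the spec
    return hmac.new(policy_key, b"prf:v1" + policy_id + request_digest,
                    hashlib.sha256).digest()

def sampling_threshold(num: int, den: int) -> int:
    # floor(p * (2^64 - 1)) in exact integer arithmetic; naive float math
    # (0.05 * (2**64 - 1)) rounds away from the spec's constant
    return num * (2**64 - 1) // den

def should_sample(tag: bytes, threshold: int) -> bool:
    return int.from_bytes(tag[:8], "big") <= threshold

# Hypothetical inputs for illustration
policy_id = b"policy-123"
policy_key = hkdf_sha256(b"\x00" * 32, b"glacis-v1-policy", policy_id)
digest = hashlib.sha256(b"example request").digest()
tag = prf_tag(policy_key, policy_id, digest)
decision = should_sample(tag, sampling_threshold(5, 100))  # 5% sampling
```

Note the exact integer arithmetic for the threshold: it reproduces the 0x0ccccccccccccccc constant bit-for-bit, which matters when auditors recompute decisions independently.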

Domain separation prevents cross-purpose collisions:

Prefix          Purpose
prf:v1          Pseudorandom function
reqdig:v1       Request digest commitment
evid:v1         Evidence commitment
bearer:v1       Heartbeat bearer token
s3p:sampler:v1  S3P random sampling

2.4 Co-Epoch Binding

Attestations are bound to the runtime state at the moment of execution through a multi-layer network isolation proof (NETATT):

Layer  Component  Hash Computation
1      Policy     SHA-256(canonical JSON egress policy, RFC 8785)
2      iptables   SHA-256(normalized dump: tables ordered raw/mangle/nat/filter)
3      eBPF       SHA-256(programs sorted by attach point, ELF section order)
4      CNI        SHA-256(RFC 8785 canonical JSON of CNI config)
5      SPKI Pins  Array of SHA-256(DER_SPKI), lexicographically sorted

Final: network_state_hash = SHA-256(CBOR_encode(NETATT))
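The layered hash can be sketched as follows. This Python sketch makes two simplifying substitutions, both flagged in comments: sorted-key compact JSON stands in for the spec's deterministic CBOR and RFC 8785 encodings, and eBPF programs are sorted by raw bytes rather than by attach point. Input values are invented for illustration.

```python
import hashlib, json

def canonical(obj) -> bytes:
    # Sketch only: sorted-key compact JSON stands in for the deterministic
    # CBOR / RFC 8785 encodings the spec requires
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def netatt_hash(policy: dict, iptables_dump: str, ebpf_programs: list,
                cni_config: dict, spki_pins: list) -> str:
    layers = {
        "policy": hashlib.sha256(canonical(policy)).hexdigest(),
        "iptables": hashlib.sha256(iptables_dump.encode()).hexdigest(),
        # spec sorts programs by attach point; this sketch sorts raw bytes
        "ebpf": hashlib.sha256(b"".join(sorted(ebpf_programs))).hexdigest(),
        "cni": hashlib.sha256(canonical(cni_config)).hexdigest(),
        "spki_pins": sorted(spki_pins),  # lexicographic, per the table above
    }
    return hashlib.sha256(canonical(layers)).hexdigest()

h1 = netatt_hash({"egress": "deny-all"}, "filter:...", [b"prog-a", b"prog-b"],
                 {"type": "bridge"}, ["pin2", "pin1"])
h2 = netatt_hash({"egress": "deny-all"}, "filter:...", [b"prog-b", b"prog-a"],
                 {"type": "bridge"}, ["pin1", "pin2"])
assert h1 == h2  # input ordering does not change the attested state
```

The point of the normalization rules is exactly this invariance: two sidecars observing the same network state must produce the same hash regardless of enumeration order.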

The receipt service requires co-epoch NETATT for every receipt. A sidecar cannot claim attestations were made during an epoch unless it registered its binary hash and network state during that epoch. This prevents backdated attestations, post-hoc binary substitution, and receipts issued under a network configuration that was never attested.

2.5 The Closed Schema Guarantee

The receipt service API enforces additionalProperties: false at every level:

{
  "type": "object",
  "properties": {
    "attestation_hash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "heartbeat_epoch": { "type": "integer", "minimum": 0 }
  },
  "required": ["attestation_hash", "heartbeat_epoch"],
  "additionalProperties": false
}

Any request containing fields not in this schema receives ERR_SCHEMA_UNKNOWN_FIELD. This is not a policy decision — it is a structural guarantee. The receipt service cannot receive evidence content because there is no field in the schema to put it.
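The closed-schema behavior can be demonstrated with a small validator. This is a Python sketch, not the production service; ERR_SCHEMA_UNKNOWN_FIELD is the error code from the text, while the other two error codes are hypothetical names for illustration.

```python
import re

# Closed receipt schema: exactly these fields, nothing else
RECEIPT_FIELDS = {
    "attestation_hash": lambda v: isinstance(v, str)
        and re.fullmatch(r"[a-f0-9]{64}", v) is not None,
    "heartbeat_epoch": lambda v: isinstance(v, int)
        and not isinstance(v, bool) and v >= 0,
}

def validate_receipt(payload: dict) -> str:
    # additionalProperties: false -- unknown fields are a hard error, so
    # there is structurally no field that could carry evidence content
    if set(payload) - set(RECEIPT_FIELDS):
        return "ERR_SCHEMA_UNKNOWN_FIELD"
    if set(RECEIPT_FIELDS) - set(payload):
        return "ERR_SCHEMA_MISSING_FIELD"    # hypothetical error code
    for field, valid in RECEIPT_FIELDS.items():
        if not valid(payload[field]):
            return "ERR_SCHEMA_INVALID_VALUE"  # hypothetical error code
    return "OK"

assert validate_receipt({"attestation_hash": "ab" * 32,
                         "heartbeat_epoch": 7}) == "OK"
assert validate_receipt({"attestation_hash": "ab" * 32,
                         "heartbeat_epoch": 7,
                         "prompt": "secret"}) == "ERR_SCHEMA_UNKNOWN_FIELD"
```

The second assertion is the guarantee in miniature: a payload smuggling a prompt is rejected before any field-level validation runs.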

3. Statistical Safety Signal Protocol (S3P)

At scale (10^9 requests/day), deterministic 5-15% sampling remains expensive. S3P reduces inspection to 0.1-1% while preserving actuarial rigor through exact binomial confidence bounds and cryptographically auditable random selection.

3.1 Cryptographic Random Sampling

At epoch start, the witness generates a 256-bit epoch_nonce via CSPRNG, kept secret until epoch close.

s_tag = HMAC-SHA256(epoch_nonce, "s3p:sampler:v1" || request_digest)

threshold = floor(p × (2^256 - 1))

sample iff BE_uint256(s_tag) ≤ threshold

After epoch close:

  1. Witness publishes {epoch, epoch_nonce, signature}
  2. Auditors verify epoch_nonce_commitment = SHA-256(epoch_nonce)
  3. Auditors recompute s_tag for every request in the Digest Publication Ledger
  4. Auditors verify sample membership and recompute confidence bounds
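The commit-reveal flow above can be sketched end to end. This is an illustrative Python sketch: the sampling rate, ledger contents, and nonce handling are synthetic, and signatures are omitted.

```python
import hmac, hashlib, secrets

P_NUM, P_DEN = 5, 1000  # 0.5% sampling rate for this example
THRESHOLD = P_NUM * (2**256 - 1) // P_DEN  # floor(p * (2^256 - 1)), exact

# Witness: commit to a secret nonce at epoch start, reveal it at epoch close
epoch_nonce = secrets.token_bytes(32)
nonce_commitment = hashlib.sha256(epoch_nonce).digest()

def in_sample(nonce: bytes, request_digest: bytes) -> bool:
    tag = hmac.new(nonce, b"s3p:sampler:v1" + request_digest,
                   hashlib.sha256).digest()
    return int.from_bytes(tag, "big") <= THRESHOLD

# Auditor, after the reveal: check the commitment, then recompute sample
# membership for every digest in the published ledger
def audit(revealed: bytes, commitment: bytes, ledger, claimed_sample) -> bool:
    if hashlib.sha256(revealed).digest() != commitment:
        return False
    return {d for d in ledger if in_sample(revealed, d)} == set(claimed_sample)

ledger = [hashlib.sha256(str(i).encode()).digest() for i in range(2000)]
claimed = {d for d in ledger if in_sample(epoch_nonce, d)}
assert audit(epoch_nonce, nonce_commitment, ledger, claimed)
```

Because the nonce stays secret until the epoch closes, the operator cannot predict which requests will be inspected; because it is revealed afterward, the auditor can verify that no sampled request was suppressed.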

3.2 Exact Binomial Confidence Intervals

S3P uses Clopper-Pearson intervals rather than normal approximations:

CI_lower = 0 if k=0 else Beta^{-1}(α/2; k, n-k+1)
CI_upper = 1 if k=n else Beta^{-1}(1-α/2; k+1, n-k)

Where k = violations observed, n = samples taken, α = significance level.

This provides exact coverage probability regardless of the underlying violation rate, eliminating the normal approximation assumptions that fail at low sample sizes or extreme rates.
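The interval can be computed without a statistics dependency by inverting the binomial CDF directly. This Python sketch uses bisection and is mathematically equivalent to the inverse-Beta form above; it is an illustration, not the production verifier.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    # Exact interval by inverting the binomial CDF with bisection;
    # equivalent to the inverse-Beta quantile form
    def solve(pred):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # ~2^-60 bracketing precision
            mid = (lo + hi) / 2
            if pred(mid):
                hi = mid
            else:
                lo = mid
        return hi
    lower = 0.0 if k == 0 else solve(
        lambda p: 1 - binom_cdf(k - 1, n, p) >= alpha / 2)
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p) <= alpha / 2)
    return lower, upper

# Rule-of-three sanity check: zero violations in 300 samples gives an
# upper bound of 1 - (alpha/2)^(1/n), roughly 1.22%
lo, hi = clopper_pearson(0, 300)
assert lo == 0.0 and abs(hi - (1 - 0.025 ** (1 / 300))) < 1e-9
```

The zero-violation case is the one that matters commercially: even a clean epoch yields a nonzero, defensible upper bound rather than an unsupported claim of perfection.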

3.3 Conformance Certificates

From n samples with k violations at 95% confidence:

If upper_bound ≤ τ (e.g., 1%):
  Issue ConformanceCertificate {
    epoch,
    sample_count: n,
    violation_count: k,
    upper_bound,
    confidence: 0.95,
    signature
  }

This certificate is a statistically sound claim: "With 95% confidence, the violation rate does not exceed τ." Insurance carriers can price risk against upper_bound, not self-reported violation rates.

4. The Cryptographic Control Loop

Attestation alone is passive observation. The breakthrough that transforms monitoring into guarantee is the cryptographic control loop — formally verified feedback where parameter changes are gated by cryptographic proof of co-epoch state.

4.1 The Innovation

Traditional safety monitoring:

Monitor → Attest → Report → (Human intervenes)

Cryptographic control loop:

Monitor → Attest → Cryptographically-verified automatic adjustment → Self-stabilization

This is not merely automation. It is provable stability — the system mathematically cannot destabilize within defined bounds.

4.2 Five-Gate Actuator Validation

Every ControlAction must pass all five gates:

  1. Signature Verification — Ed25519 signature valid
  2. Epoch Currency — ControlAction.epoch == current_epoch
  3. Parameter Bounds — p_min ≤ sampling_prob ≤ p_max
  4. Co-Epoch Receipt — Receipt exists for metrics bundle in same epoch
  5. Co-Epoch Network Attestation — NETATT exists for same epoch
ALL GATES PASSED: Apply params_after + log action
ANY GATE FAILED:  Reject + maintain params_before
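The gate logic above can be sketched as a single validation function. This Python sketch uses illustrative field names (not the normative schema) and stubs out Ed25519 verification as a boolean input.

```python
def validate_control_action(action: dict, current_epoch: int,
                            p_min: float, p_max: float,
                            receipt_epochs: set, netatt_epochs: set,
                            signature_valid: bool) -> bool:
    # All five gates must pass; any failure keeps params_before in force
    gates = (
        signature_valid,                           # 1. Ed25519 signature
        action["epoch"] == current_epoch,          # 2. epoch currency
        p_min <= action["params_after"]["sampling_prob"] <= p_max,  # 3. bounds
        action["epoch"] in receipt_epochs,         # 4. co-epoch receipt
        action["epoch"] in netatt_epochs,          # 5. co-epoch NETATT
    )
    return all(gates)

action = {"epoch": 42, "params_after": {"sampling_prob": 0.05}}
assert validate_control_action(action, 42, 0.01, 0.20, {42}, {42}, True)
assert not validate_control_action(action, 43, 0.01, 0.20, {42}, {42}, True)
```

The second assertion shows a stale-epoch action being rejected: no single compromised input is sufficient, because the gates are conjunctive.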

4.3 Formal Stability Proof

The controller uses a Lyapunov function approach:

V(state) = sampling_rate² + backpressure²/MAX_BP²

Prove: dV/dt ≤ 0 for all valid inputs
Result: System converges to stable point

This proof is implemented in TLA+ and compiled to Rust/WASM. Any third party can verify that the controller satisfies its invariants.
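The shape of the Lyapunov argument can be illustrated with a toy discrete-time controller. This is not the TLA+ model: the contraction factors and update rule are invented purely to show what "V non-increasing along every trajectory" means.

```python
MAX_BP = 100.0

def V(sampling_rate: float, backpressure: float) -> float:
    # Lyapunov candidate from the text
    return sampling_rate**2 + backpressure**2 / MAX_BP**2

def step(rate: float, bp: float):
    # Hypothetical stabilizing update: both components contract toward
    # equilibrium, so each term of V shrinks every step
    return 0.9 * rate, 0.8 * bp

rate, bp = 0.20, 60.0
values = [V(rate, bp)]
for _ in range(50):
    rate, bp = step(rate, bp)
    values.append(V(rate, bp))

# V is non-increasing along the trajectory of this toy controller
assert all(later <= earlier for earlier, later in zip(values, values[1:]))
```

The formal proof does the same thing over all reachable states of the real controller: exhibit a V that provably never increases, which rules out oscillation or runaway adjustment.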

4.4 Control Actions

When violation rates exceed thresholds:

Violation Level  Control Response
Low              Increase sampling rate (1% → 5%)
Medium           Increase sampling rate (5% → 20%) + apply backpressure
High             Switch model to safe mode with restricted capabilities
Critical         Open circuit breakers, require human approval for continuation

Every control action is cryptographically signed, bound to its epoch, constrained to verified parameter bounds, and recorded in the transparency log.

4.5 Why This Is Novel

No prior art combines these elements: cryptographic attestation of runtime state, formally verified control-theoretic stability, and automatic parameter adjustment gated by co-epoch proof.

Building this requires the intersection of control theory expertise (rare), formal verification expertise (rarer), and cryptographic engineering (rare). All three together is unprecedented.

5. Zero-Knowledge Risk Signals

ATI produces four risk signals that insurers, auditors, and regulators can assess without accessing evidence content:

5.1 Sampling Coverage

Signal: Verified monitoring proportion via PRF recomputation

Given the Digest Publication Ledger, policy identifier, policy key, and canonicalization rules, an auditor can recompute the PRF tag for every request and verify that every request that should have been sampled has a corresponding L2 envelope.

5.2 Exposure Windows

Signal: Unmonitored intervals from epoch/NETATT gaps

If a sidecar fails to submit heartbeats for an epoch, or submits heartbeats without corresponding NETATT, the exposure window is precisely bounded: [epoch_start, epoch_end] had no cryptographic coverage.

5.3 Response Latency

Signal: Time from violation attestation to applied ControlAction

If a violation is attested at t₁ and the corresponding ControlAction is applied at t₂, the response latency (t₂ - t₁) is cryptographically verifiable. SLA violations are objective facts, not operator claims.

5.4 Retention Integrity

Signal: Evidence commitment sets compared to CAS proofs of possession

The evidence_commitment in L2 envelopes can be verified against content-addressable storage proofs without accessing the evidence itself.

5.5 Parametric Insurance Triggers

These signals enable parametric triggers — automatic insurance events based on cryptographically verifiable conditions:

Trigger: CI_upper > 1% for 3 consecutive epochs
Action: Automatic premium adjustment or coverage limitation

This is not a claim requiring adjustment. It is a mathematical fact verified from the attestation chain.
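Evaluating such a trigger is mechanical. This Python sketch implements the consecutive-epoch rule above; the threshold and window values mirror the example and are otherwise arbitrary.

```python
def trigger_fires(ci_uppers, tau: float = 0.01, window: int = 3) -> bool:
    # Fires when CI_upper exceeds tau for `window` consecutive epochs
    run = 0
    for ub in ci_uppers:
        run = run + 1 if ub > tau else 0
        if run >= window:
            return True
    return False

assert trigger_fires([0.002, 0.014, 0.013, 0.012])      # 3 consecutive breaches
assert not trigger_fires([0.014, 0.013, 0.004, 0.012])  # streak broken
```

Because each CI_upper is itself derived from signed attestations, both parties to the policy can evaluate the trigger independently and must reach the same answer.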

6. The Market Context: January 2026

6.1 Regulatory Deadlines

Deadline         Regulation             Requirement
August 2, 2026   EU AI Act Article 12   Automatic logging for high-risk AI with risk detection, post-market monitoring, and operational oversight capabilities
June 30, 2026    Colorado AI Act        Comprehensive governance requirements for high-risk AI
January 1, 2026  California SB 1120     Healthcare AI disclosure requirements
Effective now    Verisk ISO exclusions  Mainstream insurers exiting undocumented AI risk

The European Commission's Digital Omnibus proposes conditional delays tied to infrastructure availability — but organizations starting today barely have enough time for August 2026, as conformity assessment alone takes 6-12 months.

6.2 Insurance Market Formation

Munich Re's aiSure program represents the emerging model: attestation-based underwriting where the act of underwriting serves as an independent seal of approval.

From Munich Re's AI specialist: "AI models are inherently probabilistic and will make mistakes... we're comfortable covering a broad range of error rates, from very low to high — they will be reflected in the premium."

Armilla's AI Liability Insurance (Lloyd's, April 2025) provides "explicit, affirmative coverage for AI-related exposures that traditional insurance often fails to address" — but requires verifiable governance controls.

The projected market: $1.2B (2025) → $15-25B (2030) → $60-100B (2035).

6.3 Benchmark Fragmentation

MLCommons AILuminate v1.0 represents the first industry-standard benchmark with letter grades (A-F) across 13 hazard categories. But a UK AI Safety Institute review of 440 benchmarks found ~50% aim to measure abstract concepts like "harmlessness" without clearly defining them, while only 16% use statistical methods when comparing results.

More critically: all existing benchmarks evaluate models in isolation. None evaluate the combined stack of model + guardrails + infrastructure in production deployment.

7. Comparison with Existing Approaches

7.1 Why Hyperscalers Cannot Solve This

AWS, Azure, and GCP face irreconcilable conflicts when attempting to attest their own AI services:

  1. Commercial conflict — They sell AI inference; they cannot credibly audit it
  2. Single-cloud lock-in — Enterprises use multiple providers; no hyperscaler can be the neutral aggregator
  3. Evidence custody model — Their business model requires data transmission; zero-egress conflicts fundamentally

This is the Arthur Andersen problem: independent third-party attestation is architecturally necessary because hyperscalers cannot credibly audit their own commercial products.

7.2 Why Observability Cannot Solve This

Datadog, Arthur AI, and Arize record events. They cannot prove events were not modified. They cannot prove sampling was unbiased. They cannot produce cryptographic evidence admissible in court.

More fundamentally: observability requires evidence egress. Every prompt and response must be transmitted to the monitoring service. This conflicts with data-residency mandates, privacy regulation such as GDPR and HIPAA, and the confidentiality obligations attached to regulated workloads.

Zero-egress is not a feature. It is an architectural requirement.

7.3 Why Guardrails Cannot Solve This

NeMo Guardrails, Guardrails AI, and CalypsoAI execute safety controls. They cannot prove they executed. They operate as detection-plane components — observing traffic and potentially blocking it. They do not generate cryptographic attestations of what they did.

The October 2025 research showing 90%+ attack success rates against published defenses demonstrates the fundamental weakness: guardrails are classifiers, and classifiers can be fooled.

ATI does not replace guardrails. It proves they executed — and provides the feedback loop to improve them when they fail.

8. Implementation Status

As of January 2026, the GLACIS implementation provides:

Component                                       Status
Hierarchical key derivation (HKDF-SHA256)       Fully implemented (25 tests)
Evidence commitments with domain separation     Fully implemented (16 tests)
Federated witness services (3 cloud providers)  Fully implemented (103 tests)
Auditor PRF recomputation tools                 Fully implemented (CLI + Rust verifier)
Cryptographic control loop                      Fully implemented (28 tests, TLA+ specification)
S3P random sampling                             Specified (RFC test vectors implemented)
Digest Publication Ledger                       Specified (types defined, implementation pending)
Insurance parametric API                        Foundation implemented (control loop provides signals)

The open-source sidecar runs on eBPF-capable Linux kernels. The witness network operates across Cloudflare Workers, AWS Lambda, and GCP Cloud Run with 99.9% availability.

9. Conclusion: The Category Definition

Attestable Threat Intelligence is not better logging. It is not encrypted observability. It is not compliance automation.

ATI is a new primitive — cryptographic proof of safety control execution with evidence locality, auditor reproducibility, and zero-knowledge risk signals.

The category emerges from the recognition that detection is not proof, documentation is not evidence, and trust is not verification. As AI systems assume responsibility for consequential decisions — medical diagnoses, credit determinations, hiring recommendations — the question shifts from "Did it work?" to "Can you prove it worked?"

The answer requires:

  1. Evidence that cannot leave — Zero-egress architecture where prompts and responses never cross the customer boundary
  2. Sampling that can be reproduced — Deterministic PRF with embedded verification enabling third-party audit
  3. Binding that cannot be forged — Co-epoch attestation linking receipts to binary and network state
  4. Controls that cannot destabilize — Formally verified feedback loops with cryptographic gating

This is the infrastructure that makes AI governance verifiable rather than claimed.

Appendix A: Glossary

Attestable Threat Intelligence (ATI): Cryptographic primitive providing proof of safety control execution without evidence egress.

Co-epoch binding: Cryptographic linkage of attestations to binary hash and network isolation state within a temporal window.

Evidence locality: Architectural guarantee that prompts, responses, and policy evaluations never leave the customer boundary.

L0 attestation: Metadata-only attestation generated for every request.

L2 attestation: Full attestation with evidence commitment generated for sampled requests.

NETATT: Network-isolation attestation capturing iptables, eBPF, CNI, and SPKI state.

Policy-scoped PRF: HMAC-based pseudorandom function keyed per policy enabling deterministic, auditor-reproducible sampling.

S3P (Statistical Safety Signal Protocol): Ultra-low-rate (0.1-1%) sampling with exact binomial confidence bounds and post-epoch nonce reveal.

Zero-knowledge risk signal: Verifiable risk metric derived from cryptographic artifacts without evidence content access.



This paper describes technology covered by pending patent applications. GLACIS, the GLACIS logo, and Attestable Threat Intelligence are trademarks of GLACIS, Inc.

Contact: [email protected]