Attestable Threat Intelligence

A New Primitive for AI Governance

GLACIS Technical Whitepaper

January 2026

GLACIS

Abstract

Current approaches to AI safety share a fatal assumption: that detection and enforcement are sufficient. Guardrails detect threats. GRC platforms document policies. Observability tools record events. None of them prove that safety controls actually executed on a given inference. This paper introduces Attestable Threat Intelligence (ATI) — a cryptographic primitive that transforms AI safety from claims requiring trust into mathematical proof requiring only verification.

ATI addresses three hard constraints that current approaches cannot solve simultaneously:

Selection bias — Auditors cannot trust provider-selected samples to be representative
Data residency — Organizations cannot export prompts and responses to third-party monitors without triggering privacy violations
Evidentiary value — Conventional logs are easy to forge and carry no cryptographic guarantees

We describe an architecture where evidence never leaves the customer boundary, yet third parties can mathematically verify that safety controls executed correctly. This is not merely observability with better encryption. It is a fundamentally different primitive: proof of execution without evidence egress.

1. The Problem: Detection is Not Proof

1.1 The Current Paradigm

The AI safety field has converged on a three-layer architecture:

Layer	Function	Examples
Policy	Define what should happen	Credo AI, ServiceNow, Archer
Safety	Prevent bad outcomes	NeMo Guardrails, CalypsoAI, Robust Intelligence
Observability	Record what happened	Datadog, Arthur AI, Arize

This architecture reflects a positivist assumption borrowed from software testing: if we specify rules precisely enough and monitor execution thoroughly enough, safety reduces to rule-following.

The assumption fails for three reasons.

First, multi-turn attacks defeat single-turn classifiers. An October 2025 study from OpenAI, Anthropic, and Google DeepMind examined 12 published defenses against prompt injection and jailbreaking. Using adaptive attacks that iterate multiple times, they achieved attack success rates above 90% for most defenses — despite those defenses originally reporting near-zero attack success rates. Claude 3.5 Sonnet: 78% attack success rate with sufficient attempts. GPT-4o: 89%. Single-turn safety evaluation systematically underestimates real-world vulnerability.

Second, fine-tuning erases safety at negligible cost. Researchers demonstrated that GPT-3.5 Turbo could be jailbroken with only 10 adversarial examples for under $0.20. Fine-tuned variants show 22x higher odds of producing harmful responses. No standard benchmark measures post-fine-tuning safety degradation.

Third, and most fundamentally: detection is not proof. An observability dashboard shows that guardrails claim to have executed. A GRC platform shows that policies exist. Neither provides cryptographic evidence that a specific guardrail executed on a specific inference at a specific time under a specific network configuration. When litigation arrives — and it will, given Character.AI's wrongful death lawsuit, Sharp HealthCare's class action over AI-fabricated consent records, and the coming wave of AI malpractice claims — "our dashboard says it worked" is not a defense. Evidence is a defense.

1.2 The Proof Gap

The distinction between documentation and evidence has legal weight. Under the Federal Rules of Evidence:

Authenticity — Evidence must be verified as what it purports to be
Reliability — Evidence must meet trustworthiness standards
Chain of custody — Evidence must have a consistent trail from acquisition to presentation

Conventional AI logging fails all three requirements. Logs can be modified. Timestamps can be forged. Selection can be biased. There is no cryptographic binding between a logged event and the actual inference that occurred.

This creates what we term the proof gap: the space between what organizations claim about AI behavior and what they can demonstrate with independently verifiable evidence.

The proof gap has immediate commercial consequences:

Insurance: Berkley Insurance introduced "Absolute" AI exclusions in late 2025. Verisk/ISO exclusionary forms became effective January 2026. The insurance industry is pricing AI as uninsurable risk — not because they cannot model the risk, but because they cannot verify the risk controls.
Litigation: In Sharp HealthCare's class action, the fundamental problem was not that consent was not obtained — it was that Sharp had no cryptographic evidence distinguishing actual consent from AI hallucination. When the AI system itself generates documentation, the documentation is suspect.
Regulation: EU AI Act Article 12 mandates automatic logging capabilities for high-risk AI systems, effective August 2026. The requirement is not for logs, but for logs that serve "risk detection," "post-market monitoring," and "operational oversight." Current logging serves none of these purposes with evidentiary reliability.

2. Attestable Threat Intelligence: The Primitive

Attestable Threat Intelligence (ATI) is a cryptographic primitive with four defining properties:

Evidence locality by construction — Prompts, responses, and policy evaluations never leave the customer boundary. Only cryptographic commitments (hashes and signatures) are transmitted.
Auditor-reproducible sampling — Third parties possessing the policy key can recompute every sampling decision and verify that every request that should have been sampled has a corresponding attestation.
Co-epoch binding — Every attestation is cryptographically bound to the binary hash of the executing code and the network isolation state at the moment of execution.
Zero-knowledge risk signals — Insurers, auditors, and regulators can assess compliance using only cryptographic artifacts, without ever seeing the underlying data.

This is not encryption applied to existing observability. It is a different architecture where the receipt service is structurally incapable of receiving evidence content.

2.1 Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        CUSTOMER BOUNDARY                            │
│                                                                     │
│   ┌─────────┐    ┌─────────────┐    ┌─────────────┐                │
│   │  Client │───▶│   Sidecar   │───▶│  AI Service │                │
│   └─────────┘    └──────┬──────┘    └─────────────┘                │
│                         │                                           │
│              ┌──────────┴──────────┐                               │
│              │                     │                               │
│              ▼                     ▼                               │
│   ┌─────────────────┐   ┌─────────────────────┐                   │
│   │  L0 Attestation │   │  Local Evidence     │                   │
│   │  (ALL requests) │   │  Storage (sampled)  │                   │
│   └────────┬────────┘   └─────────────────────┘                   │
│            │                                                        │
│            │ Hash + Signature only                                  │
└────────────┼────────────────────────────────────────────────────────┘
             │
             ▼
┌────────────────────────────────────────────────────────────────────┐
│                      WITNESS NETWORK                                │
│                                                                     │
│   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐  │
│   │ Receipt Service │   │ Witness Service │   │ Transparency    │  │
│   │ (Closed Schema) │   │ (Epoch Tokens)  │   │ Log (Merkle)    │  │
│   └─────────────────┘   └─────────────────┘   └─────────────────┘  │
│                                                                     │
└────────────────────────────────────────────────────────────────────┘

The architecture has three trust boundaries:

Customer Boundary — Where evidence exists. Prompts, responses, and full policy evaluation results are stored here in content-addressable storage. Nothing crosses this boundary except cryptographic commitments.
Sidecar — Open-source component operating as a transparent proxy via eBPF, iptables, or service mesh integration. Performs governed canonicalization (deterministic CBOR encoding), PRF computation for sampling decisions, and signature generation. Egress is restricted to a single pinned TLS destination.
Witness Network — Distributed across multiple cloud providers (Cloudflare Workers, AWS Lambda, GCP Cloud Run). Issues per-epoch bearer tokens, maintains binary allowlists, and operates the transparency log. The receipt service uses a closed JSON schema (additionalProperties: false) that structurally rejects any field not in the normative specification — it cannot accept evidence content even if compromised.

2.2 Two-Tier Attestation (L0/L2)

Every request generates an L0 attestation. Only sampled requests generate L2 attestations.

L0 Envelope (ALL requests):

Field	Description
`blinded_id`	First 8 bytes of PRF tag — enables linkage without revealing full PRF
`request_commitment`	`HMAC-SHA256(policy_key, "reqdig:v1" \|\| request_digest)`
`encoder_id`	SHA-256 hash of encoder binary (determinism proof)
`metadata_policy_results`	Endpoint, auth scope, region, model ID, deployment
`mono_counter`	Strictly monotonic per sidecar (replay prevention)
`wall_time_ns`	Nanoseconds since Unix epoch
`kid`	Key identifier
`sid`	Sidecar identifier
`signature`	Ed25519 signature over fields 0-7

L2 Envelope (SAMPLED requests):

Field	Description
`l0_reference`	Same as L0.blinded_id
`request_commitment`	Must match L0
`evidence_commitment`	`HMAC(storage_key, "evid:v1" \|\| SHA-256(evidence))`
`prf_tag`	Full 256-bit HMAC output (enables auditor verification)
`policy_scores`	Toxicity, PII detection, bias scores, etc.
`signature`	Ed25519 signature

The critical innovation: L2 envelopes include the complete PRF tag, allowing any auditor with the policy key to verify that the sampling decision was correct without accessing the underlying evidence.

2.3 Policy-Scoped Pseudorandom Function

The sampling function is deterministic and auditor-reproducible:

policy_key = HKDF-SHA256(root_secret, salt="glacis-v1-policy", info=policy_id)

prf_tag = HMAC-SHA256(policy_key, "prf:v1" || policy_id || request_digest)

threshold = floor(sampling_probability × (2^64 - 1))

sample iff BE_uint64(prf_tag[0:8]) ≤ threshold

For 5% sampling: threshold = 0x0ccccccccccccccc

This is not OpenTelemetry's consistent probability sampling. OpenTelemetry samples based on trace IDs using a shared random seed — any node can independently make the same sampling decision for coordination-free distributed tracing. ATI's PRF uses secret keys per policy with the complete 256-bit PRF value embedded in L2 attestations, enabling third-party reproduction of every sampling decision.

Domain separation prevents cross-purpose collisions:

Prefix	Purpose
`prf:v1`	Pseudorandom function
`reqdig:v1`	Request digest commitment
`evid:v1`	Evidence commitment
`bearer:v1`	Heartbeat bearer token
`s3p:sampler:v1`	S3P random sampling

2.4 Co-Epoch Binding

Attestations are bound to the runtime state at the moment of execution through a multi-layer network isolation proof (NETATT):

Layer	Component	Hash Computation
1	Policy	SHA-256(canonical JSON egress policy, RFC 8785)
2	iptables	SHA-256(normalized dump: tables ordered raw/mangle/nat/filter)
3	eBPF	SHA-256(programs sorted by attach point, ELF section order)
4	CNI	SHA-256(RFC 8785 canonical JSON of CNI config)
5	SPKI Pins	Array of SHA-256(DER_SPKI), lexicographically sorted

Final: network_state_hash = SHA-256(CBOR_encode(NETATT))

The receipt service requires co-epoch NETATT for every receipt. A sidecar cannot claim attestations were made during an epoch unless it registered its binary hash and network state during that epoch. This prevents:

Replay attacks — Old attestations cannot be resubmitted
Configuration drift — Binary or network changes without detection
Substitution attacks — One sidecar cannot impersonate another

2.5 The Closed Schema Guarantee

The receipt service API enforces additionalProperties: false at every level:

{
  "type": "object",
  "properties": {
    "attestation_hash": { "type": "string", "pattern": "^[a-f0-9]{64}$" },
    "heartbeat_epoch": { "type": "integer", "minimum": 0 }
  },
  "required": ["attestation_hash", "heartbeat_epoch"],
  "additionalProperties": false
}

Any request containing fields not in this schema receives ERR_SCHEMA_UNKNOWN_FIELD. This is not a policy decision — it is a structural guarantee. The receipt service cannot receive evidence content because there is no field in the schema to put it.

3. Statistical Safety Signal Protocol (S3P)

At scale (10⁹ requests/day), deterministic 5-15% sampling remains expensive. S3P reduces inspection to 0.1-1% while preserving actuarial rigor through exact binomial confidence bounds and cryptographically auditable random selection.

3.1 Cryptographic Random Sampling

At epoch start, the witness generates a 256-bit epoch_nonce via CSPRNG, kept secret until epoch close.

s_tag = HMAC-SHA256(epoch_nonce, "s3p:sampler:v1" || request_digest)

threshold = floor(p × (2^256 - 1))

sample iff BE_uint256(s_tag) ≤ threshold

After epoch close:

Witness publishes {epoch, epoch_nonce, signature}
Auditors verify epoch_nonce_commitment = SHA-256(epoch_nonce)
Auditors recompute s_tag for every request in the Digest Publication Ledger
Auditors verify sample membership and recompute confidence bounds

3.2 Exact Binomial Confidence Intervals

S3P uses Clopper-Pearson intervals rather than normal approximations:

CI_lower = 0 if k=0 else Beta^{-1}(α/2; k, n-k+1)
CI_upper = 1 if k=n else Beta^{-1}(1-α/2; k+1, n-k)

Where k = violations observed, n = samples taken, α = significance level.

This provides exact coverage probability regardless of the underlying violation rate, eliminating the normal approximation assumptions that fail at low sample sizes or extreme rates.

3.3 Conformance Certificates

From n samples with k violations at 95% confidence:

If upper_bound ≤ τ (e.g., 1%):
  Issue ConformanceCertificate {
    epoch,
    sample_count: n,
    violation_count: k,
    upper_bound,
    confidence: 0.95,
    signature
  }

This certificate is a statistically sound claim: "With 95% confidence, the violation rate does not exceed τ." Insurance carriers can price risk against upper_bound, not self-reported violation rates.

4. The Cryptographic Control Loop

Attestation alone is passive observation. The breakthrough that transforms monitoring into guarantee is the cryptographic control loop — formally verified feedback where parameter changes are gated by cryptographic proof of co-epoch state.

4.1 The Innovation

Traditional safety monitoring:

Monitor → Attest → Report → (Human intervenes)

Cryptographic control loop:

Monitor → Attest → Cryptographically-verified automatic adjustment → Self-stabilization

This is not merely automation. It is provable stability — the system mathematically cannot destabilize within defined bounds.

4.2 Five-Gate Actuator Validation

Every ControlAction must pass all five gates:

Signature Verification — Ed25519 signature valid
Epoch Currency — ControlAction.epoch == current_epoch
Parameter Bounds — p_min ≤ sampling_prob ≤ p_max
Co-Epoch Receipt — Receipt exists for metrics bundle in same epoch
Co-Epoch Network Attestation — NETATT exists for same epoch

ALL GATES PASSED: Apply params_after + log action
ANY GATE FAILED:  Reject + maintain params_before

4.3 Formal Stability Proof

The controller uses a Lyapunov function approach:

V(state) = sampling_rate² + backpressure²/MAX_BP²

Prove: dV/dt ≤ 0 for all valid inputs
Result: System converges to stable point

This proof is implemented in TLA+ and compiled to Rust/WASM. Any third party can verify that the controller satisfies its invariants.

4.4 Control Actions

When violation rates exceed thresholds:

Violation Level	Control Response
Low	Increase sampling rate (1% → 5%)
Medium	Increase sampling rate (5% → 20%) + apply backpressure
High	Switch model to safe mode with restricted capabilities
Critical	Open circuit breakers, require human approval for continuation

Every control action is:

Triggered by cryptographic attestation (not logs)
Computed by formally verified logic
Cryptographically signed for third-party verification
Bounded by provable invariants

4.5 Why This Is Novel

No prior art combines these elements:

OpenTelemetry: Passive sampling, no control feedback
HashiCorp Vault: Passive audit logging, no adaptive behavior
IETF RATS: Attestation without control actions
Control Theory: Exists but without cryptographic verification
Formal Methods: Exist but not integrated with attestation

Building this requires the intersection of control theory expertise (rare), formal verification expertise (rarer), and cryptographic engineering (rare). All three together is unprecedented.

5. Zero-Knowledge Risk Signals

ATI produces four risk signals that insurers, auditors, and regulators can assess without accessing evidence content:

5.1 Sampling Coverage

Signal: Verified monitoring proportion via PRF recomputation

Given the Digest Publication Ledger, policy identifier, policy key, and canonicalization rules, an auditor can recompute the PRF tag for every request and verify that every request that should have been sampled has a corresponding L2 envelope.

5.2 Exposure Windows

Signal: Unmonitored intervals from epoch/NETATT gaps

If a sidecar fails to submit heartbeats for an epoch, or submits heartbeats without corresponding NETATT, the exposure window is precisely bounded: [epoch_start, epoch_end] had no cryptographic coverage.

5.3 Response Latency

Signal: Time from violation attestation to applied ControlAction

If a violation is attested at t₁ and the corresponding ControlAction is applied at t₂, the response latency (t₂ - t₁) is cryptographically verifiable. SLA violations are objective facts, not operator claims.

5.4 Retention Integrity

Signal: Evidence commitment sets compared to CAS proofs of possession

The evidence_commitment in L2 envelopes can be verified against content-addressable storage proofs without accessing the evidence itself.

5.5 Parametric Insurance Triggers

These signals enable parametric triggers — automatic insurance events based on cryptographically verifiable conditions:

Trigger: CI_upper > 1% for 3 consecutive epochs
Action: Automatic premium adjustment or coverage limitation

This is not a claim requiring adjustment. It is a mathematical fact verified from the attestation chain.

6. The Market Context: January 2026

6.1 Regulatory Deadlines

Deadline	Regulation	Requirement
August 2, 2026	EU AI Act Article 12	Automatic logging for high-risk AI with risk detection, post-market monitoring, and operational oversight capabilities
June 30, 2026	Colorado AI Act	Comprehensive governance requirements for high-risk AI
January 1, 2026	California SB 1120	Healthcare AI disclosure requirements
Effective now	Verisk ISO exclusions	Mainstream insurers exiting undocumented AI risk

The European Commission's Digital Omnibus proposes conditional delays tied to infrastructure availability — but organizations starting today barely have enough time for August 2026, as conformity assessment alone takes 6-12 months.

6.2 Insurance Market Formation

Munich Re's aiSure program represents the emerging model: attestation-based underwriting where the act of underwriting serves as an independent seal of approval.

From Munich Re's AI specialist: "AI models are inherently probabilistic and will make mistakes... we're comfortable covering a broad range of error rates, from very low to high — they will be reflected in the premium."

Armilla's AI Liability Insurance (Lloyd's, April 2025) provides "explicit, affirmative coverage for AI-related exposures that traditional insurance often fails to address" — but requires verifiable governance controls.

The projected market: $1.2B (2025) → $15-25B (2030) → $60-100B (2035).

6.3 Benchmark Fragmentation

MLCommons AILuminate v1.0 represents the first industry-standard benchmark with letter grades (A-F) across 13 hazard categories. But a UK AI Safety Institute review of 440 benchmarks found ~50% aim to measure abstract concepts like "harmlessness" without clearly defining them, while only 16% use statistical methods when comparing results.

More critically: all existing benchmarks evaluate models in isolation. None evaluate the combined stack of model + guardrails + infrastructure in production deployment.

7. Comparison with Existing Approaches

7.1 Why Hyperscalers Cannot Solve This

AWS, Azure, and GCP face irreconcilable conflicts when attempting to attest their own AI services:

Commercial conflict — They sell AI inference; they cannot credibly audit it
Single-cloud lock-in — Enterprises use multiple providers; no hyperscaler can be the neutral aggregator
Evidence custody model — Their business model requires data transmission; zero-egress conflicts fundamentally

This is the Arthur Andersen problem: independent third-party attestation is architecturally necessary because hyperscalers cannot credibly audit their own commercial products.

7.2 Why Observability Cannot Solve This

Datadog, Arthur AI, and Arize record events. They cannot prove events were not modified. They cannot prove sampling was unbiased. They cannot produce cryptographic evidence admissible in court.

More fundamentally: observability requires evidence egress. Every prompt and response must be transmitted to the monitoring service. This conflicts with:

HIPAA data residency requirements
EU AI Act data protection provisions
Enterprise security policies prohibiting PHI transmission
Customer contracts specifying data boundaries

Zero-egress is not a feature. It is an architectural requirement.

7.3 Why Guardrails Cannot Solve This

NeMo Guardrails, Guardrails AI, and CalypsoAI execute safety controls. They cannot prove they executed. They operate as detection-plane components — observing traffic and potentially blocking it. They do not generate cryptographic attestations of what they did.

The October 2025 research showing 90%+ attack success rates against published defenses demonstrates the fundamental weakness: guardrails are classifiers, and classifiers can be fooled.

ATI does not replace guardrails. It proves they executed — and provides the feedback loop to improve them when they fail.

8. Implementation Status

As of January 2026, the GLACIS implementation provides:

Component	Status
Hierarchical key derivation (HKDF-SHA256)	Fully implemented (25 tests)
Evidence commitments with domain separation	Fully implemented (16 tests)
Federated witness services (3 cloud providers)	Fully implemented (103 tests)
Auditor PRF recomputation tools	Fully implemented (CLI + Rust verifier)
Cryptographic control loop	Fully implemented (28 tests, TLA+ specification)
S3P random sampling	Specified (RFC test vectors implemented)
Digest Publication Ledger	Specified (types defined, implementation pending)
Insurance parametric API	Foundation implemented (control loop provides signals)

The open-source sidecar runs on eBPF-capable Linux kernels. The witness network operates across Cloudflare Workers, AWS Lambda, and GCP Cloud Run with 99.9% availability.

9. Conclusion: The Category Definition

Attestable Threat Intelligence is not better logging. It is not encrypted observability. It is not compliance automation.

ATI is a new primitive — cryptographic proof of safety control execution with evidence locality, auditor reproducibility, and zero-knowledge risk signals.

The category emerges from the recognition that detection is not proof, documentation is not evidence, and trust is not verification. As AI systems assume responsibility for consequential decisions — medical diagnoses, credit determinations, hiring recommendations — the question shifts from "Did it work?" to "Can you prove it worked?"

The answer requires:

Evidence that cannot leave — Zero-egress architecture where prompts and responses never cross the customer boundary
Sampling that can be reproduced — Deterministic PRF with embedded verification enabling third-party audit
Binding that cannot be forged — Co-epoch attestation linking receipts to binary and network state
Controls that cannot destabilize — Formally verified feedback loops with cryptographic gating

This is the infrastructure that makes AI governance verifiable rather than claimed.

Appendix A: Glossary

Attestable Threat Intelligence (ATI): Cryptographic primitive providing proof of safety control execution without evidence egress.

Co-epoch binding: Cryptographic linkage of attestations to binary hash and network isolation state within a temporal window.

Evidence locality: Architectural guarantee that prompts, responses, and policy evaluations never leave the customer boundary.

L0 attestation: Metadata-only attestation generated for every request.

L2 attestation: Full attestation with evidence commitment generated for sampled requests.

NETATT: Network-isolation attestation capturing iptables, eBPF, CNI, and SPKI state.

Policy-scoped PRF: HMAC-based pseudorandom function keyed per policy enabling deterministic, auditor-reproducible sampling.

S3P (Statistical Safety Signal Protocol): Ultra-low-rate (0.1-1%) sampling with exact binomial confidence bounds and post-epoch nonce reveal.

Zero-knowledge risk signal: Verifiable risk metric derived from cryptographic artifacts without evidence content access.

Appendix B: References

Standards and Specifications

IETF SCITT (Supply Chain Integrity, Transparency, and Trust): draft-ietf-scitt-architecture-22
RFC 8949: Concise Binary Object Representation (CBOR)
RFC 8785: JSON Canonicalization Scheme (JCS)
RFC 6962: Certificate Transparency
ISO/IEC DIS 24970:2025: AI system logging

Regulatory

EU AI Act Article 12: Record-keeping requirements (effective August 2, 2026)
Colorado AI Act SB 21-169 (effective June 30, 2026)
California SB 1120/AB 3030 (effective January 1, 2026)

Research

"Are Defenses Against Prompt Injection and Jailbreaking Evaluated Well?" (OpenAI, Anthropic, DeepMind, October 2025)
"Prompt Injection Attacks in Large Language Models and AI Agent Systems" (MDPI Information, January 2026)
MLCommons AILuminate v1.0 Benchmark (December 2025)

Market

Munich Re aiSure: https://www.munichre.com/en/solutions/for-industry-clients/insure-ai.html
Armilla AI Liability Insurance (Lloyd's): https://www.reinsurancene.ws/armilla-reveals-purpose-built-ai-liability-insurance/

This paper describes technology covered by pending patent applications. GLACIS, the GLACIS logo, and Attestable Threat Intelligence are trademarks of GLACIS, Inc.

Contact: [email protected]