Frameworks · April 2026

OWASP LLM Top 10

Each of the ten risks, how they manifest in production AI systems, and how continuous monitoring addresses every one. Mapped to OVERT controls and MITRE ATLAS techniques.


What the OWASP LLM Top 10 Is

The OWASP Top 10 for LLM Applications is the standard taxonomy of security risks for systems built on large language models. Published by the Open Worldwide Application Security Project, it gives security teams a shared language for the vulnerabilities unique to LLM-powered products — from prompt injection to model theft.

The list isn’t academic. It’s the framework that auditors, regulators, and red teams reference when evaluating AI systems. The EU AI Act’s technical documentation requirements map directly to these categories. So do NIST AI RMF controls and insurance underwriting questionnaires.

Below, we walk through each risk: what it is, how it shows up in real deployments, the corresponding MITRE ATLAS technique, and which OVERT controls address it through continuous monitoring.

The Ten Risks

LLM01

Prompt Injection

An attacker crafts input that overrides the system prompt — either directly (user-facing) or indirectly (via poisoned documents, emails, or web pages the model retrieves). The model follows the attacker’s instructions instead of the developer’s.

In production: RAG-based systems that ingest external documents are especially vulnerable. A single poisoned PDF in the retrieval index can redirect every conversation that references it.

ATLAS: AML.T0051
OVERT: ov-1.1, ov-2.1
Monitor: Input pattern analysis, instruction-boundary enforcement

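The input pattern analysis named above can be sketched as a first-pass filter over retrieved content before it enters the context window. This is an illustrative sketch, not an OVERT implementation; the phrase list and the `flag_injection` helper are assumptions for demonstration, and a real monitor would combine many more signals.

```python
import re

# Hypothetical first-pass scan for instruction-override phrases in
# retrieved documents (RAG chunks, emails, web pages). The pattern list
# is illustrative and deliberately small; real systems layer classifiers
# and provenance checks on top of simple pattern matching.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
    r"you are now",
]

def flag_injection(text: str) -> bool:
    """Return True if the text matches a known injection phrase."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

Flagged chunks can be dropped from the retrieval set or quarantined for review rather than silently passed to the model.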
LLM02

Insecure Output Handling

The application passes LLM output directly to downstream systems without validation. The model generates SQL, JavaScript, shell commands, or API calls that the application executes, enabling injection attacks through the model as a proxy.

In production: Code-generation assistants and agentic systems that execute model-produced code are primary targets. The model becomes an unwitting intermediary for traditional injection attacks.

ATLAS: AML.T0048
OVERT: ov-2.2, ov-3.1
Monitor: Output sanitization validation, tool-call argument auditing

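Output sanitization for model-generated SQL can be sketched as an allowlist gate between the model and the database. The table allowlist and `is_safe_select` helper below are assumptions for illustration; the point is that model output is treated as untrusted input, never executed as-is.

```python
import re

# Hypothetical output gate: permit only single-statement SELECT queries
# against an allowlisted set of tables. Everything else is rejected
# before it reaches the database.
ALLOWED_TABLES = {"orders", "customers"}

def is_safe_select(sql: str) -> bool:
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:
        return False                       # reject multi-statement payloads
    m = re.match(r"(?i)^select\s.+?\sfrom\s+(\w+)", stmt)
    return bool(m) and m.group(1).lower() in ALLOWED_TABLES
```

The same pattern generalizes to shell commands and API calls: validate against an explicit schema, never against what the model "usually" produces.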
LLM03

Training Data Poisoning

Malicious data in the training set introduces backdoors, biases, or targeted vulnerabilities. This includes poisoned fine-tuning data, compromised RLHF feedback, and manipulated retrieval indexes.

In production: Organizations that fine-tune models on customer data or maintain dynamic RAG indexes are exposed. Even a small percentage of poisoned training examples can shift model behavior in targeted ways.

ATLAS: AML.T0020
OVERT: ov-1.2, ov-4.1
Monitor: Data lineage tracking, behavioral drift detection

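Behavioral drift detection can be sketched as a comparison between a baseline behavioral metric and the same metric over current traffic. The refusal-rate metric and the 5% tolerance below are illustrative assumptions; any stable behavioral statistic works.

```python
# Hypothetical drift check: compare the refusal rate of the current
# traffic window against a baseline recorded after the model was vetted,
# and alert when the absolute difference exceeds a tolerance.
def drift_alert(baseline_refusals: int, baseline_total: int,
                current_refusals: int, current_total: int,
                tolerance: float = 0.05) -> bool:
    baseline_rate = baseline_refusals / baseline_total
    current_rate = current_refusals / current_total
    return abs(current_rate - baseline_rate) > tolerance
```

A sudden shift in refusal rate after a fine-tuning run is exactly the kind of signal that points back to poisoned training data.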
LLM04

Model Denial of Service

Attackers craft inputs that consume disproportionate compute resources — extremely long contexts, recursive reasoning prompts, or inputs that trigger worst-case inference paths. The result is degraded service or complete unavailability.

In production: Multi-tenant AI services are especially vulnerable. A single user’s adversarial input can exhaust GPU capacity that serves hundreds of other customers.

ATLAS: AML.T0029
OVERT: ov-2.3
Monitor: Token-rate limiting, inference-cost tracking per request

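Token-rate limiting can be sketched as a sliding-window budget per client, charging each request by its token count rather than by request count. The budget and window values below are illustrative assumptions.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-client token budget over a sliding one-minute window.
# Charging by tokens (not requests) is what blunts long-context abuse.
class TokenRateLimiter:
    def __init__(self, tokens_per_minute: int = 10_000, window: float = 60.0):
        self.budget = tokens_per_minute
        self.window = window
        self.usage = defaultdict(deque)    # client_id -> deque of (time, tokens)

    def allow(self, client_id: str, tokens: int, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.usage[client_id]
        while q and now - q[0][0] > self.window:
            q.popleft()                    # expire entries outside the window
        used = sum(t for _, t in q)
        if used + tokens > self.budget:
            return False
        q.append((now, tokens))
        return True
```

In a multi-tenant service this sits in front of inference, so one client exhausting its budget never touches capacity reserved for others.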
LLM05

Supply Chain Vulnerabilities

Compromised model weights, tampered training pipelines, poisoned pre-trained models on public hubs, or vulnerable third-party plugins. The LLM supply chain has many points of entry for attackers.

In production: Teams pulling models from Hugging Face Hub or using community plugins without verification inherit unknown risks. Model provenance is rarely tracked.

ATLAS: AML.T0010
OVERT: ov-4.1, ov-4.3
Monitor: Model hash verification, supply chain attestation

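Model hash verification can be sketched as a digest comparison at load time: compute the SHA-256 of the artifact on disk and compare it to a digest pinned when the model was vetted. The helper below is an illustrative sketch, not part of any particular attestation toolchain.

```python
import hashlib

# Hypothetical artifact check: stream the weights file through SHA-256
# and compare against a digest recorded at vetting time. A mismatch
# means the artifact changed somewhere in the supply chain.
def verify_artifact(path: str, expected_sha256: str) -> bool:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```

Pinning digests in version control alongside deployment configs also gives you the provenance trail the entry above notes is rarely tracked.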
LLM06

Sensitive Information Disclosure

The model leaks PII, credentials, system prompts, proprietary data, or other sensitive information through its responses. This can occur through memorization of training data or through context-window content being referenced in unrelated conversations.

In production: Healthcare and financial services AI systems handling patient records or transaction data face the highest exposure. A single leaked SSN or diagnosis is a reportable breach.

ATLAS: AML.T0024
OVERT: ov-2.1, ov-3.2
Monitor: PII scanning on outputs, context-boundary enforcement

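PII scanning on outputs can be sketched as a redaction pass over every response before it leaves the service. The two patterns below (US SSN, email) are illustrative only; production scanners cover many more identifier classes and use detection models, not just regexes.

```python
import re

# Hypothetical output scanner: redact matches and report which PII
# classes were found, so each leak attempt is also a logged event.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_pii(text: str):
    findings = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text, findings
```

The findings list feeds the monitoring pipeline: a spike in redactions on one endpoint is a disclosure incident in progress, not a cosmetic issue.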
LLM07

Insecure Plugin Design

LLM plugins and tool integrations lack proper input validation, authentication, or access controls. The model can be manipulated into calling plugins with malicious arguments, accessing resources beyond its intended scope.

In production: Agentic systems with multiple tool integrations multiply this risk. Each new tool is a new privilege boundary that must be independently secured. See our agentic AI security guide.

ATLAS: AML.T0040
OVERT: ov-3.1, ov-3.3
Monitor: Tool-call argument validation, scope enforcement

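Tool-call argument validation can be sketched as a schema check between the model's proposed call and the dispatcher. The tool registry and checks below are hypothetical; the principle is that every call is validated against what the tool declares, not against what the model requests.

```python
# Hypothetical tool registry: each tool declares exactly the argument
# names it accepts. Unknown tools, missing arguments, and extra
# arguments are all rejected before dispatch.
TOOL_SCHEMAS = {
    "send_email": {"to", "subject", "body"},
    "read_file": {"path"},
}

def validate_tool_call(name: str, args: dict) -> bool:
    if name not in TOOL_SCHEMAS:
        return False                       # unknown tool
    if set(args) != TOOL_SCHEMAS[name]:
        return False                       # wrong argument set
    if name == "read_file" and ".." in args["path"]:
        return False                       # block path traversal
    return True
```

Per-tool value checks (like the path-traversal rule here) are where most real enforcement lives; the schema check alone only catches shape errors.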
LLM08

Excessive Agency

The LLM is granted too many permissions, too broad a scope, or too much autonomy. It can take actions — sending emails, modifying databases, executing code — that go beyond what the application requires.

In production: The drift from “chatbot that answers questions” to “agent that takes actions” often happens incrementally, with each new tool integration expanding the blast radius of a compromise. See our AI agent security guide.

ATLAS: AML.T0048
OVERT: ov-2.2, ov-3.1
Monitor: Action-scope auditing, least-privilege enforcement

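Least-privilege enforcement can be sketched as a per-role action allowlist with an audit trail, so scope creep from new tool integrations is an explicit, reviewable change. The role names below are illustrative assumptions.

```python
# Hypothetical scope registry: each deployment role declares only the
# actions it needs, and every authorization decision (allow or deny)
# is appended to an audit log.
ROLE_SCOPES = {
    "support_bot": {"search_docs", "create_ticket"},
    "billing_agent": {"search_docs", "issue_refund"},
}

def authorize(role: str, action: str, audit_log: list) -> bool:
    allowed = action in ROLE_SCOPES.get(role, set())
    audit_log.append((role, action, allowed))
    return allowed
```

Denied entries in the audit log are the interesting ones: a support bot repeatedly attempting `issue_refund` is either a prompt injection or a misconfiguration, and both warrant investigation.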
LLM09

Overreliance

Users or downstream systems trust LLM output without verification, treating it as ground truth. The model hallucinates facts, citations, code, or medical/legal/financial advice that appears authoritative but is fabricated.

In production: In healthcare, unverified AI output can lead to misdiagnosis. In legal, fabricated citations have led to court sanctions. Overreliance is a system design problem, not just a user behavior problem.

ATLAS: AML.T0043
OVERT: ov-1.3, ov-2.4
Monitor: Hallucination detection, confidence-score tracking, human-in-the-loop enforcement

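Human-in-the-loop enforcement can be sketched as a routing gate: low-confidence answers and high-stakes topics go to review instead of straight to the user. The topic set and the 0.8 threshold below are illustrative tuning assumptions, as is the availability of a per-answer confidence score.

```python
# Hypothetical routing gate: answers touching high-stakes domains, or
# falling below a confidence threshold, are held for human review.
HIGH_STAKES_TOPICS = {"medical", "legal", "financial"}

def route(answer: str, confidence: float, topic: str,
          threshold: float = 0.8):
    if topic in HIGH_STAKES_TOPICS or confidence < threshold:
        return ("human_review", answer)
    return ("auto_reply", answer)
```

Encoding the gate in the system, rather than in user training, is what makes overreliance a design problem the system itself can mitigate.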
LLM10

Model Theft

Unauthorized access to model weights, fine-tuned variants, or proprietary training data. This includes model extraction through API queries (distillation attacks) and direct exfiltration of model artifacts.

In production: Organizations that fine-tune foundation models with proprietary data create high-value targets. The fine-tuned model embeds trade secrets in its weights.

ATLAS: AML.T0024
OVERT: ov-4.2, ov-4.3
Monitor: API rate-limiting, query pattern analysis, model artifact access logging
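Query pattern analysis for extraction can be sketched as a distinct-prompt count per client per window: distillation attacks require systematically sweeping the input space, which looks very different from normal usage. The threshold below is an illustrative assumption.

```python
from collections import defaultdict

# Hypothetical extraction detector: a client issuing an unusually large
# number of distinct prompts in one window is flagged as a possible
# model-distillation attempt.
def flag_extraction(events, max_distinct: int = 1000):
    """events: iterable of (client_id, prompt) pairs for one time window."""
    distinct = defaultdict(set)
    for client_id, prompt in events:
        distinct[client_id].add(prompt)
    return {c for c, prompts in distinct.items() if len(prompts) > max_distinct}
```

Flagged clients can then be throttled or routed through stricter rate limits while artifact access logs are checked for direct exfiltration attempts.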

How Continuous Monitoring Addresses the Top 10

Pre-deployment testing catches known attack patterns. Continuous monitoring catches everything else — the novel attacks, the regressions after model updates, the emergent behaviors in production that no test suite predicted.

GLACIS maps each OWASP LLM risk to the specific runtime monitoring capabilities listed in the entries above.



Test Against the Top 10

Run autoredteam against your model to see how it holds up across all ten OWASP LLM risk categories.

autoredteam on GitHub · Book a Scan Call