Security Guide • Updated December 2025

AI Supply Chain Security Guide

Comprehensive guide to securing AI supply chains. Model provenance, dependency risks, and SBOM requirements for AI systems.

26 min read • 7,500+ words
Joe Braidwood, CEO, GLACIS

Executive Summary

AI systems inherit traditional software supply chain risks while introducing new attack surfaces through training data, pre-trained models, and fine-tuning pipelines. 97% of AI projects contain vulnerable dependencies, supply chain attacks have increased 3x year-over-year, and over 10,000 malicious packages target ML developers on PyPI alone.[1][2][3]

This guide examines the AI-specific supply chain: data provenance, model repositories, dependency ecosystems, and infrastructure providers. We analyze real-world attacks including the PyTorch supply chain compromise (December 2022), document attack vectors from model poisoning to typosquatting, and provide implementation guidance for ML-BOM and AI SBOM.

Key finding: The EU AI Act mandates supply chain documentation for high-risk systems (Article 11), and NIST SSDF now explicitly covers AI/ML components. Organizations treating AI supply chain security as optional will face regulatory and customer pressure in 2025-2026.

  • 80% of AI systems use third-party models[4]
  • 10,000+ malicious PyPI packages target ML developers[3]
  • 3x year-over-year increase in supply chain attacks[5]
  • 97% of AI projects contain vulnerable dependencies[1]


What is AI Supply Chain Security?

AI supply chain security extends traditional software supply chain security to encompass AI-specific components: training datasets, pre-trained models, fine-tuning data, evaluation benchmarks, and model deployment infrastructure. While traditional supply chains track code dependencies, AI supply chains must also track data lineage, model provenance, and computational resources.

The AI Attack Surface

AI systems introduce attack surfaces that don't exist in traditional software: poisoned training data, backdoored pre-trained models, and compromised fine-tuning pipelines.

The complexity stems from AI's multi-layered supply chain. A single deployed model might depend on a base model from Hugging Face, fine-tuning data scraped from the web, Python packages from PyPI, CUDA libraries from NVIDIA, and cloud infrastructure from AWS—each representing a potential compromise point.

The AI Supply Chain

Understanding the AI supply chain requires mapping dependencies across five distinct layers:

AI Supply Chain Layers

Layer | Components | Risk Level | Common Sources
Data | Training sets, validation data, fine-tuning corpora | Critical | Web scraping, third-party datasets, user data
Models | Base models, fine-tuned checkpoints, embeddings | Critical | Hugging Face, GitHub, model zoos
Libraries | PyTorch, TensorFlow, transformers, scikit-learn | High | PyPI, conda-forge, npm
Infrastructure | Cloud GPUs, model registries, serving platforms | Medium | AWS, GCP, Azure, Replicate
Tooling | MLOps platforms, monitoring, experiment tracking | Low-Medium | Weights & Biases, MLflow, Neptune

Each layer creates dependencies that must be tracked, verified, and monitored. A study of 1,000 AI projects found the median project has 103 dependencies with 16 known CVEs—significantly higher than traditional software projects.[1]

Attack Vectors in AI Supply Chains

AI supply chain attacks exploit trust relationships and opacity in the ML development process. Here are the primary attack vectors documented in the wild:

1. Model Poisoning

Attackers distribute pre-trained models containing backdoors that activate on specific triggers. These attacks are particularly insidious because the model behaves normally on benign inputs and the backdoor can survive fine-tuning, as the case study below illustrates.

Case Study: Backdoored Computer Vision Models (2023)

Researchers demonstrated backdoor attacks on popular computer vision models uploaded to Hugging Face. Models performed normally on standard inputs but misclassified images containing specific trigger patterns. The backdoors survived fine-tuning, meaning downstream users unknowingly deployed compromised models.[6]

2. Dependency Confusion & Typosquatting

ML developers rely heavily on package managers like PyPI and conda. Attackers exploit this through typosquatting (package names that mimic popular libraries) and dependency confusion (malicious public packages that shadow internal package names).

Research by Checkmarx identified over 10,000 malicious packages on PyPI specifically targeting ML developers, with names mimicking TensorFlow, PyTorch, and popular transformer libraries.[3]

PyTorch Supply Chain Compromise (December 2022)

Attackers exploited dependency confusion by publishing a malicious torchtriton package to PyPI that shadowed a dependency of PyTorch nightly builds. The package ran code at install time that exfiltrated environment variables, SSH keys, and other sensitive files. PyTorch maintainers removed the malicious dependency and published a coordinated disclosure within days, but the incident demonstrated how even major ML frameworks face supply chain risks.[7]

3. Data Poisoning

Training data poisoning targets the datasets used to train or fine-tune models, typically by injecting manipulated or mislabeled samples into public datasets or into web sources that are later scraped for training.

Data poisoning is particularly effective because organizations rarely verify provenance of training data, and detecting poisoned samples requires knowing what to look for.

4. Model Repository Compromise

Platforms like Hugging Face host hundreds of thousands of models. Security risks include malicious pickle payloads, backdoored weights, falsified model cards, and license violations.

Hugging Face has implemented security features including pickle scanning and model signing, but 80% of organizations using third-party models don't verify signatures or scan for malicious code.[4]

Data Supply Chain Risks

Training data represents a critical attack surface because it directly shapes model behavior. The key risks fall into three areas: provenance gaps, third-party dataset exposure, and web-scraping opacity.

Data Provenance Challenges

Most AI teams cannot answer basic provenance questions about their training data: where each dataset came from, when it was collected, under what license, and how it was filtered before training.

The EU AI Act Article 10 requires "data governance and management practices" including provenance tracking for high-risk systems. Organizations without data lineage capabilities will struggle to demonstrate compliance.[8]
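A lightweight starting point for lineage is a per-snapshot manifest. The sketch below is illustrative only (the paths, field names, and license/source values are placeholders, not a standard): it records a SHA-256 digest for every file in a dataset snapshot plus the basic provenance fields that data governance reviews tend to ask for.

# Illustrative sketch: record a dataset snapshot as a manifest of per-file
# SHA-256 digests plus basic provenance metadata. Paths and field names are
# placeholders, not a formal standard.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards don't load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: str, source: str, license_id: str) -> dict:
    root = Path(dataset_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        "dataset_path": str(root),
        "source": source,                      # e.g. crawl job ID or internal system of record
        "license": license_id,                 # e.g. "CC-BY-4.0" (verify before relying on it)
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "files": {str(p.relative_to(root)): sha256_file(p) for p in files},
    }


if __name__ == "__main__":
    manifest = build_manifest("data/finetune_v3", source="internal-crawl-2025-01", license_id="proprietary")
    Path("data/finetune_v3.manifest.json").write_text(json.dumps(manifest, indent=2))

Versioning these manifests alongside the dataset gives you an immutable record to point auditors at when lineage questions come up.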

Third-Party Dataset Risks

Popular datasets like ImageNet, Common Crawl, and LAION face ongoing security and compliance concerns, including unclear licensing, copyrighted or personal content, and samples that later prove harmful or legally problematic.

Web Scraping Provenance

Many foundation models train on web-scraped data (Common Crawl, C4, The Pile). This creates provenance challenges: the original pages can change or disappear after crawling, licensing and consent are rarely recorded, and poisoned or low-quality content is hard to trace back once it enters the corpus.

Model Supply Chain Risks

Pre-trained models have become infrastructure for modern AI. Over 500,000 models are hosted on Hugging Face alone, with billions of downloads monthly. This centralization creates systemic risk.

Pre-trained Model Risks

Pickle Deserialization Vulnerabilities

PyTorch models are commonly saved using Python's pickle format, which can execute arbitrary code during deserialization. Hugging Face now scans for malicious pickle files, but models on GitHub, Google Drive, and other sources remain unverified.
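One practical first-pass check is to list the imports a pickle stream would trigger without ever deserializing it. The sketch below uses Python's standard pickletools against the data.pkl inside a PyTorch checkpoint archive; the deny-list and the string-tracking heuristic are illustrative, and dedicated scanners (for example picklescan) or safetensors-based formats are the more robust choices.

# Illustrative sketch: list the modules a pickled PyTorch checkpoint would
# import, without deserializing it. A zip-based .pt/.pth file stores its
# pickle stream in a member named data.pkl.
import io
import pickletools
import zipfile

# Deny-list is illustrative, not exhaustive.
SUSPICIOUS = {"os", "posix", "subprocess", "builtins", "runpy", "socket"}


def referenced_globals(pickle_bytes: bytes) -> set[str]:
    """Return 'module.name' strings referenced by GLOBAL/STACK_GLOBAL opcodes."""
    found, strings = set(), []
    for opcode, arg, _pos in pickletools.genops(io.BytesIO(pickle_bytes)):
        if opcode.name == "GLOBAL":               # arg looks like "module name"
            found.add(arg.replace(" ", "."))
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)                   # remember strings for STACK_GLOBAL (heuristic)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            found.add(f"{strings[-2]}.{strings[-1]}")
    return found


def scan_checkpoint(path: str) -> None:
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith("data.pkl"):
                continue
            for ref in sorted(referenced_globals(archive.read(member))):
                flag = "SUSPICIOUS" if ref.split(".")[0] in SUSPICIOUS else "ok"
                print(f"{member}: {ref} [{flag}]")


# scan_checkpoint("downloaded_model.pt")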

Model Card Falsification

Model cards can misrepresent training data, evaluation metrics, or intended use. Without verification, organizations may deploy models unsuitable for their use case or that violate licensing terms.

Weight Poisoning

Backdoors embedded in model weights that activate on specific inputs. These persist through fine-tuning and are difficult to detect without trigger-specific testing.

Licensing Violations

Models redistributed under incorrect licenses or with commercial restrictions. LLaMA leaks and license violations on Hugging Face have created legal exposure for downstream users.

Hugging Face Security Features

Hugging Face has implemented several security controls, including pickle scanning and GPG-based model signing, but adoption is inconsistent.

Despite these features, research shows only 3% of downloaded models have verified signatures, and most organizations don't perform local security scanning before deployment.[9]

Fine-tuning Attack Vectors

Even when starting with a trusted base model, fine-tuning introduces its own risks: poisoned fine-tuning data can implant new backdoors, and third-party fine-tuning services add another link in the chain of custody.

Software Supply Chain Risks

AI projects depend on complex software stacks spanning ML frameworks, data processing libraries, and deployment tools. Each dependency represents potential compromise.

The ML Dependency Problem

Analysis of 1,000 AI projects on GitHub found a median of 103 dependencies and 16 known CVEs per project.[1]

This is significantly worse than traditional software. AI projects average 40% more dependencies and 2.3x more vulnerabilities than non-AI projects of similar size.[1]

PyPI Package Ecosystem Risks

PyPI hosts over 500,000 packages but has minimal security controls: anyone can publish, package names are claimed first-come-first-served, and packages can execute arbitrary code at install time.

Checkmarx research documented over 10,000 malicious packages targeting ML developers, using typosquatted names that mimic TensorFlow, PyTorch, and popular transformer libraries, along with install-time payloads that exfiltrate credentials.[3]

Critical ML Library Vulnerabilities

Notable ML Library Vulnerabilities (2022-2024)

Library | CVE | Severity | Impact
TensorFlow | CVE-2022-35934 | Critical | Code execution via malformed SavedModel
PyTorch | torchtriton compromise | Critical | Supply chain attack, credential theft
transformers | CVE-2023-4863 | High | Pickle deserialization RCE
MLflow | CVE-2023-6831 | High | Path traversal in model registry
scikit-learn | CVE-2020-28975 | Medium | Arbitrary code execution via pickle

Dependency Pinning Failure

Most ML projects use unpinned dependencies (e.g., transformers>=4.0 instead of transformers==4.36.1). This means builds are not reproducible, a newly published malicious or vulnerable release can be pulled in silently, and behavior can differ across environments.

Research shows 68% of AI projects have zero pinned dependencies, despite this being a basic supply chain security control.[1]

Infrastructure Supply Chain Risks

AI infrastructure dependencies create additional attack surfaces:

Cloud Provider Risks

Misconfigured storage buckets holding model weights, over-broad access to model registries and training infrastructure, and unclear shared-responsibility boundaries are the most common cloud-side exposures.
GPU Supply Chain

The AI industry's dependence on NVIDIA GPUs creates supply chain concentration: nearly every training and serving stack sits on the same vendor's hardware, drivers, and CUDA libraries, so a compromise or critical flaw in that layer has industry-wide blast radius.

Model Hosting Services

Third-party model hosting (Replicate, Together AI, Anyscale) introduces risks around access control, tenant isolation, and visibility: you inherit the host's security practices with little ability to audit them.

AI SBOM and ML-BOM

Traditional Software Bills of Materials (SBOMs) don't capture AI-specific components. The industry is developing two complementary approaches:

AI SBOM: Extending Traditional SBOM

AI SBOM extends formats like SPDX and CycloneDX to include AI components: models, datasets, and the provenance metadata that links them to a deployed system.

The Linux Foundation's SPDX 3.0 specification (released 2024) includes AI/ML extensions, providing standardized fields for model metadata and dataset references.[10]

ML-BOM: Machine Learning Bill of Materials

ML-BOM is a specialized format capturing ML-specific supply chain information:

ML-BOM Components

Component Type | Information Captured | Security Relevance
Base Model | Model ID, version, source repository, hash | Provenance verification, backdoor detection
Training Data | Dataset name, version, collection date, size | Poisoning detection, licensing compliance
Fine-tuning Data | Source, size, labeling process, filters applied | Data poisoning, quality verification
Framework | PyTorch/TensorFlow version, CUDA version | Vulnerability tracking, reproducibility
Evaluation | Benchmarks, metrics, test set provenance | Performance verification, bias detection
Dependencies | All Python packages with pinned versions | CVE tracking, supply chain verification

Organizations like Google and Microsoft are developing internal ML-BOM tooling, but no industry-wide standard exists yet. NIST is working on ML-specific SBOM guidance as part of the AI Safety Institute.[11]

Implementing ML-BOM

Practical ML-BOM implementation requires capturing the components in the table above automatically, at training time, and versioning the result alongside each model release rather than reconstructing it by hand afterwards.
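Because no industry-wide ML-BOM schema exists yet, the sketch below is only an illustration loosely inspired by CycloneDX-style component records; every field name, identifier, and path is a placeholder to adapt to whatever internal schema you standardize on.

# Illustrative ML-BOM skeleton, loosely modeled on CycloneDX-style components.
# Field names, identifiers, and paths are placeholders.
import hashlib
import json
from datetime import datetime, timezone


def sha256_file(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def build_ml_bom(model_path: str, requirements_path: str) -> dict:
    with open(requirements_path) as req:
        pinned = [line.strip() for line in req if "==" in line]
    return {
        "bomFormat": "ML-BOM",                       # placeholder, not a ratified format name
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "base_model": {
            "name": "org-name/base-model",           # placeholder identifier
            "source": "https://huggingface.co/org-name/base-model",
            "revision": "<commit-sha-recorded-at-download>",
        },
        "fine_tuned_model": {
            "path": model_path,
            "sha256": sha256_file(model_path),
        },
        "training_data": [
            {"name": "finetune_v3", "manifest": "data/finetune_v3.manifest.json"},
        ],
        "framework": {"pytorch": "2.1.2", "cuda": "12.1"},
        "dependencies": pinned,
        "evaluation": {"benchmark": "internal-eval-v2", "accuracy": None},  # fill from the eval run
    }


if __name__ == "__main__":
    bom = build_ml_bom("models/classifier_v7.safetensors", "requirements.txt")
    print(json.dumps(bom, indent=2))

Generating this record in the same CI job that trains or packages the model keeps the ML-BOM synchronized with each model version.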

Secure Model Sourcing

Organizations deploying third-party models should implement verification controls:

Model Verification Checklist

Before Download

  • Verify repository authenticity (official org, verified badge)
  • Check model card for licensing and intended use
  • Review download statistics and community feedback
  • Verify digital signature if available

After Download

  • Scan for malicious pickle operations
  • Compute and verify cryptographic hashes
  • Test on standard benchmarks to verify performance
  • Store in internal model registry with metadata

Model Provenance Tracking

Implementing model provenance requires tracking the source repository, the exact revision and hash that was downloaded, who reviewed and approved it, and any modifications made after download.

Model Signing

Organizations should implement model signing for internal models so that downstream consumers can verify integrity and origin before deployment.

Hugging Face supports model signing via GPG, but adoption remains low. Organizations should sign internal models even if third-party models lack signatures.[12]
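A minimal signing sketch, assuming GPG is installed and a signing key is available: it produces a detached, armored signature over the model artifact and verifies both the signature and the hash recorded in your registry before deployment. The paths and key ID are placeholders.

# Minimal sketch: sign a model artifact with a detached GPG signature and
# verify it before deployment. Assumes gpg is installed and a signing key
# exists; paths and the key ID are placeholders.
import hashlib
import subprocess


def sha256_file(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sign_model(model_path: str, key_id: str) -> str:
    sig_path = model_path + ".asc"
    subprocess.run(
        ["gpg", "--batch", "--yes", "--local-user", key_id,
         "--armor", "--detach-sign", "--output", sig_path, model_path],
        check=True,
    )
    return sig_path


def verify_model(model_path: str, sig_path: str, expected_sha256: str) -> None:
    # Fails loudly if the signature is invalid or the hash does not match
    # the value recorded in the model registry / ML-BOM.
    subprocess.run(["gpg", "--verify", sig_path, model_path], check=True)
    actual = sha256_file(model_path)
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")


# Example (placeholders):
# sig = sign_model("models/classifier_v7.safetensors", key_id="security@example.com")
# verify_model("models/classifier_v7.safetensors", sig, expected_sha256="<registry value>")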

Dependency Management for AI Projects

Securing the software supply chain requires rigorous dependency management:

Dependency Scanning

Implement automated scanning for known CVEs in direct and transitive dependencies, for typosquatted or known-malicious packages, and for license conflicts.
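For the CVE portion, one lightweight pattern is to wrap pip-audit in CI and fail the build on findings. The sketch below assumes pip-audit is installed and parses its JSON output, whose exact shape can vary between versions, so treat the parsing as illustrative.

# Sketch: fail CI if pip-audit reports known vulnerabilities in pinned
# dependencies. Assumes pip-audit is installed; its JSON output shape can
# differ between versions, so the parsing below is illustrative.
import json
import subprocess
import sys


def audit(requirements_path: str = "requirements.txt") -> int:
    result = subprocess.run(
        ["pip-audit", "-r", requirements_path, "--format", "json"],
        capture_output=True, text=True,
    )
    report = json.loads(result.stdout or "{}")
    findings = 0
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            findings += 1
            fixes = ", ".join(vuln.get("fix_versions", [])) or "none"
            print(f"{dep.get('name')} {dep.get('version')}: {vuln.get('id')} (fix: {fixes})")
    return findings


if __name__ == "__main__":
    sys.exit(1 if audit() else 0)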

Dependency Pinning Best Practices

Pin Everything

Use exact version pinning for all dependencies:

# Bad - allows any version >= 2.0
torch>=2.0.0
transformers>=4.0.0

# Good - pins exact versions
torch==2.1.2
transformers==4.36.1
tokenizers==0.15.0

Dependency Auditing Workflow

Establish a regular auditing workflow:

  1. Weekly scans: Automated scanning for new CVEs in dependencies
  2. Monthly reviews: Manual review of dependency update recommendations
  3. Quarterly upgrades: Coordinated dependency upgrades with testing
  4. Incident response: Emergency process for critical vulnerabilities

Private Package Repositories

For production systems, consider using private package mirrors so that builds can only install reviewed, allow-listed packages and a compromised upstream index cannot push code straight into your pipeline.

Regulatory Requirements

Supply chain security is increasingly required by regulation and industry standards:

EU AI Act Supply Chain Requirements

The EU AI Act Article 11 requires providers of high-risk AI systems to maintain technical documentation describing the system's elements and development process, including the third-party models, tools, and datasets used to build it.

High-risk provisions take effect August 2026—organizations without supply chain documentation will face non-compliance.[8]

NIST Secure Software Development Framework (SSDF)

NIST SP 800-218 SSDF explicitly includes AI/ML in scope. Practice PO.3.2 requires:

"Create an inventory of the software's components, including commercial software, open-source software, in-house software, and AI/ML components."

Federal contractors and enterprises selling to government must demonstrate SSDF compliance, which now includes AI supply chain transparency.[13]

Executive Order 14110 Requirements

EO 14110 (October 2023) directed developers of the most powerful dual-use foundation models to report safety test results and other critical information to the federal government.

While primarily aimed at frontier model developers, the transparency requirements establish precedent for supply chain documentation across the AI industry.[14]

Implementation Checklist

Organizations should implement supply chain security controls across six areas:

1. Model Supply Chain

  • Maintain inventory of all models (internal and third-party)
  • Verify model signatures before deployment
  • Scan model files for malicious pickle operations
  • Track model provenance (source, lineage, modifications)
  • Compute and verify SHA-256 hashes of model weights

2. Data Supply Chain

  • Document provenance of all training datasets
  • Verify licensing and usage rights for datasets
  • Implement data lineage tracking (collection → preprocessing → training)
  • Scan training data for poisoning indicators
  • Maintain dataset versioning with immutable snapshots

3. Software Dependencies

  • Pin all dependencies to exact versions
  • Scan dependencies weekly for CVEs
  • Use private package mirrors for production
  • Implement allow-list of approved packages
  • Generate and maintain SBOM for all projects

4. Infrastructure

  • Document cloud provider dependencies and access controls
  • Audit S3/GCS bucket permissions for model storage
  • Review third-party model hosting security controls
  • Implement infrastructure-as-code for reproducibility
  • Monitor for unauthorized model access or exfiltration

5. ML-BOM

  • Generate ML-BOM for every deployed model
  • Include base model, training data, dependencies, evaluation
  • Version ML-BOM alongside model versions
  • Cryptographically sign ML-BOM
  • Automate ML-BOM generation in CI/CD

6. Compliance

  • Map supply chain documentation to EU AI Act Article 11
  • Demonstrate NIST SSDF Practice PO.3.2 compliance
  • Prepare supply chain evidence for customer audits
  • Establish incident response for supply chain compromises
  • Conduct annual supply chain risk assessments

GLACIS Supply Chain Security Framework


Verifiable Supply Chain Security

Traditional supply chain security relies on documentation—inventory spreadsheets, SBOM files, policy documents. GLACIS generates cryptographic evidence that supply chain controls actually executed.

1. Automated ML-BOM Generation

Extract complete component inventory from training pipelines: base models, datasets, dependencies, infrastructure. Generate ML-BOM automatically during CI/CD with cryptographic hashes and provenance chains.

2. Runtime Attestation of Supply Chain Controls

Generate cryptographic attestations when supply chain controls execute: dependency scanning, model verification, signature checking, CVE scanning. Each attestation is tamper-evident and independently verifiable.

3. Provenance Verification

Verify model and data provenance with cryptographic proof. Track chain of custody from data collection through deployment. Detect unauthorized modifications or component substitutions.

4. Regulatory Evidence Packages

Map supply chain attestations to EU AI Act Article 11, NIST SSDF PO.3.2, and customer requirements. Generate audit-ready evidence packages proving supply chain controls executed for specific model versions.

The difference: Documentation shows what you said you would do. Attestations prove what you actually did. Regulators and customers increasingly demand proof.

Learn About Supply Chain Attestations

Frequently Asked Questions

How do I know if a model from Hugging Face is safe?

Verify: (1) Repository authenticity (verified organization badge), (2) Download statistics and community reviews, (3) Model card completeness, (4) Scan downloaded files for malicious pickle operations, (5) Check if GPG signature exists and verify it. Test on standard benchmarks to ensure performance matches claims. Most importantly: treat Hugging Face models like any third-party dependency—verify before trusting.
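One habit worth adding to that list is pinning the exact repository revision at download time so the artifact cannot change silently afterwards. A minimal sketch using huggingface_hub follows; the repo ID, filename, and commit hash are placeholders.

# Sketch: download a specific, reviewed revision of a model file and record
# its hash. Repo ID, filename, and commit hash below are placeholders.
import hashlib

from huggingface_hub import hf_hub_download

REPO_ID = "org-name/model-name"          # placeholder
FILENAME = "model.safetensors"           # prefer safetensors over pickle-based formats
REVISION = "<reviewed-commit-sha>"       # pin the exact commit you audited

path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME, revision=REVISION)

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print(f"downloaded {FILENAME}@{REVISION} sha256={digest}")  # record in your model registry

Recording the digest in your model registry ties the deployed artifact back to the revision you actually reviewed.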

What's the difference between SBOM and ML-BOM?

SBOM (Software Bill of Materials) catalogs software dependencies. ML-BOM extends this to AI-specific components: training datasets, base models, fine-tuning data, evaluation benchmarks, and model-specific dependencies. ML-BOM captures the complete ML supply chain while SBOM only covers software libraries. Both are needed for comprehensive AI supply chain security.

How often should I scan dependencies for vulnerabilities?

Minimum weekly automated scans. High-security environments should scan daily. Implement automated alerts for critical CVEs requiring emergency patching. Quarterly manual review of all dependencies for license compliance and maintenance status. After any dependency update, re-scan before deployment.

Do I need ML-BOM if I only use OpenAI's API?

Yes, but simpler. Document: (1) Which OpenAI models you use and when versions change, (2) Your prompt engineering approach and any fine-tuning, (3) Software dependencies in your integration layer, (4) Data you send to the API and retention policies. Even API-only AI systems have supply chains that regulators care about, especially under EU AI Act transparency requirements.
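Even at this reduced scope it helps to keep the record machine-readable. A hedged sketch of what such an entry could look like (field names and values are illustrative, not a standard):

# Illustrative record for an API-only AI system's "mini ML-BOM".
# Field names and values are placeholders, not a standard.
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass
class HostedModelRecord:
    provider: str
    model: str                # the exact model identifier you request
    integration_version: str  # your own application/release version
    prompt_template_version: str
    data_sent: str            # categories of data sent to the provider
    retention_policy: str
    last_reviewed: str


record = HostedModelRecord(
    provider="OpenAI",
    model="gpt-4o-2024-08-06",           # example of a pinned model identifier
    integration_version="support-bot 1.4.2",
    prompt_template_version="v12",
    data_sent="customer support text, no payment data",
    retention_policy="per provider contract; no training opt-in",
    last_reviewed=str(date.today()),
)
print(json.dumps(asdict(record), indent=2))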

References

  [1] Lyu, S. et al. "Large Language Models for Cyber Security: A Systematic Literature Review." arXiv:2405.04760, 2024. arxiv.org
  [2] Sonatype. "State of the Software Supply Chain Report 2024." sonatype.com
  [3] Checkmarx. "The State of Open Source Malware 2024." checkmarx.com
  [4] Gartner. "How to Secure the AI Model Supply Chain." 2024.
  [5] ENISA. "Threat Landscape for Supply Chain Attacks." 2024. enisa.europa.eu
  [6] Goldblum, M. et al. "Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses." IEEE Transactions, 2023.
  [7] PyTorch Team. "PyTorch Supply Chain Compromise Disclosure." December 2022. pytorch.org
  [8] European Parliament. "Regulation (EU) 2024/1689 on Artificial Intelligence (AI Act)." eur-lex.europa.eu
  [9] Hugging Face Security Research. "Model Security Analysis 2024." Internal research.
  [10] Linux Foundation. "SPDX 3.0 Specification with AI/ML Extensions." 2024. spdx.github.io
  [11] NIST AI Safety Institute. "AI SBOM Guidance." In development, 2025.
  [12] Hugging Face Documentation. "Model Signing and Verification." huggingface.co
  [13] NIST. "SP 800-218: Secure Software Development Framework (SSDF)." nist.gov
  [14] The White House. "Executive Order 14110 on Safe, Secure, and Trustworthy AI." October 2023. whitehouse.gov

Secure Your AI Supply Chain

Generate cryptographic proof of supply chain security controls. Automated ML-BOM, dependency scanning attestations, and provenance verification mapped to EU AI Act and NIST SSDF requirements.

Request Supply Chain Assessment
