Checklist • Updated December 2025

AI Vendor Due Diligence Checklist

Complete framework for evaluating AI vendors. Security assessments, compliance verification, and risk scoring methodology.

18 min read 50+ criteria
Joe Braidwood
Joe Braidwood
CEO, GLACIS
18 min read

Executive Summary

Traditional IT vendor due diligence is insufficient for AI systems. While 65% of enterprises have SOC 2 requirements for software vendors, fewer than 35% of AI vendors hold these certifications. More critically, even certified vendors often lack transparency into model behavior, training data provenance, or bias mitigation—the actual AI-specific risks.[1][2]

This guide provides a comprehensive AI vendor assessment framework covering security foundations, model transparency, data practices, bias testing, regulatory compliance, business continuity, and contractual protections. It includes a 50+ item checklist and maps to EU AI Act supply chain obligations taking effect in 2026.

Key finding: Organizations deploying high-risk AI without vendor transparency face regulatory penalties up to €35M under the EU AI Act and class-action exposure as demonstrated by the $2.2M SafeRent settlement. Due diligence is no longer optional—it's a legal requirement.

65%
Vendors Lack SOC 2[1]
40%
Can't Explain Models[2]
8 Weeks
Avg Assessment Time[3]
50+
Checklist Items

In This Guide

Why AI Vendor Due Diligence is Different

Traditional IT vendor assessments focus on infrastructure security, data protection, and operational reliability. These remain necessary but insufficient for AI vendors. AI systems introduce fundamentally new risk categories that standard frameworks don't address:

Model Opacity (Black Box Risk)

Traditional software executes explicit logic you can audit. AI models—particularly neural networks—make decisions through learned patterns that even their creators cannot fully explain. When a vendor's model denies a loan application, recommends a clinical intervention, or flags a transaction as fraudulent, can they explain why? Over 40% of AI vendors cannot provide interpretable explanations for high-stakes decisions.[2]

Hallucinations & Confabulation

Large language models generate plausible-sounding false information with startling confidence. This isn't a software bug you can patch—it's intrinsic to how these systems work. In legal services, 68% of professionals cite hallucinations as their top AI concern, with over 40% of law firms reporting LLM drafts requiring full manual revision.[4] Analysts estimate global losses from AI hallucinations reached $67.4 billion in 2024.[5]

Training Data Provenance

AI models inherit the biases, errors, and legal risks of their training data. If a vendor trained their hiring model on historical data from a company later sued for discrimination, your deployment inherits that liability. Most vendors cannot or will not disclose training data composition, provenance, or licensing status—creating copyright and privacy exposure you cannot assess.

Model Drift & Degradation

Unlike traditional software, AI model performance degrades over time as real-world data distribution shifts away from training conditions. A fraud detection model trained pre-pandemic may perform poorly on current attack patterns. Does the vendor monitor for drift? How frequently do they retrain? What performance guarantees persist beyond initial deployment?

AI-Specific Attack Vectors

Prompt injection, data poisoning, model extraction, adversarial examples—AI systems face novel attack patterns that traditional security assessments don't evaluate. Penetration testing must specifically target AI components, not just surrounding infrastructure. The Texas Attorney General's investigation of Pieces Technologies revealed vendors making claims like "<0.001% hallucination rate" without evidence—and without security teams catching it.[6]

The AI Vendor Risk Framework

Effective AI vendor assessment requires expanding traditional third-party risk management to include AI-specific dimensions. We recommend evaluating vendors across eight core risk categories:

AI Vendor Risk Categories

Risk Category Key Questions Evidence Required
Security & Infrastructure SOC 2? Pen testing? Data encryption? Certifications, audit reports, test results
Model Transparency Can they explain decisions? Model architecture documented? Model cards, technical documentation, interpretability demos
Data Practices Training data sources? Privacy controls? Retention policies? Data lineage documentation, DPIAs, data flow diagrams
Bias & Fairness Bias testing performed? Fairness metrics tracked? Fairness assessments, demographic parity analysis, mitigation plans
Regulatory Compliance EU AI Act ready? GDPR compliant? Industry regulations? Compliance roadmaps, legal opinions, certification status
Business Continuity SLAs? Model versioning? Exit strategy? Service agreements, disaster recovery plans, data portability specs
Contractual Protections Liability caps? IP ownership? Audit rights? MSA, DPA, indemnification terms, insurance coverage
Ongoing Governance Performance monitoring? Incident response? Update cadence? Monitoring dashboards, incident logs, change management process

Security Assessment

Start with traditional security foundations, then layer AI-specific security requirements. The absence of basic security certifications should be disqualifying for high-risk use cases.

Standard Security Requirements

AI-Specific Security Requirements

Critical Gap: AI Attack Vector Testing

Standard penetration testing focuses on infrastructure vulnerabilities. AI systems require specialized testing for:

  • Prompt Injection: Can attackers manipulate outputs through crafted inputs?
  • Data Poisoning: Can training data be corrupted to degrade model performance?
  • Model Extraction: Can attackers reconstruct proprietary models through API queries?
  • Adversarial Examples: Can attackers craft inputs that fool the model?

Request evidence of AI-specific security testing. The OWASP Top 10 for LLM Applications provides a baseline for evaluating LLM-based vendors.[7]

Model Transparency & Explainability

Model transparency determines whether you can understand, validate, and defend the vendor's AI decisions. This is critical for regulatory compliance and liability management.

Documentation Requirements

The vendor should provide comprehensive technical documentation covering:

Model Cards

Standardized documentation (proposed by Google) covering model architecture, intended use, performance characteristics, limitations, and ethical considerations. Model cards should specify:

  • Model type and architecture
  • Training data characteristics
  • Performance metrics by demographic group
  • Known limitations and failure modes
  • Intended use cases and out-of-scope applications

Datasheets for Datasets

Documentation of training data including collection methodology, composition, preprocessing, and known biases. Datasheets should cover:

  • Data source and collection process
  • Demographic composition
  • Labeling methodology and annotator instructions
  • Data cleaning and preprocessing steps
  • Legal basis for data use

Explainability & Interpretability

For high-risk applications (credit decisions, hiring, medical diagnosis), the vendor must provide interpretable explanations for individual decisions. Acceptable approaches include:

Test this in practice. Request the vendor demonstrate explainability on sample cases relevant to your use case. Generic marketing claims about "explainable AI" are insufficient—demand working demonstrations.

Data Practices

Data practices determine privacy risk, regulatory exposure, and model quality. Inadequate data governance creates liability you may not discover until litigation or regulatory investigation.

Training Data Assessment

Critical Questions on Training Data

1. What data sources were used for training?

Public datasets? Licensed commercial data? Customer data? Scraped web content? Each carries different legal and quality implications.

2. What is the legal basis for using this data?

Under GDPR, AI training requires valid legal basis—typically consent or legitimate interest. Scraped data without consent creates regulatory risk.

3. Does training data contain personal information?

If yes, how was it de-identified? Can models leak training data through memorization? (Demonstrated risk with LLMs.)

4. What is the demographic composition of training data?

Underrepresentation of demographic groups leads to biased models. Vendor should provide demographic breakdowns.

5. How old is the training data?

Stale training data produces models that don't reflect current patterns. Request data recency and refresh schedules.

Customer Data Handling

Understand exactly how the vendor uses your data:

Bias & Fairness Testing

The $2.2M SafeRent settlement demonstrates that algorithmic bias creates real legal liability. Due diligence must verify the vendor has tested for and mitigated discriminatory impacts.[8]

Required Bias Assessments

Request evidence of fairness testing across protected categories relevant to your use case:

Common Fairness Metrics

Fairness Metric Definition When to Use
Demographic Parity Equal positive prediction rates across groups When groups should receive outcomes at equal rates (e.g., ad delivery)
Equal Opportunity Equal true positive rates across groups When cost of false negatives varies by group (e.g., disease screening)
Equalized Odds Equal true positive and false positive rates When both error types matter (e.g., credit decisions)
Calibration Predicted probabilities match true outcome rates When probability scores are used directly (e.g., risk scoring)

Critical point: Different fairness definitions can be mathematically incompatible. The vendor must explain which fairness criteria they optimize for and why that choice is appropriate for your use case.[9]

Bias Mitigation Documentation

Testing for bias is insufficient—vendors must demonstrate mitigation. Request documentation of:

Regulatory Compliance

AI regulation is rapidly transitioning from voluntary to mandatory. Vendor compliance directly impacts your regulatory obligations, particularly under the EU AI Act's supply chain provisions.

EU AI Act Vendor Requirements

Article 26 of the EU AI Act places explicit due diligence obligations on deployers of high-risk AI systems. If you deploy vendor AI systems classified as high-risk, you must verify:[10]

EU AI Act High-Risk System Verification Requirements

  • Vendor has conducted conformity assessment (self-assessment or notified body)
  • CE marking is affixed and accompanied by EU declaration of conformity
  • Technical documentation is available (Article 11)
  • System has undergone appropriate risk management procedures (Article 9)
  • Data governance meets Article 10 requirements (training/validation/testing data)
  • Human oversight provisions are documented (Article 14)
  • Accuracy, robustness, and cybersecurity requirements are met (Article 15)

Non-compliance penalty: Up to €35M or 7% of global annual revenue—the highest penalty tier. Enforcement begins August 2026 for high-risk systems.[10]

Other Regulatory Frameworks

US State Laws

Colorado AI Act (June 2026): Requires deployers to implement reasonable care to avoid algorithmic discrimination. Vendors must provide documentation sufficient for you to conduct impact assessments.

California ADMT (January 2027): Requires risk assessments, pre-use notices, and opt-out mechanisms for automated decision-making technology in employment, credit, housing, and education contexts.

Sector-Specific Regulations

Healthcare (FDA, HIPAA): AI medical devices require FDA clearance. HIPAA requires Business Associate Agreements and prohibits unauthorized use of PHI for AI training.

Financial Services (SR 11-7, ECOA): Model Risk Management guidance requires validation of third-party models. Equal Credit Opportunity Act requires adverse action notices with specific reasons for credit denials.

Business Continuity & Operational Risk

AI vendor failures create unique business continuity challenges. Model performance can degrade silently, vendors can deprecate API versions without notice, and model weights cannot be easily migrated to alternative providers.

Service Level Agreements

Negotiate specific SLAs covering AI-relevant metrics:

  • Uptime guarantees: 99.9% uptime minimum for production systems. Ensure SLA covers inference API availability, not just website uptime.
  • Latency commitments: P95 and P99 latency targets for inference requests. Critical for real-time applications.
  • Model performance baselines: Minimum accuracy, precision, recall thresholds. What happens if production performance falls below these levels?
  • Version stability: Minimum notice period (90 days recommended) before deprecating API versions or model endpoints.

Model Versioning & Change Management

AI models change over time through retraining, fine-tuning, and updates. Uncontrolled model changes can break production systems or introduce new biases.

Exit Strategy & Data Portability

AI vendor lock-in is severe. Unlike traditional SaaS where you can export data and migrate to competitors, AI models trained on your data or fine-tuned for your use case cannot be easily replicated.

Negotiate exit protections:

Contractual Considerations

Standard vendor contracts are insufficient for AI systems. Negotiate AI-specific protections covering liability, intellectual property, audit rights, and indemnification.

Liability & Risk Allocation

Key Contractual Provisions

1. Performance Warranties

Vendor should warrant that the AI system performs as documented in technical specifications. Include specific accuracy, bias, and reliability thresholds. Avoid vague "commercially reasonable efforts" language.

2. Liability Caps & Carve-Outs

Standard contracts cap liability at 12 months fees. For high-risk AI, negotiate higher caps or carve-outs for: gross negligence, data breaches, IP infringement, and regulatory violations. Consider requiring vendor to maintain AI-specific errors & omissions insurance.

3. Indemnification

Vendor should indemnify you for: IP infringement claims (training data copyright violations), regulatory penalties arising from vendor non-compliance, and discrimination claims resulting from documented model bias vendor failed to disclose.

4. Audit Rights

Right to audit vendor's compliance with contractual obligations including security controls, bias testing, and data handling practices. Right to request third-party audits for high-risk deployments. EU AI Act Article 26 may require this.

5. Intellectual Property Ownership

Clarify ownership of: your input data, vendor's base models, fine-tuned models trained on your data, outputs generated by the system. Default SaaS contracts often grant vendor broad rights to your data—modify this for AI systems.

Data Processing Agreements

If the vendor processes personal data (GDPR) or protected health information (HIPAA), standard Data Processing Agreements (DPAs) must be extended to cover AI-specific processing:

Ongoing Monitoring & Governance

Due diligence doesn't end at contract signing. AI systems require continuous monitoring because performance degrades over time and new risks emerge post-deployment.

Performance Monitoring Requirements

Negotiate access to monitoring dashboards tracking:

Operational Metrics

  • API uptime and latency (P50, P95, P99)
  • Request success/failure rates
  • Rate limit consumption
  • Error codes and frequencies

Model Performance Metrics

  • Accuracy, precision, recall over time
  • Confidence score distributions
  • Prediction distribution drift
  • Model performance by demographic group

Incident Response & Escalation

Define clear incident response procedures for AI-specific failures:

Quarterly Business Reviews

Establish quarterly business reviews with the vendor covering:

Complete Due Diligence Checklist

Use this comprehensive checklist to systematically assess AI vendors. Prioritize items based on risk classification of your use case.

GLACIS Framework

Evidence-Based Vendor Assessment

Traditional vendor questionnaires rely on vendor self-attestation. GLACIS provides cryptographic evidence that vendor controls actually execute—not just documentation claiming they exist.

From Questionnaires to Evidence

Instead of asking "Do you perform bias testing?", request cryptographic attestations proving bias tests executed on specific model versions with verifiable results. The GLACIS Evidence Pack generates tamper-evident proof that third parties (customers, regulators, auditors) can independently verify.

Supply Chain Transparency

EU AI Act Article 26 requires deployers to verify vendor compliance. Vendor-supplied documentation is insufficient—you need independently verifiable evidence. GLACIS attestations provide the proof deployers need to demonstrate regulatory due diligence.

Request from vendors: Not PDFs claiming compliance, but cryptographic attestations you can verify independently. If vendors can't provide verifiable evidence, they can't prove their controls work.

Security & Infrastructure (10 items)

# Requirement Evidence Type Priority
1 SOC 2 Type II report (less than 12 months old) Audit report Critical
2 ISO 27001 certification (scope includes AI operations) Certificate High
3 Annual penetration testing including AI-specific attack vectors Test summary Critical
4 Encryption at rest (AES-256) for training data and model weights Technical spec Critical
5 Encryption in transit (TLS 1.3) for all API communications Technical spec Critical
6 Multi-factor authentication enforced for all administrative access Policy doc High
7 Vulnerability management program with defined SLAs for patching Process doc High
8 Incident response plan including AI-specific incident types IR plan High
9 Business continuity and disaster recovery plan with tested procedures BC/DR plan High
10 Security awareness training for personnel with access to models/data Training records Medium

Model Transparency & Explainability (8 items)

# Requirement Evidence Type Priority
11 Model card documenting architecture, performance, limitations Model card Critical
12 Technical documentation covering model development lifecycle Technical docs High
13 Explainability methods for individual predictions (SHAP, LIME, etc.) Demo + docs Critical
14 Performance benchmarks on standard datasets relevant to use case Benchmark results High
15 Known limitations and failure modes documented Limitation docs High
16 Out-of-scope use cases explicitly identified Use case doc High
17 Model versioning strategy with semantic versioning Versioning policy Medium
18 Change log documenting model updates and performance impacts Change log Medium

Data Practices (9 items)

# Requirement Evidence Type Priority
19 Datasheet for training datasets documenting composition and provenance Datasheet Critical
20 Legal basis for using training data (licenses, consent, etc.) Legal analysis Critical
21 Customer data usage policy (training prohibition unless consented) Policy + contract Critical
22 Data residency and cross-border transfer documentation Data flow diagram High
23 Data retention policy with defined retention periods Retention policy High
24 Data deletion procedures supporting right to erasure (GDPR Article 17) Deletion process High
25 Data Processing Agreement (DPA) covering AI-specific processing DPA Critical
26 De-identification methods for personal data in training sets Technical docs High
27 Data access controls limiting personnel access to training data Access policy High

Bias & Fairness (7 items)

# Requirement Evidence Type Priority
28 Bias assessment across protected categories (race, gender, age, etc.) Bias report Critical
29 Fairness metrics tracked (demographic parity, equal opportunity, etc.) Fairness metrics Critical
30 Bias mitigation techniques documented and implemented Mitigation docs High
31 Demographic composition of training data documented Data composition High
32 Ongoing fairness monitoring in production deployments Monitoring dashboard High
33 Disparate impact testing for high-stakes decisions Impact analysis Critical
34 Third-party fairness audit (for high-risk use cases) Audit report Medium

Regulatory Compliance (8 items)

# Requirement Evidence Type Priority
35 EU AI Act risk classification and compliance roadmap (if applicable) Classification + roadmap Critical
36 Technical documentation per EU AI Act Article 11 (for high-risk) Technical docs Critical
37 Quality management system documentation (EU AI Act Article 17) QMS docs High
38 GDPR compliance documentation (DPIAs for automated decisions) DPIA Critical
39 Sector-specific compliance (HIPAA, FDA, FCA, etc. as applicable) Certifications High
40 ISO 42001 certification or roadmap (AI management system) Certificate Medium
41 NIST AI RMF alignment documentation Framework mapping Medium
42 Regulatory monitoring process for emerging AI regulations Process doc Medium

Business Continuity & Operations (8 items)

# Requirement Evidence Type Priority
43 SLA with 99.9%+ uptime guarantee for production inference APIs SLA document Critical
44 Latency commitments (P95, P99) appropriate for use case SLA document High
45 Model performance baseline guarantees (accuracy, precision, recall) Performance SLA Critical
46 Version pinning capability with controlled upgrade path Technical spec High
47 Minimum 30-day notice for model changes or deprecations Contract term High
48 Data export and portability specifications Export specs High
49 Transition assistance (90+ days) in exit scenarios Contract term Medium
50 Financial stability assessment (funding, runway, profitability) Financial disclosure Medium

Contractual Protections (6 items)

# Requirement Evidence Type Priority
51 Performance warranties with specific accuracy/bias thresholds Contract term Critical
52 Liability caps appropriate for risk level (or uncapped for critical risks) Contract term Critical
53 Indemnification for IP infringement, regulatory penalties, discrimination Contract term Critical
54 Audit rights (annual or on-demand for high-risk use cases) Contract term High
55 Clear IP ownership terms (inputs, outputs, fine-tuned models) Contract term High
56 Insurance coverage (E&O, cyber) with AI-specific coverage confirmed COI + policy Medium

Frequently Asked Questions

How do I prioritize vendors when I have limited assessment resources?

Use a risk-based approach. Classify vendors by the risk level of their AI application: high-risk (decisions affecting legal rights, safety, or core business operations), medium-risk (supporting functions with human oversight), low-risk (non-critical automation). Apply the full 56-item checklist to high-risk vendors, abbreviated assessment to medium-risk, and standard IT due diligence to low-risk.

What if the vendor refuses to provide technical documentation?

This is a red flag. For high-risk use cases, lack of transparency should be disqualifying. Vendors claiming "proprietary" as justification for withholding model cards, bias assessments, or security testing results cannot support your regulatory obligations under the EU AI Act or demonstrate they've addressed discrimination risk. Consider alternative vendors or reduce the risk classification of the deployment (add human oversight, limit scope).

Should I require third-party audits of AI vendors?

For high-risk applications, yes. Independent third-party audits provide validation that vendor claims are accurate. Request annual SOC 2 Type II audits at minimum. For high-stakes decisions (hiring, lending, medical), request independent bias audits. Budget 2-4 weeks for audit report review and vendor Q&A.

How often should I reassess AI vendors?

Annual comprehensive reassessment minimum. Quarterly business reviews should track ongoing performance, security, and bias metrics. Trigger immediate reassessment if: major model updates occur, security incidents are disclosed, regulatory requirements change, or performance degrades below SLA thresholds.

References

  1. [1] Industry analysis: 65% of enterprises require SOC 2 for software vendors, but only 35% of AI vendors hold certifications (2024).
  2. [2] Stanford HAI. "AI Index Report 2025." Only 11% of organizations have fully implemented responsible AI capabilities. hai.stanford.edu
  3. [3] Third-party risk management industry estimates for comprehensive AI vendor assessments (2024).
  4. [4] Thomson Reuters. "Legal Professional AI Survey 2024." 68% cite hallucinations as top concern.
  5. [5] Industry analysis on AI hallucination economic impact, 2024.
  6. [6] Texas Attorney General. "Pieces Technologies Settlement." September 2024. First enforcement action against misleading AI performance claims.
  7. [7] OWASP Foundation. "OWASP Top 10 for Large Language Model Applications." owasp.org
  8. [8] SafeRent Solutions Settlement. November 2024. $2.2M settlement for algorithmic discrimination in tenant screening.
  9. [9] Kleinberg, Jon, et al. "Inherent Trade-Offs in the Fair Determination of Risk Scores." ITCS 2017. Demonstrates mathematical impossibility of satisfying all fairness definitions simultaneously.
  10. [10] European Union. "Regulation (EU) 2024/1689 - Artificial Intelligence Act." Official Journal of the EU, July 2024. eur-lex.europa.eu
  11. [11] Colorado Department of Law. "Colorado Artificial Intelligence Act" (SB24-205). Effective February 2026.
  12. [12] California Legislature. "Automated Decision Making Technology Act" (AB 2930). Effective January 2027.
  13. [13] U.S. Food and Drug Administration. "Artificial Intelligence and Machine Learning in Software as a Medical Device." fda.gov
  14. [14] Federal Reserve. "SR 11-7: Guidance on Model Risk Management." April 2011. Applies to third-party models used by financial institutions.
  15. [15] Equal Credit Opportunity Act (ECOA), 15 U.S.C. § 1691. Requires specific reasons for adverse credit decisions.
  16. [16] NIST. "AI Risk Management Framework (AI RMF 1.0)." January 2023. nist.gov
  17. [17] ISO/IEC. "ISO/IEC 42001:2023 - Information technology — Artificial intelligence — Management system." iso.org
  18. [18] Mitchell, Margaret, et al. "Model Cards for Model Reporting." FAT* 2019. Proposes standardized documentation for machine learning models.
  19. [19] Gebru, Timnit, et al. "Datasheets for Datasets." Communications of the ACM, 2021. Framework for documenting datasets used in ML.
  20. [20] Lundberg, Scott M., and Su-In Lee. "A Unified Approach to Interpreting Model Predictions." NIPS 2017. SHAP (SHapley Additive exPlanations) framework.

Verify Your AI Vendors with Evidence

Don't rely on vendor questionnaires. GLACIS generates cryptographic attestations proving vendor controls execute correctly—the evidence you need to satisfy EU AI Act Article 26 due diligence obligations.

Request Vendor Assessment

Related Guides