What is AI Explainability?
AI explainability (XAI) is the ability to understand and articulate how an AI system produces its outputs. At its core, explainability enables humans to comprehend, trust, and effectively manage AI decision-making.[4]
A system is explainable when you can answer questions like:
- Which inputs influenced this decision? What features or data points drove the prediction?
- How were inputs weighted? Which factors mattered most, and which were ignored?
- Why this decision versus alternatives? What would need to change for a different outcome?
- Is the logic consistent and fair? Does the model apply the same reasoning across similar cases?
Explainability vs. Interpretability vs. Transparency
These terms are often conflated but represent distinct concepts:
Key Definitions
| Concept | Definition | Example |
|---|---|---|
| Interpretability | The degree to which a human can understand the model's internal mechanics | Linear regression coefficients, decision tree paths |
| Explainability | The ability to describe why a model made a specific decision | SHAP values explaining a neural network prediction |
| Transparency | Openness about the AI system's design, data, training, and limitations | Model cards, system documentation, training data provenance |
Key distinction: You can have explainability without interpretability. A deep neural network is not interpretable—you cannot inspect its millions of parameters and understand its logic. But you can use techniques like SHAP or LIME to generate explanations for specific predictions, making the model explainable even though it remains fundamentally uninterpretable.[5]
The Four Levels of Explainability
Models fall into four categories based on how explainability is achieved:[4]
Inherently Interpretable
Models whose internal structure is human-understandable.
Examples: Linear regression, logistic regression, decision trees (small), rule-based systems
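For instance, a minimal sketch (using scikit-learn and an illustrative dataset) of reading a trained logistic regression's coefficients directly:

```python
# A minimal sketch: an inherently interpretable model whose learned weights
# can be read off directly. Dataset and scaling choices are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

# Each coefficient is the change in log-odds per standard deviation of a feature.
coefs = model.named_steps["logisticregression"].coef_[0]
for name, weight in sorted(zip(X.columns, coefs), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name:>25s}: {weight:+.2f}")
```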
Post-Hoc Model-Specific
Techniques that leverage specific model architectures to generate explanations.
Examples: Attention mechanisms in transformers, saliency maps for CNNs, layer-wise relevance propagation
Post-Hoc Model-Agnostic
Techniques that work regardless of model type by treating the model as a black box.
Examples: SHAP, LIME, counterfactual explanations, partial dependence plots, feature permutation importance
Example-Based
Explanations provided through representative training examples or prototypes.
Examples: K-nearest neighbors justifications, influential instances, prototype learning
Why Explainability Matters
Explainability has evolved from academic curiosity to business imperative. Four forces are driving this shift:
1. Trust and User Adoption
85% of consumers want to understand when and how AI affects them.[1] In high-stakes domains like healthcare and finance, lack of explainability directly inhibits adoption. A physician won't act on a cancer detection model that can't articulate why it flagged a case. A loan officer won't trust a credit decision without understanding the factors driving the recommendation.
Research shows that appropriate explanations increase user trust and model adoption—but only when the explanations are accurate and meaningful. Oversimplified or misleading explanations can backfire, creating false confidence in unreliable systems.[6]
2. Regulatory Compliance
Explainability is transitioning from best practice to legal requirement:
- GDPR Article 22 (2018): Right not to be subject to solely automated decision-making with legal or similarly significant effects. Articles 13-15 require "meaningful information about the logic involved."[3]
- EU AI Act Article 13 (2026): High-risk AI systems must be "designed and developed in such a way to ensure that their operation is sufficiently transparent to enable users to interpret the system's output and use it appropriately."[7]
- Colorado AI Act (June 2026): Requires deployers to provide statements disclosing the purpose, nature, and intended use of high-risk AI systems, including principal data inputs and how they inform outputs.[8]
3. Model Debugging and Improvement
Explainability tools reveal when models learn spurious correlations or fail to generalize. Classic examples include:
The Husky vs. Wolf Problem
A deep learning model trained to distinguish huskies from wolves achieved high accuracy in testing but failed in production. Saliency map analysis revealed the model was using snow in the background—not animal features—as the primary decision factor: wolves in the training set were typically photographed against snow, while huskies were not. The model learned the wrong pattern.[9]
Without explainability techniques, such failures remain hidden until production deployment—when they're costliest to fix.
4. Fairness and Bias Detection
Explainability enables detection of discriminatory patterns. The COMPAS recidivism algorithm, used to inform bail and sentencing decisions, was found to exhibit racial bias, in part through explainability analysis showing that race-correlated features (zip code, prior arrests) disproportionately influenced risk scores for Black defendants.[10]
ProPublica's investigation revealed COMPAS was nearly twice as likely to incorrectly label Black defendants as high-risk compared to white defendants (45% vs. 23% false positive rate). Explainability techniques made the bias measurable and actionable.[10]
Types of Explanations
Not all explanations serve the same purpose. Understanding the different types helps you choose appropriate techniques for your use case.
Global vs. Local Explanations
Global Explanations
Describe overall model behavior across all predictions. Answer: "How does this model generally work?"
Techniques:
- Feature importance rankings
- Partial dependence plots
- Global SHAP summaries
Local Explanations
Describe why a specific prediction was made. Answer: "Why did the model produce this output for this input?"
Techniques:
- LIME (local approximation)
- Individual SHAP values
- Counterfactual explanations
Use global explanations for model validation, bias assessment, and understanding system-wide behavior. Use local explanations for regulatory compliance, individual decision justification, and debugging specific failures.
Model-Agnostic vs. Model-Specific
| Approach | Advantages | Disadvantages |
|---|---|---|
| Model-Agnostic (SHAP, LIME, PDP) | Works across any model type; flexible; enables comparison across models | May miss model-specific insights; computationally expensive; approximations |
| Model-Specific (Attention, saliency maps) | Leverages model architecture; often faster; can be more precise | Only works for specific model types; not comparable across architectures |
Explainability Techniques
The XAI toolkit has matured significantly since DARPA's XAI program launched in 2016. Here are the most widely adopted techniques:[11]
SHAP (SHapley Additive exPlanations)
SHAP, introduced by Lundberg and Lee in 2017, uses game theory (Shapley values) to assign each feature an importance value for a particular prediction. SHAP has become the de facto standard for feature importance explanations.[12]
How SHAP Works
SHAP calculates the marginal contribution of each feature by comparing predictions with and without that feature across all possible feature combinations. The Shapley value is the average marginal contribution across all permutations.
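Formally, using the standard Shapley value from cooperative game theory, the attribution for feature i averages its marginal contribution over all subsets S of the other features:

```latex
\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}
  \left[ f_x(S \cup \{i\}) - f_x(S) \right]
```

Here N is the full feature set and f_x(S) denotes the model's expected output when only the features in S are known.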
Key Properties:
- Additivity: SHAP values sum to the difference between the prediction and the baseline
- Consistency: If a feature contributes more, its SHAP value never decreases
- Accuracy: Local explanations match the model's actual output
When to use SHAP: Feature importance for individual predictions, model debugging, identifying bias sources, explaining tree-based models (TreeSHAP is particularly fast), regulatory compliance documentation.
Limitations: Computationally expensive for large models, assumes feature independence (can be problematic with correlated features), requires careful baseline selection.
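As a concrete illustration, here is a minimal sketch using the open-source shap package with a scikit-learn tree ensemble; the dataset and model are placeholders:

```python
# A minimal sketch of local and global SHAP explanations using TreeSHAP.
# The dataset and model here are placeholders for illustration.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)     # fast, exact SHAP for tree ensembles
shap_values = explainer.shap_values(X)    # shape: (n_samples, n_features)

# Local explanation: per-feature contributions for a single prediction.
# Additivity: expected_value + sum(row) equals the model's raw (log-odds) output.
row = shap_values[0]
print(sorted(zip(X.columns, row), key=lambda t: -abs(t[1]))[:3])

# Global summary: mean absolute SHAP value per feature across the dataset.
global_importance = np.abs(shap_values).mean(axis=0)
print(sorted(zip(X.columns, global_importance), key=lambda t: -t[1])[:3])
```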
LIME (Local Interpretable Model-agnostic Explanations)
LIME, developed by Ribeiro et al. in 2016, explains individual predictions by fitting a simple interpretable model (like linear regression) locally around the prediction being explained.[13]
How LIME Works
LIME perturbs the input (e.g., by randomly removing words from text or masking image regions), observes how predictions change, then fits a linear model to those local perturbations. The linear model's coefficients become the explanation.
Advantages:
- Works for text, images, tabular data
- Fast compared to SHAP
- Produces human-friendly linear explanations
When to use LIME: Quick explanations for debugging, text classification explanations, image classification with superpixel highlighting, situations where speed matters more than theoretical guarantees.
Limitations: Explanations can be unstable (small input changes produce different explanations), no theoretical guarantees like SHAP, requires careful tuning of locality parameters.
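A minimal sketch using the open-source lime package on tabular data (dataset and model are placeholders):

```python
# A minimal sketch of a local LIME explanation for tabular data.
# The dataset and model choices are placeholders for illustration.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    training_data=X.values,
    feature_names=list(X.columns),
    class_names=["malignant", "benign"],
    mode="classification",
)

# Perturb the neighborhood of one instance, fit a local linear surrogate to the
# model's responses, and report the surrogate's top weights as the explanation.
exp = explainer.explain_instance(X.values[0], model.predict_proba, num_features=5)
print(exp.as_list())   # [(feature condition, local weight), ...]
```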
Attention Mechanisms
Attention mechanisms, central to transformer architectures like BERT and GPT, provide built-in explainability by showing which input tokens the model "attended to" when producing each output token.[14]
When to use attention: Natural language processing tasks, machine translation explanations, document summarization justification, any transformer-based model where you need to show which inputs influenced outputs.
Limitations: Attention weights don't always correspond to true feature importance, multi-head attention can be difficult to interpret, attention is correlational not causal.
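For illustration, a minimal sketch of pulling raw attention weights from a Hugging Face transformer; the checkpoint and the head-averaging choice are assumptions, and, per the limitations above, the weights should be treated as a signal rather than ground truth:

```python
# A minimal sketch of inspecting attention weights in a transformer.
# The checkpoint and the head-averaging choice are illustrative assumptions;
# attention weights are a signal, not a faithful importance measure.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)

inputs = tokenizer("The claim was denied due to a prior condition", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
last_layer = outputs.attentions[-1][0]       # (heads, seq_len, seq_len)
cls_attention = last_layer.mean(dim=0)[0]    # average over heads; attention from [CLS]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in sorted(zip(tokens, cls_attention.tolist()), key=lambda t: -t[1])[:5]:
    print(f"{token:>12s}: {weight:.3f}")
```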
Counterfactual Explanations
Counterfactual explanations answer: "What would need to change for the model to produce a different output?" They're particularly valuable for regulatory compliance and user-facing explanations.[15]
Example: Loan Denial Explanation
"Your loan application was denied. If your annual income were $52,000 instead of $48,000 and your credit utilization were 25% instead of 68%, you would have been approved."
When to use counterfactuals: Credit decisions (GDPR/ECOA compliance), hiring/admission decisions, medical diagnosis ("what symptoms would change the diagnosis?"), any domain where users need actionable feedback.
Limitations: May suggest unrealistic or unethical changes, can be expensive to compute, not always unique (multiple counterfactuals may exist).
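A toy sketch of the idea, using a hypothetical approval rule as a stand-in for a real model:

```python
# A toy sketch of a counterfactual search: nudge mutable features until the
# decision flips. The approval rule below is hypothetical, standing in for a model.
def approve(income, utilization_pct):
    return income >= 50_000 and utilization_pct <= 30

def find_counterfactual(income, utilization_pct, income_step=1_000, util_step=1):
    """Greedily move each mutable feature toward approval until the decision flips."""
    while not approve(income, utilization_pct):
        if income < 50_000:
            income += income_step
        elif utilization_pct > 30:
            utilization_pct -= util_step
        else:
            break
    return income, utilization_pct

# Applicant denied at $48,000 income and 68% credit utilization:
print(find_counterfactual(48_000, 68))   # -> (50000, 30): the change that flips the decision
```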
Feature Importance and Permutation Importance
Feature importance techniques measure how much each feature contributes to model performance. Permutation importance shuffles feature values and measures the drop in model accuracy—features causing large drops are deemed important.[16]
When to use feature importance: Model selection and comparison, feature engineering validation, compliance documentation showing what data the model uses, communicating with non-technical stakeholders.
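A minimal sketch using scikit-learn's permutation_importance (dataset, split, and model are placeholders):

```python
# A minimal sketch of permutation importance with scikit-learn.
# The dataset, held-out split, and model are placeholders for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in accuracy;
# large drops indicate features the model actually relies on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, drop in ranked[:5]:
    print(f"{name:>25s}: {drop:.4f}")
```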
Transparency Requirements
Beyond explainability techniques, transparency requires documentation that makes AI systems understandable at the system level. Two frameworks have emerged as standards:
Model Cards
Introduced by Mitchell et al. (Google) in 2019, model cards provide structured documentation of ML models including intended use, training data, performance across demographics, ethical considerations, and known limitations.[17]
Model Card Contents
- Model Details: Version, architecture, training date, developers
- Intended Use: Primary use cases, out-of-scope applications
- Training Data: Sources, size, preprocessing, demographics
- Evaluation: Metrics, test datasets, performance breakdowns by demographic
- Ethical Considerations: Bias analysis, fairness metrics, sensitive use cases
- Caveats and Recommendations: Known limitations, recommended mitigations
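One possible representation of the contents above, sketched as structured data that can be versioned alongside the model (all field values are illustrative placeholders):

```python
# A minimal sketch of model card fields as structured, versionable data.
# All values below are illustrative placeholders.
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelCard:
    model_details: dict
    intended_use: dict
    training_data: dict
    evaluation: dict
    ethical_considerations: list
    caveats_and_recommendations: list

card = ModelCard(
    model_details={"name": "credit-risk-clf", "version": "1.3.0", "architecture": "gradient boosting"},
    intended_use={"primary": "consumer credit pre-screening", "out_of_scope": ["employment decisions"]},
    training_data={"source": "internal loan book 2019-2023", "rows": 1_200_000},
    evaluation={"auc_overall": 0.81, "auc_by_group": {"age_under_30": 0.79, "age_30_plus": 0.82}},
    ethical_considerations=["proxy risk: zip code correlates with protected attributes"],
    caveats_and_recommendations=["not validated for small-business lending"],
)

print(json.dumps(asdict(card), indent=2))   # publish alongside the model artifact
```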
Major AI providers now publish model cards: OpenAI (GPT-5.2), Anthropic (Claude Opus 4.5), Google (Gemini 3), Meta (Llama 4). The EU AI Act's transparency requirements effectively mandate model card-style documentation for high-risk systems.[7]
System Cards and Documentation
System cards extend model cards to describe complete AI systems including data pipelines, human-in-the-loop components, deployment infrastructure, monitoring procedures, and incident response plans. ISO 42001 and the EU AI Act require system-level documentation beyond individual model cards.[18]
Regulatory Requirements
Explainability and transparency have shifted from optional features to regulatory mandates. Here's what's required and when:
GDPR Article 22 and the Right to Explanation
GDPR Article 22 (effective May 2018) establishes that individuals have the right not to be subject to decisions based solely on automated processing that produce legal or similarly significant effects. Articles 13-15 require controllers to provide "meaningful information about the logic involved" in automated decision-making.[3]
What “Meaningful Information” Means
GDPR doesn't specify technical requirements, but ICO and EDPB guidance clarifies that organizations must provide:
- Information about the types of data used in the decision
- Why those data points are considered relevant
- The source of the data
- How the data creates a particular result for the individual
Enforcement: GDPR violations can result in fines up to €20 million or 4% of global annual revenue. Several GDPR complaints have centered on Article 22, including challenges to automated credit scoring and profiling systems.[3]
EU AI Act Article 13: Transparency and Information to Deployers
The EU AI Act (enforcement begins August 2026 for high-risk systems) requires that high-risk AI systems be designed to ensure "sufficiently transparent" operation enabling users to "interpret the system's output and use it appropriately."[7]
EU AI Act Transparency Requirements
| Requirement | Article | Applies To |
|---|---|---|
| Technical documentation describing system design, data, and operation | Article 11 | High-risk AI systems |
| Transparency to enable interpretation of outputs and appropriate use | Article 13 | High-risk AI systems |
| Automatic logging of events during operation | Article 12 | High-risk AI systems |
| Information that system is interacting with AI | Article 52(1) | AI systems interacting with humans |
Penalties: Non-compliance with high-risk requirements can result in fines up to €35 million or 7% of global annual revenue—significantly higher than GDPR.[7]
Colorado Artificial Intelligence Act
Colorado's AI Act (effective June 2026) requires deployers of high-risk AI systems to provide consumers with clear statements disclosing:[8]
- The purpose, nature, and intended use of the high-risk AI system
- The principal data processed and how it informs the AI system's outputs
- The known limitations of the AI system
- How consumers can opt out or appeal algorithmic decisions
Deployers must also conduct impact assessments that include transparency documentation. Unlike the EU AI Act, Colorado's law includes a safe harbor provision: deployers who comply in good faith receive liability protection.[8]
Explainability by Use Case
Different domains have different explainability requirements driven by regulatory context, risk levels, and user needs.
Healthcare and Clinical AI
Healthcare represents the highest-stakes domain for explainability. Physicians need to understand AI recommendations to maintain clinical accountability, and regulators require transparency for medical device approval.
Healthcare Explainability Requirements
- FDA 510(k) submissions: Medical AI devices require documentation of how the algorithm works, training data characteristics, and validation evidence
- EU MDR Class IIa+: Medical devices using AI require notified body assessment including transparency documentation (by August 2027 under AI Act)
- Clinical acceptance: Physicians expect to see which imaging features, lab values, or patient factors drove a diagnostic recommendation
Common techniques: Saliency maps for medical imaging (showing which pixels influenced a diagnosis), SHAP for predictive models (identifying risk factors), attention visualizations for clinical notes, counterfactuals for treatment recommendations.
Example: Google's diabetic retinopathy detection system uses attention maps to highlight specific retinal regions (hemorrhages, exudates) that drove the diagnosis, enabling ophthalmologists to validate the AI's reasoning.[19]
Financial Services and Credit Decisions
Financial services face the most mature explainability requirements due to decades of anti-discrimination regulation.
Financial Services Explainability Requirements
- Equal Credit Opportunity Act (ECOA): Adverse action notices must state specific reasons for credit denials
- Fair Credit Reporting Act (FCRA): Requires disclosure of factors that adversely affected credit scores
- GDPR Article 22: Right to explanation for automated credit decisions in the EU
- Model Risk Management (SR 11-7): Federal Reserve guidance requiring documentation and validation of model logic
Common techniques: Counterfactual explanations ("if your income were $X higher, you would qualify"), feature importance rankings (showing top factors in credit decisions), adverse action reason codes, SHAP values for individual decisions.
Challenge: Balancing explainability with model performance. Simpler, more interpretable models may have lower predictive accuracy. Many banks use "challenger models"—interpretable models that validate complex model decisions.[20]
HR and Employment Decisions
AI-powered hiring and HR systems face increasing scrutiny following discrimination lawsuits and regulatory investigations.
Amazon Recruiting Tool Bias (2018)
Amazon developed an AI recruiting tool trained on historical résumés. Explainability analysis revealed the model penalized résumés containing the word "women's" (as in "women's chess club") because the training data—predominantly male hires—lacked these terms. The company scrapped the system after discovering it had learned gender bias from historical hiring patterns.[21]
Regulatory context: The EU AI Act classifies employment AI as high-risk. NYC Local Law 144 (2023) requires bias audits for automated employment decision tools. Illinois Artificial Intelligence Video Interview Act requires notice and consent for AI-analyzed video interviews.[22]
Common techniques: Feature importance to identify resume/assessment factors, bias testing across demographic groups, counterfactuals for candidate feedback, LIME for explaining why specific candidates were recommended or rejected.
Legal Services and Contract Analysis
Legal AI faces unique explainability requirements because attorneys maintain professional liability for work product—including AI-assisted analysis.
Thomson Reuters found 68% of legal professionals cite hallucinations as their top AI concern, with over 40% reporting LLM drafts requiring full manual revision.[23] Explainability helps lawyers validate AI-generated legal research, contract analysis, and document review.
Common techniques: Citation tracing (showing source documents), attention visualization (highlighting contract clauses that drove extraction results), confidence scoring with explanations for low-confidence outputs, version tracking showing how AI suggestions evolved.
Implementation Framework
Building explainable AI systems requires intentional design, not post-deployment retrofitting. Here's a practical framework:
Phase 1: Requirements Definition
Key Questions to Answer
- Who needs explanations? End users, auditors, regulators, internal teams? Each has different needs.
- What regulatory requirements apply? GDPR Article 22, EU AI Act Article 13, Colorado AI Act, ECOA, sector-specific rules?
- What type of explanation is needed? Global model behavior, individual decision justification, counterfactuals, feature importance?
- What's the acceptable performance trade-off? Can you use an inherently interpretable model, or do you need post-hoc explanations for a complex model?
- How will explanations be validated? Who confirms explanations are accurate and meaningful?
Phase 2: Technique Selection
Choose explainability techniques based on your requirements:
| Use Case | Recommended Technique | Why |
|---|---|---|
| Credit decisions (regulatory) | Counterfactuals + SHAP | ECOA requires actionable reasons; SHAP provides detailed attribution |
| Medical diagnosis support | Attention maps + saliency | Physicians need to see what image regions drove diagnosis |
| Model debugging | SHAP + permutation importance | Reveals spurious correlations and feature dependencies |
| Regulatory compliance docs | Global feature importance + model cards | Demonstrates what data the model uses and how |
| User-facing explanations | LIME + natural language | Fast, produces simple explanations non-experts can understand |
Phase 3: Validation and Testing
Explainability techniques can be wrong. Validate explanations through:
- Sanity checks: Do explanations align with domain knowledge? If a medical model highlights random pixels, it's likely wrong.
- Consistency testing: Do similar inputs produce similar explanations? Inconsistent explanations suggest instability (a minimal check is sketched after this list).
- Ablation studies: Remove features identified as important. Does performance drop as predicted?
- Expert review: Have domain experts review explanations for a sample of decisions. Do they agree?
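A minimal sketch of the consistency test above, assuming a fitted SHAP explainer and an input row as in the earlier TreeSHAP example:

```python
# A minimal sketch of a consistency check: explanations for two nearly identical
# inputs should rank features similarly. Assumes a SHAP explainer and a single
# input row `x` (a 1-D NumPy array), as in the earlier TreeSHAP sketch.
import numpy as np
from scipy.stats import spearmanr

def explanation_consistency(explainer, x, noise=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x_perturbed = x + rng.normal(scale=noise * (np.abs(x) + 1e-9), size=x.shape)
    e_original = explainer.shap_values(x.reshape(1, -1))[0]
    e_perturbed = explainer.shap_values(x_perturbed.reshape(1, -1))[0]
    rho, _ = spearmanr(e_original, e_perturbed)   # rank correlation of attributions
    return rho                                    # near 1.0 suggests a stable explanation

# e.g. explanation_consistency(shap.TreeExplainer(model), X.values[0])
```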
Phase 4: Documentation and Monitoring
Document your explainability approach in model cards and system documentation. Monitor explanation quality over time—model drift can cause explanation drift.
Explainability Implementation Checklist
Define Stakeholder Needs
Map who needs explanations (regulators, users, auditors, internal teams) and what type (global, local, counterfactual). Regulatory requirements determine minimum viable explainability.
Choose Techniques Early
Select explainability techniques during model design—not after deployment. Constraints like real-time explanations or specific regulatory formats influence technique selection and model architecture.
Validate Explanation Quality
Test explanations with domain experts. Check consistency, sanity, and alignment with ground truth. Bad explanations are worse than no explanations—they create false confidence.
Document and Monitor
Create model cards documenting explainability approach, validation results, and known limitations. Monitor explanation drift as models and data evolve. Archive explanations for audit trails.
Key principle: Explainability is not a feature you add at the end. It's a design constraint that shapes model selection, feature engineering, and deployment architecture from day one.
Communicating Explanations
Technical explainability methods produce outputs like SHAP values, attention weights, and feature importance scores. Translating these into meaningful explanations for different audiences is as important as generating them.
Explanations for Technical Audiences
Data scientists and ML engineers benefit from detailed technical explanations:
- SHAP value distributions and dependence plots
- Feature importance rankings with confidence intervals
- Partial dependence plots showing feature relationships
- Model architecture diagrams and training metrics
Explanations for Non-Technical Users
End users need simple, actionable explanations in natural language:
Bad Example (Technical)
"Loan denied. SHAP values: credit_score (-0.23), dti_ratio (-0.18), age (0.05), income (0.12). Model confidence: 0.87."
Good Example (Natural Language)
"We couldn't approve your loan at this time. The main factors were your credit score (620) and debt-to-income ratio (52%). Improving your credit score to 680+ or reducing your debt-to-income ratio below 40% would significantly increase your approval chances."
Explanations for Regulators and Auditors
Compliance audiences need evidence that explanations are accurate, complete, and non-discriminatory:
- Model cards with transparency documentation
- Fairness metrics across demographic groups
- Validation evidence showing explanation accuracy
- Audit trails showing how explanations were generated and verified
Tools & Platforms
The explainability tooling ecosystem has matured significantly. Here are the leading open-source and commercial options:
Open Source Tools
SHAP
Python library for SHAP values
The most widely used XAI library with 22K+ GitHub stars. Provides TreeSHAP (optimized for tree models), DeepSHAP (for neural networks), KernelSHAP (model-agnostic), and visualization tools. Developed by Scott Lundberg at University of Washington and Microsoft Research.[12]
Best for: Feature importance, model debugging, regulatory documentation. Works with scikit-learn, XGBoost, LightGBM, TensorFlow, PyTorch.
LIME
Local Interpretable Model-agnostic Explanations
Fast, model-agnostic explanations through local linear approximations. Supports tabular, text, and image data. Developed by Marco Tulio Ribeiro, originally at University of Washington.[13]
Best for: Quick debugging, text classification, image classification with superpixel explanations. Faster than SHAP but less theoretically rigorous.
InterpretML
Microsoft's unified explainability toolkit
Unified API for interpretable models (GAMs, linear models, decision trees) and black-box explanations (SHAP, LIME). Includes Explainable Boosting Machines (EBMs)—interpretable models with accuracy competitive with gradient boosting.[24]
Best for: Teams wanting both inherently interpretable models and post-hoc explanations in one framework.
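A minimal sketch of training and inspecting an EBM, following InterpretML's documented usage (dataset choice is a placeholder):

```python
# A minimal sketch of an Explainable Boosting Machine, an inherently interpretable
# model from InterpretML. Dataset choice is a placeholder for illustration.
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
ebm = ExplainableBoostingClassifier().fit(X, y)

# Visualize the learned per-feature shape functions and overall term importances
# (renders an interactive dashboard in a notebook or local server).
show(ebm.explain_global())
```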
Alibi
Open-source library for ML model inspection
Comprehensive XAI library from Seldon including counterfactual explanations, anchor explanations, prototypes, and integrated gradients. Strong support for TensorFlow and PyTorch models.[25]
Best for: Counterfactual explanations, deep learning models, production ML systems.
Commercial Platforms
Enterprise AI governance platforms increasingly include built-in explainability capabilities:
- IBM watsonx.governance: Model explanation dashboards, fairness metrics, automated documentation generation
- Credo AI: Explainability assessment as part of AI governance workflows, alignment with regulatory requirements
- Holistic AI: Bias and explainability testing integrated with compliance automation
- Arize AI: Model monitoring and observability with built-in explainability for production systems
Frequently Asked Questions
Does explainability reduce model accuracy?
Not necessarily. Post-hoc explainability techniques (SHAP, LIME) don't affect model accuracy—they just explain existing predictions. However, if you choose an inherently interpretable model (like linear regression) instead of a complex model (like deep learning), you may sacrifice some accuracy. The key is understanding your acceptable trade-off. In many regulated domains, the interpretability benefit outweighs marginal accuracy gains.
Can I use SHAP and LIME together?
Yes. Many teams use LIME for fast debugging during development and SHAP for production explanations and compliance documentation. SHAP has stronger theoretical guarantees (it's the only explanation method satisfying local accuracy, missingness, and consistency), but LIME is often faster. Using both provides validation—if they produce very different explanations for the same prediction, investigate why.
How do I validate that my explanations are correct?
Use multiple validation approaches: (1) Sanity checks—do explanations align with domain knowledge? (2) Consistency testing—do similar inputs produce similar explanations? (3) Ablation studies—remove features identified as important and verify performance drops. (4) Expert review—have domain experts validate a sample of explanations. (5) Comparative analysis—compare multiple explanation techniques (SHAP vs. LIME) to check agreement.
What if my model is too complex to explain meaningfully?
This suggests you may be using the wrong model for your use case. If regulatory requirements mandate explanations (GDPR Article 22, EU AI Act Article 13) and you cannot provide meaningful explanations, you're in non-compliance. Consider: (1) Using a simpler, interpretable model as a baseline, (2) Developing a "challenger model" that validates complex model decisions with interpretable logic, (3) Limiting the complex model to low-risk use cases where explainability isn't required, (4) Investing in better explanation techniques and validation.
Are attention mechanisms in transformers sufficient for explainability?
No. While attention weights show which tokens the model focused on, research has shown attention weights don't always correspond to true feature importance and can be misleading. Attention is correlational, not causal. Use attention as one signal, but supplement with techniques like integrated gradients, SHAP, or input perturbation analysis for transformer models in high-stakes applications.
References
- [1] Edelman Trust Barometer Special Report: Trust and AI (2024). 85% of consumers want to understand when and how AI affects them.
- [2] Gartner. "AI and ML Model Transparency and Explainability Survey" (2024). 60% of production models lack meaningful explainability.
- [3] European Union. "General Data Protection Regulation (GDPR)." Articles 13-15, 22. gdpr-info.eu
- [4] Arrieta, A.B., et al. "Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI." Information Fusion 58 (2020): 82-115.
- [5] Lipton, Z.C. "The mythos of model interpretability." Communications of the ACM 61.10 (2018): 36-43.
- [6] Ribera, M., & Lapedriza, A. "Can we do better explanations? A proposal of user-centered explainable AI." IUI Workshops (2019).
- [7] European Union. "Artificial Intelligence Act." Article 13: Transparency and provision of information to deployers. artificialintelligenceact.eu
- [8] Colorado Senate Bill 24-205: "Concerning Consumer Protections in Interactions with Artificial Intelligence Systems." Effective June 1, 2026.
- [9] Ribeiro, M.T., Singh, S., & Guestrin, C. "Why should I trust you?: Explaining the predictions of any classifier." KDD (2016). Demonstrates husky/wolf snow background problem.
- [10] Angwin, J., et al. "Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks." ProPublica (May 2016). propublica.org
- [11] DARPA. "Explainable Artificial Intelligence (XAI) Program." 2016-2021. darpa.mil
- [12] Lundberg, S.M., & Lee, S.I. "A unified approach to interpreting model predictions." NeurIPS (2017). github.com/slundberg/shap
- [13] Ribeiro, M.T., Singh, S., & Guestrin, C. "Why should I trust you?: Explaining the predictions of any classifier." KDD (2016). github.com/marcotcr/lime
- [14] Vaswani, A., et al. "Attention is all you need." NeurIPS (2017). Introduced the transformer architecture with attention mechanisms.
- [15] Wachter, S., Mittelstadt, B., & Russell, C. "Counterfactual explanations without opening the black box: Automated decisions and the GDPR." Harvard Journal of Law & Technology 31.2 (2018): 841-887.
- [16] Breiman, L. "Random forests." Machine Learning 45.1 (2001): 5-32. Introduced feature importance for tree ensembles.
- [17] Mitchell, M., et al. "Model cards for model reporting." FAT* (2019). arxiv.org
- [18] ISO/IEC 42001:2023. "Information technology — Artificial intelligence — Management system." International standard for AI management systems.
- [19] Gulshan, V., et al. "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs." JAMA 316.22 (2016): 2402-2410.
- [20] Federal Reserve. "Supervisory Guidance on Model Risk Management (SR 11-7)." April 2011. Requires validation and documentation of model logic.
- [21] Dastin, J. "Amazon scraps secret AI recruiting tool that showed bias against women." Reuters (October 2018). reuters.com
- [22] New York City Local Law 144 (2021). Requires bias audits for automated employment decision tools. Effective July 2023.
- [23] Thomson Reuters. "Legal Professionals and Generative AI Survey" (2024). 68% cite hallucinations as top concern.
- [24] Nori, H., et al. "InterpretML: A unified framework for machine learning interpretability." arXiv:1909.09223 (2019). github.com/interpretml/interpret
- [25] Seldon. "Alibi: Algorithms for monitoring and explaining machine learning models." github.com/SeldonIO/alibi