ICX360: In-Context eXplainability 360 Toolkit

Authors: Dennis Wei, Ronny Luss, Xiaomeng Hu, Lucas Monteiro Paes, Pin-Yu Chen, Karthikeyan Natesan Ramamurthy, Erik Miehling, Inge Vejsbjerg, Hendrik Strobelt

Published: 2025-11-14

arXiv ID: 2511.10879v1

Added to Library: 2025-11-17 03:01 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have become ubiquitous in everyday life and are entering higher-stakes applications ranging from summarizing meeting transcripts to answering doctors' questions. As was the case with earlier predictive models, it is crucial that we develop tools for explaining the output of LLMs, be it a summary, list, response to a question, etc. With these needs in mind, we introduce In-Context Explainability 360 (ICX360), an open-source Python toolkit for explaining LLMs with a focus on the user-provided context (or prompts in general) that are fed to the LLMs. ICX360 contains implementations for three recent tools that explain LLMs using both black-box and white-box methods (via perturbations and gradients respectively). The toolkit, available at https://github.com/IBM/ICX360, contains quick-start guidance materials as well as detailed tutorials covering use cases such as retrieval augmented generation, natural language generation, and jailbreaking.

🔍 Key Points

Introduction of ICX360, an open-source Python toolkit designed for explaining outputs of large language models (LLMs) based on their inputs or user-provided context.
Inclusion of three specific methods for explanation: MExGen (perturbation-based), CELL (contrastive explanations), and Token Highlighter (gradient-based), catering to different needs for explainability in LLMs.
Development of a structured framework for in-context explainability, enabling categorization of methods based on model access level, input granularity, and output granularity, providing a comprehensive landscape of current techniques.
Comparison and contrast with existing explainability methods (SHAP, Captum, Inseq), showcasing the enhanced interpretability and efficiency of the ICX360 toolkit against limitations of these existing tools.
Provision of tutorials and quick-start guidance for practical application, promoting accessibility for a wider range of users and practitioners.

💡 Why This Paper Matters

The ICX360 toolkit represents a significant advancement in the field of explainability for large language models, providing structured and accessible methods for understanding AI outputs in high-stakes applications. By focusing on the relationship between input context and output responses, it addresses a critical need in AI that enhances transparency, trust, and accountability in automated systems. This toolkit not only contributes to the academic discourse on AI interpretability but also has practical implications for industries relying on LLMs, such as healthcare and finance.

🎯 Why It's Interesting for AI Security Researchers

This paper would particularly interest AI security researchers due to its focus on explainability in high-stakes systems where LLMs are applied. Understanding the rationale behind AI outputs is crucial for identifying vulnerabilities, biases, and potentially harmful outcomes in automated decision-making processes. Enhanced transparency through tools like ICX360 could enable researchers to rigorously assess and mitigate risks associated with LLM use in sensitive contexts, ultimately fostering safer and more reliable AI applications.

ICX360: In-Context eXplainability 360 Toolkit

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper