← Back to Library

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

Authors: Dongsen Zhang, Zekun Li, Xu Luo, Xuannan Liu, Peipei Li, Wenjun Xu

Published: 2025-10-14

arXiv ID: 2510.15994v1

Added to Library: 2025-11-14 23:12 UTC

Red Teaming

📄 Abstract

The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks broad interoperability, it also enlarges the attack surface by making tools first-class, composable objects with natural-language metadata, and standardized I/O. We present MSB (MCP Security Benchmark), the first end-to-end evaluation suite that systematically measures how well LLM agents resist MCP-specific attacks throughout the full tool-use pipeline: task planning, tool invocation, and response handling. MSB contributes: (1) a taxonomy of 12 attacks including name-collision, preference manipulation, prompt injections embedded in tool descriptions, out-of-scope parameter requests, user-impersonating responses, false-error escalation, tool-transfer, retrieval injection, and mixed attacks; (2) an evaluation harness that executes attacks by running real tools (both benign and malicious) via MCP rather than simulation; and (3) a robustness metric that quantifies the trade-off between security and performance: Net Resilient Performance (NRP). We evaluate nine popular LLM agents across 10 domains and 400+ tools, producing 2,000 attack instances. Results reveal the effectiveness of attacks against each stage of MCP. Models with stronger performance are more vulnerable to attacks due to their outstanding tool calling and instruction following capabilities. MSB provides a practical baseline for researchers and practitioners to study, compare, and harden MCP agents.

🔍 Key Points

  • Introduction of MSB (MCP Security Benchmark), an evaluation suite for assessing the security of LLM agents using the Model Context Protocol (MCP) across multiple attack scenarios.
  • Development of a comprehensive taxonomy of 12 attack types, covering various stages of the MCP workflow including planning, invocation, and response handling.
  • Implementation of a dynamic evaluation framework which uses real tools (both benign and malicious) to expose security vulnerabilities more effectively compared to static benchmarks.
  • Presentation of Net Resilient Performance (NRP) as a new robustness metric that quantifies the trade-off between security and performance.
  • Findings indicate that more performant LLMs are more vulnerable to certain attacks, emphasizing the need for security considerations in model design.

💡 Why This Paper Matters

This paper is a crucial contribution to the field of AI security, providing new tools and methodologies for evaluating the risks associated with LLM agents. Its comprehensive approach to identifying and benchmarking vulnerabilities within the Model Context Protocol places it at the forefront of research on agent security, ensuring that future developments in AI can be made with a better understanding of potential exploitation threats.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper highly relevant as it addresses a growing concern about the vulnerability of LLM agents in real-world applications, particularly in their interactions with external tools. With its novel benchmarking methodology and significant findings about the relationship between model performance and security, this work provides a foundational framework for future research aimed at enhancing the security posture of AI applications.

📚 Read the Full Paper