← Back to Library

Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks

Authors: Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh, Naser Ezzati-Jivan

Published: 2025-12-06

arXiv ID: 2512.06556v1

Added to Library: 2025-12-09 03:03 UTC

Red Teaming

📄 Abstract

The Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-agent workflows. However, this autonomy creates a largely overlooked security gap. Existing defenses focus on prompt-injection attacks and fail to address threats embedded in tool metadata, leaving MCP-based systems exposed to semantic manipulation. This work analyzes three classes of semantic attacks on MCP-integrated systems: (1) Tool Poisoning, where adversarial instructions are hidden in tool descriptors; (2) Shadowing, where trusted tools are indirectly compromised through contaminated shared context; and (3) Rug Pulls, where descriptors are altered after approval to subvert behavior. To counter these threats, we introduce a layered security framework with three components: RSA-based manifest signing to enforce descriptor integrity, LLM-on-LLM semantic vetting to detect suspicious tool definitions, and lightweight heuristic guardrails that block anomalous tool behavior at runtime. Through evaluation of GPT-4, DeepSeek, and Llama-3.5 across eight prompting strategies, we find that security performance varies widely by model architecture and reasoning method. GPT-4 blocks about 71 percent of unsafe tool calls, balancing latency and safety. DeepSeek shows the highest resilience to Shadowing attacks but with greater latency, while Llama-3.5 is fastest but least robust. Our results show that the proposed framework reduces unsafe tool invocation rates without model fine-tuning or internal modification.

🔍 Key Points

  • Introduction of three classes of semantic attacks on MCP-integrated systems: Tool Poisoning, Shadowing, and Rug Pulls.
  • Proposition of a layered security framework that includes RSA-based manifest signing, LLM-on-LLM semantic vetting, and lightweight heuristic guardrails.
  • Extensive evaluation of the proposed defenses across multiple LLMs (GPT-4, DeepSeek, Llama-3.5), showing security performance variations and trade-offs in latency.
  • Findings indicate that structured prompting techniques enhance safety but can also increase execution latency, emphasizing the safety-latency trade-off in LLM security.
  • Demonstration of the limitations of current defenses in addressing the new class of threats introduced by the Model Context Protocol.

💡 Why This Paper Matters

This paper provides critical insights into the security vulnerabilities of Large Language Models (LLMs) that utilize the Model Context Protocol (MCP) for tool integration, a largely overlooked area in AI security. By establishing specific attack vectors and proposing a novel layered defense approach, it addresses significant concerns about the autonomy and potential misuse of LLMs in real-world applications. The empirical findings highlight the urgent need for effective security measures in LLM deployments, making this research relevant and timely as LLMs are increasingly adopted across various domains.

🎯 Why It's Interesting for AI Security Researchers

The paper is of great interest to AI security researchers as it tackles emerging threats posed by adversarial manipulation of LLMs through the Model Context Protocol. It introduces new attack classes that exploit the interconnectedness and semantic interpretability of tool descriptors, a unique contribution that expands the understanding of potential vulnerabilities in agentic AI systems. The layered defense mechanisms proposed offer practical insights for improving model security against these advanced threats, paving the way for future research and development in LLM safety protocols.

📚 Read the Full Paper