← Back to Library

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

Authors: S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin

Published: 2025-09-16

arXiv ID: 2509.14285v1

Added to Library: 2025-09-19 04:03 UTC

Safety

📄 Abstract

Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.

🔍 Key Points

  • Introduction of a multi-agent defense framework that effectively mitigates prompt injection attacks in real-time using specialized LLM agents.
  • Evaluation of two distinct pipeline architectures: a sequential chain-of-agents and a hierarchical coordinator-based system, demonstrating their effectiveness against 55 unique attacks across 400 instances.
  • Achieved 100% mitigation of Attack Success Rates (ASR), reducing ASR to 0% across all tested scenarios on ChatGLM and Llama2, showcasing robustness across various attack types.
  • Comprehensive dataset construction with a focus on common and sophisticated prompt injection strategies, providing a rich basis for evaluating defense mechanisms.
  • Practical deployment guidelines and a multi-dimensional assessment of different architectures, aiding practitioners in understanding the trade-offs between complexity, performance, and security.

💡 Why This Paper Matters

This paper presents a significant advancement in securing Large Language Models (LLMs) against prompt injection attacks, a critical vulnerability in AI applications. By introducing a novel multi-agent defense architecture that ensures comprehensive protection without sacrificing functionality, the authors contribute valuable insights and methodologies that strengthen the integrity and reliability of LLM deployments. Such protective measures are essential as LLMs become increasingly pervasive in sensitive domains.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper is crucial as it addresses a prominent security vulnerability inherent in LLMs, proposing a structured and empirical approach to mitigation. The dual-pipeline architecture and thorough evaluation against varied attack vectors enrich the security discourse, providing both theoretical and practical frameworks that can be referenced in future research on AI safety. The results underscore the importance of proactive defense mechanisms, making this work a relevant resource in the ongoing challenge of protecting AI systems from adversarial threats.

📚 Read the Full Paper