← Back to Library

A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks

Authors: S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin

Published: 2025-09-16

arXiv ID: 2509.14285v2

Added to Library: 2025-11-11 14:19 UTC

📄 Abstract

Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.

🔍 Key Points

  • Introduction of text prompt injection as a novel attack method against vision language models (VLMs) that is both efficient and effective.
  • Development of a systematic algorithm for executing text prompt injection attacks, focusing on optimizing placement and embedding techniques for injected prompts.
  • Demonstration through experiments that text prompt injection significantly outperforms traditional gradient-based attacks (e.g., PGD) across various models and parameters.
  • Identification of a correlation between the effectiveness of the attack and the number of parameters in the VLMs, highlighting the requirement for large models to successfully execute the attack.
  • Insightful analysis of background color consistency in images and its impact on the readability and effectiveness of injected prompts.

💡 Why This Paper Matters

This paper is crucial in understanding the vulnerabilities of large vision language models to text prompt injection attacks, a methodology that poses significant implications in the field of AI security. By providing a robust algorithm and experimental validation, the research contributes to the growing discussion on the safety and reliability of AI systems, especially in applications involving multimodal inputs. The insights presented pave the way for future investigations into both offensive and defensive strategies against such vulnerabilities.

🎯 Why It's Interesting for AI Security Researchers

This paper will be of great interest to AI security researchers due to its exploration of a relatively under-examined attack vector in the rapidly evolving field of multimodal AI. The findings highlight significant security risks associated with text prompt injection in VLMs, which could have real-world ramifications in numerous applications. Researchers focused on adversarial attacks, model robustness, and safety mechanisms can leverage the methods and insights provided to develop new defenses, improve model resilience, and build safer AI systems.

📚 Read the Full Paper