← Back to Library

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Authors: Donghyun Lee, Mo Tiwari

Published: 2024-10-09

arXiv ID: 2410.07283v1

Added to Library: 2025-11-11 14:09 UTC

Red Teaming

📄 Abstract

As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in single-agent LLMs. These include prompt injection attacks, where malicious prompts embedded in external content trick the LLM into executing unintended or harmful actions, compromising the victim's application. In this paper, we reveal a more dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption, all while propagating silently through the system. Our extensive experiments demonstrate that multi-agent systems are highly susceptible, even when agents do not publicly share all communications. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread. This work underscores the urgent need for advanced security measures as multi-agent LLM systems become more widely adopted.

🔍 Key Points

  • Introduction of 'Prompt Infection': A novel LLM-to-LLM prompt injection attack that self-replicates across interconnected agents, leading to severe system-wide security threats.
  • Demonstrated strong susceptibility of multi-agent systems (MAS) to prompt infections, revealing the dangers of internal agent interactions compared to traditional single-agent scenarios.
  • Proposed 'LLM Tagging' as a defense mechanism that, when combined with existing safeguards, significantly reduces the spread of infections in MAS.
  • Empirical evidence showing that the power of LLMs like GPT-4o does not guarantee greater safety, as compromised powerful models execute malicious actions more effectively than weaker models.
  • Extensive experiments conducted on various multi-agent applications highlight the efficiency and risk of prompt infection within social simulations.

💡 Why This Paper Matters

This paper is significant as it uncovers critical vulnerabilities in multi-agent systems that have been largely overlooked in LLM safety research. By revealing how prompt injection can evolve into more sophisticated and dire attacks, the authors stress the urgent need for robust security measures in LLM-integrated applications, particularly as these systems become increasingly ubiquitous in various fields.

🎯 Why It's Interesting for AI Security Researchers

This paper would be particularly interesting to AI security researchers because it expands the understanding of prompt injection attacks from single-agent systems to complex multi-agent architectures. It introduces new attack vectors, empirical insights, and innovative defense strategies, prompting further exploration of security protocols necessary to safeguard LLMs. The findings urge researchers to rethink current assumptions about the safety of multi-agent systems and encourage the development of more resilient AI frameworks.

📚 Read the Full Paper