Defeating Prompt Injections by Design

Authors: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr

Published: 2025-03-24

arXiv ID: 2503.18813v2

Added to Library: 2025-11-11 14:20 UTC

📄 Abstract

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models are susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows by enforcing security policies when tools are called. We demonstrate effectiveness of CaMeL by solving $77\%$ of tasks with provable security (compared to $84\%$ with an undefended system) in AgentDojo. We release CaMeL at https://github.com/google-research/camel-prompt-injection.

🔍 Key Points

The paper proposes innovative defense techniques against prompt injection attacks by repurposing existing attack strategies, demonstrating that the same principles can be adapted for defense.
New defense methods were tested comprehensively against both direct and indirect prompt injection attacks, outperforming traditional training-free methods, and achieving comparable performance to fine-tuning techniques.
A significant reduction in the attack success rate (ASR) was reported, approaching zero in specific scenarios, showcasing the effectiveness of these defense mechanisms.
The authors performed an extensive evaluation of their methods across various open-source and closed-source LLMs, illustrating their generalizability and robustness against multiple types of attacks.
This study establishes a compelling connection between the effectiveness of attack methods and the defenses developed, providing a framework for future research in LLM security.

💡 Why This Paper Matters

This paper is relevant and important as it addresses a critical vulnerability in large language models (LLMs), specifically prompt injection attacks. By creatively using the mechanics of these attacks to formulate robust defenses, the authors provide a novel approach that significantly enhances the security of LLM applications. The findings not only contribute to the academic discourse on AI safety but also hold practical implications for developers and organizations utilizing LLMs.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of significant interest to AI security researchers as it tackles the pressing issue of security vulnerabilities in LLMs, which are increasingly deployed in real-world applications. The innovative defensive strategies proposed offer a fresh perspective on securing AI systems, emphasizing the importance of understanding and leveraging attack methodologies in developing robust defenses. Furthermore, the potential implications for safeguarding user data and maintaining trust in AI technologies are critical areas of concern for researchers focusing on secure AI deployment.

Defeating Prompt Injections by Design

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper