
Design Patterns for Securing LLM Agents against Prompt Injections

Authors: Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn

Published: 2025-06-10

arXiv ID: 2506.08837v3

Added to Library: 2025-11-11 14:21 UTC

📄 Abstract

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.

🔍 Key Points

  • A guiding principle for agent design: once an agent has ingested untrusted input, it must be constrained so that this input cannot trigger any consequential actions.
  • Six design patterns for building prompt-injection-resistant agents: Action-Selector, Plan-Then-Execute, LLM Map-Reduce, Dual LLM, Code-Then-Execute, and Context-Minimization (a sketch of the Dual LLM pattern follows this list).
  • Systematic analysis of each pattern's trade-offs between utility and security, since every pattern deliberately limits the agent's generality rather than relying on the model to resist injected instructions.
  • A series of case studies of realistic agent applications, including an email and calendar assistant, a SQL agent, a customer service chatbot, and a software engineering agent, showing how the patterns apply in practice.
  • Positioning of these system-level constraints as providing stronger guarantees than heuristic, model-level defenses such as prompt-injection detectors.
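
To make the Dual LLM pattern concrete, below is a minimal sketch (not taken from the paper) in which a privileged orchestrator only ever handles symbolic variable names, while a quarantined model reads the untrusted text and has no tool access. The `call_llm` helper, the tool table, and the variable-naming scheme are hypothetical placeholders.

```python
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM API call."""
    raise NotImplementedError

class DualLLMAgent:
    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools                  # tools only the privileged side may invoke
        self.memory: Dict[str, str] = {}    # symbolic store for untrusted content
        self._counter = 0

    def quarantined_summarize(self, untrusted_text: str) -> str:
        """Quarantined LLM: sees the untrusted data but has no tool access.
        Its output is stored under a symbolic name and never inlined into
        the privileged model's prompt."""
        summary = call_llm(f"Summarize the following text:\n{untrusted_text}")
        self._counter += 1
        var = f"$VAR{self._counter}"
        self.memory[var] = summary
        return var

    def privileged_act(self, user_task: str, var: str) -> str:
        """Privileged LLM: decides which tool to call, but only ever sees the
        symbolic variable name, so instructions injected into the untrusted
        content cannot steer its decision."""
        decision = call_llm(
            f"User task: {user_task}\n"
            f"The relevant document is stored in variable {var}.\n"
            f"Reply with exactly one tool name from {sorted(self.tools)}."
        ).strip()
        if decision not in self.tools:      # refuse anything off the menu
            raise ValueError(f"Unknown tool requested: {decision}")
        # Untrusted content is substituted only at execution time,
        # outside the privileged model's context window.
        return self.tools[decision](self.memory[var])
```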

💡 Why This Paper Matters

The paper shifts the discussion of prompt injection from heuristic, model-level mitigations to principled, system-level design. Rather than trying to detect or filter malicious instructions, the proposed patterns constrain what an agent is allowed to do after it has processed untrusted input, so the security argument does not depend on the LLM itself behaving correctly. As LLM agents are increasingly granted tool access and handle sensitive data, such design-level guarantees are essential for deploying them safely; the Action-Selector sketch below shows the idea in its simplest form.
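
As an illustration, the following sketch (again not from the paper) shows an Action-Selector style agent for a hypothetical customer-service setting: the model may only pick one entry from a fixed allow-list, and tool output never flows back into its context, so injected text has no channel through which to escalate. `call_llm` and the action names are placeholders.

```python
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM API call."""
    raise NotImplementedError

# Fixed allow-list of side-effecting actions (illustrative names only).
ACTIONS: Dict[str, Callable[[], None]] = {
    "refund_order":    lambda: print("issuing refund"),
    "escalate_ticket": lambda: print("escalating to a human agent"),
    "send_faq_link":   lambda: print("sending FAQ link"),
}

def action_selector_agent(user_message: str) -> None:
    # The model may only name one action; its answer is treated as an opaque key.
    choice = call_llm(
        f"Pick exactly one action name from {sorted(ACTIONS)} "
        f"for this request:\n{user_message}"
    ).strip()
    if choice not in ACTIONS:       # anything outside the allow-list is rejected
        raise ValueError(f"Model requested an unknown action: {choice}")
    ACTIONS[choice]()               # tool output is never fed back to the model
```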

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers because it provides a shared vocabulary for reasoning about agent security and an explicit account of the utility each pattern gives up in exchange for its guarantees. The case studies show how the patterns can be combined in realistic applications and offer a baseline against which adaptive attacks and more permissive agent designs can be evaluated. The framework is also a useful reference point for comparing system-level defenses with detection-based approaches to prompt injection.

📚 Read the Full Paper