DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

📄 Abstract

Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined tools, these agents can carry out complex user tasks. Nonetheless, this interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent's behavior, potentially resulting in economic loss, privacy leakage, or system compromise. System-level defenses have recently shown promise by enforcing static or predefined policies, but they still face two key challenges: the ability to dynamically update security rules and the need for memory stream isolation. To address these challenges, we propose DRIFT, a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints. A Secure Planner first constructs a minimal function trajectory and a JSON-schema-style parameter checklist for each function node based on the user query. A Dynamic Validator then monitors deviations from the original plan, assessing whether changes comply with privilege limitations and the user's intent. Finally, an Injection Isolator detects and masks any instructions that may conflict with the user query from the memory stream to mitigate long-term risks. We empirically validate the effectiveness of DRIFT on the AgentDojo benchmark, demonstrating its strong security performance while maintaining high utility across diverse models -- showcasing both its robustness and adaptability.

🔍 Key Points

Introduction of DRIFT, a Dynamic Rule-based Isolation Framework designed to enhance the security of LLM agents against prompt injection attacks.
DRIFT incorporates three main components: Secure Planner, Dynamic Validator, and Injection Isolator, which work together to enforce control- and data-level constraints while allowing dynamic updates of security policies.
Empirical validation on the AgentDojo benchmark demonstrates DRIFT's efficacy, significantly reducing the Attack Success Rate (ASR) from 30.7% to 1.3% while maintaining high utility across various models.
The framework not only addresses the immediate threats posed by prompt injection through isolation mechanisms but also allows for enhanced task performance by dynamically modifying security constraints during execution.
Ablation studies confirm the contributions of each component in improving the overall security and functionality balance of LLM agents.

💡 Why This Paper Matters

The paper presents a comprehensive and innovative solution to a pressing issue in AI security — protecting LLM agents from prompt injection attacks. DRIFT's layered defense system not only enhances security but also maintains the operational utility of agents, making it a significant advancement in the field. Given the increasing reliance on LLMs in various applications, ensuring their security is paramount for trust and reliability in AI systems.

🎯 Why It's Interesting for AI Security Researchers

This paper is crucial for AI security researchers because it highlights an advanced framework that effectively addresses a significant vulnerability in LLM agents via dynamic security mechanisms. The integration of memory isolation along with dynamic policy updates represents a novel approach to enhancing security in real-world applications of AI. Additionally, the findings from empirical validations provide valuable insights into the practical implications of such frameworks, informing future research and development in AI safety and security.

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper