A Framework for Formalizing LLM Agent Security

Authors: Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, Dawn Song

Published: 2026-03-19

arXiv ID: 2603.19469v1

Added to Library: 2026-03-23 02:02 UTC

Red Teaming

📄 Abstract

Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruction led to the action, what objective is being pursued, and whether the action serves that objective. However, existing definitions of security attacks against LLM agents often fail to capture this contextual nature. As a result, defenses face a fundamental utility-security tradeoff: applying defenses uniformly across all contexts can lead to significant utility loss, while applying defenses in insufficient or inappropriate contexts can result in security vulnerabilities. In this work, we present a framework that systematizes existing attacks and defenses from the perspective of contextual security. To this end, we propose four security properties that capture contextual security for LLM agents: task alignment (pursuing authorized objectives), action alignment (individual actions serving those objectives), source authorization (executing commands from authenticated sources), and data isolation (ensuring information flows respect privilege boundaries). We further introduce a set of oracle functions that enable verification of whether these security properties are violated as an agent executes a user task. Using this framework, we reformalize existing attacks, such as indirect prompt injection, direct prompt injection, jailbreak, task drift, and memory poisoning, as violations of one or more security properties, thereby providing precise and contextual definitions of these attacks. Similarly, we reformalize defenses as mechanisms that strengthen oracle functions or perform security property checks. Finally, we discuss several important future research directions enabled by our framework.
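The four security properties can be pictured as predicates evaluated against each agent action and its context. The sketch below is a minimal, hypothetical illustration of this idea; the `Action` record, field names, and oracle signatures are assumptions for exposition, not the paper's actual formalization.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """One agent action plus the context needed to judge it (illustrative)."""
    command: str
    source: str                                 # who issued the instruction behind this action
    objective: str                              # objective the action claims to serve
    reads: set = field(default_factory=set)     # privilege labels of data read
    writes: set = field(default_factory=set)    # privilege labels of data written

# Hypothetical oracle functions: each returns True iff the property holds.

def task_alignment(objective: str, authorized_objectives: set) -> bool:
    """Task alignment: the agent pursues only user-authorized objectives."""
    return objective in authorized_objectives

def action_alignment(action: Action, current_objective: str) -> bool:
    """Action alignment: the individual action serves the authorized objective."""
    return action.objective == current_objective

def source_authorization(action: Action, trusted_sources: set) -> bool:
    """Source authorization: commands come from an authenticated source."""
    return action.source in trusted_sources

def data_isolation(action: Action, clearance: set) -> bool:
    """Data isolation: information flows respect privilege boundaries."""
    return action.reads <= clearance and action.writes <= clearance
```

Under this toy model, an indirect prompt injection shows up as a `source_authorization` failure (the instruction originated from untrusted retrieved content rather than the user), even when the action itself looks benign in isolation.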

🔍 Key Points

  • The paper introduces a framework that formalizes the contextual nature of security in LLM agents, highlighting the importance of authorization context in distinguishing legitimate actions from security violations.
  • Four security properties are defined: task alignment, action alignment, source authorization, and data isolation, which capture the essential aspects of contextual security for LLM agents.
  • This framework incorporates oracle functions that provide a theoretical basis for verifying security properties, thereby offering a systematic approach to classifying existing attacks and defenses against LLM agents.
  • The authors reformalize known attacks (such as direct and indirect prompt injection, jailbreaking, task drift, and memory poisoning) as violations of one or more of these properties, yielding more precise definitions and a clearer understanding of security vulnerabilities in LLM systems.
  • Important future research directions are identified, focusing on improving the practical implementation of oracle functions and addressing the complexities of dynamic environments and compositional security.

💡 Why This Paper Matters

This paper is significant as it offers a structured methodology for evaluating the security of LLM agents, which are increasingly deployed in sensitive contexts. By emphasizing contextual security properties, the framework aids in understanding the nuances of agent behavior, thereby facilitating the development of more robust defenses against potential vulnerabilities.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of great interest to AI security researchers as it systematically addresses the critical challenge of contextual security in LLM agents. The introduction of new security properties and oracle functions enables researchers to better characterize attacks and design effective defenses, bridging significant gaps in current methodologies. Furthermore, as LLMs become more integrated into real-world applications, ensuring their security through robust frameworks is paramount for safe deployment.
