
VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

Authors: Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfister, Long T. Le

Published: 2025-10-03

arXiv ID: 2510.05156v1

Added to Library: 2025-10-08 04:01 UTC

Safety

📄 Abstract

The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These agents may deviate from user objectives, violate data handling policies, or be compromised by adversarial attacks. Mitigating these dangers necessitates a mechanism to formally guarantee that an agent's actions adhere to predefined safety constraints, a challenge that existing systems do not fully address. We introduce VeriGuard, a novel framework that provides formal safety guarantees for LLM-based agents through a dual-stage architecture designed for robust and verifiable correctness. The initial offline stage involves a comprehensive validation process. It begins by clarifying user intent to establish precise safety specifications. VeriGuard then synthesizes a behavioral policy and subjects it to both testing and formal verification to prove its compliance with these specifications. This iterative process refines the policy until it is deemed correct. Subsequently, the second stage provides online action monitoring, where VeriGuard operates as a runtime monitor to validate each proposed agent action against the pre-verified policy before execution. This separation of the exhaustive offline validation from the lightweight online monitoring allows formal guarantees to be practically applied, providing a robust safeguard that substantially improves the trustworthiness of LLM agents.
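To make the dual-stage flow concrete, below is a minimal, hypothetical sketch in Python of the pattern the abstract describes: an offline loop that synthesizes a policy and accepts it only once it passes verification, followed by a lightweight runtime monitor that checks every proposed action against the verified policy before execution. None of these names (`SafetySpec`, `synthesize_policy`, `ActionMonitor`) come from the paper, and the "verification" here is a toy enumeration over a finite action space rather than the LLM-driven synthesis and formal methods VeriGuard actually employs.

```python
# Minimal sketch of the dual-stage pattern described in the abstract.
# All names are illustrative stand-ins, not VeriGuard's actual API.
# The paper's offline stage combines LLM-based policy synthesis with
# testing and formal verification; here "verification" is a toy
# exhaustive check, used only to show the control flow.

from dataclasses import dataclass
from typing import Callable, Iterable

Action = dict  # e.g. {"tool": "send_email", "recipient": "..."}


@dataclass
class SafetySpec:
    """Predicate encoding the user's clarified safety constraints."""
    is_safe: Callable[[Action], bool]


def synthesize_policy(spec: SafetySpec) -> Callable[[Action], bool]:
    """Stand-in for policy synthesis: return a callable that decides
    whether a proposed action is allowed."""
    return lambda action: spec.is_safe(action)


def formally_verify(policy: Callable[[Action], bool],
                    spec: SafetySpec,
                    action_space: Iterable[Action]) -> bool:
    """Toy 'verification': check the policy never admits an unsafe action.
    A real system would discharge this obligation with a formal verifier
    rather than enumeration."""
    return all(spec.is_safe(a) for a in action_space if policy(a))


def offline_stage(spec: SafetySpec,
                  action_space: list[Action],
                  max_rounds: int = 3) -> Callable[[Action], bool]:
    """Offline stage: iterate synthesis -> verification until the policy
    is proved compliant, then hand it to the runtime monitor."""
    for _ in range(max_rounds):
        policy = synthesize_policy(spec)
        if formally_verify(policy, spec, action_space):
            return policy
    raise RuntimeError("could not produce a verified policy")


class ActionMonitor:
    """Online stage: gate every proposed agent action against the
    pre-verified policy before it is executed."""

    def __init__(self, policy: Callable[[Action], bool]):
        self.policy = policy

    def execute(self, action: Action, executor: Callable[[Action], None]):
        if not self.policy(action):
            raise PermissionError(f"blocked unsafe action: {action}")
        executor(action)


if __name__ == "__main__":
    # Hypothetical healthcare-style constraint: never export patient records.
    spec = SafetySpec(is_safe=lambda a: a.get("tool") != "export_records")
    space = [{"tool": "read_chart"}, {"tool": "export_records"}]

    monitor = ActionMonitor(offline_stage(spec, space))
    monitor.execute({"tool": "read_chart"}, executor=print)       # allowed
    # monitor.execute({"tool": "export_records"}, executor=print) # blocked
```

The design choice this illustrates is the separation the abstract emphasizes: all expensive checking happens once, offline, so the per-action runtime check reduces to a cheap policy lookup.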

🔍 Key Points

  • Introduction of the VeriGuard framework, designed to enhance the safety of autonomous AI agents by providing formal guarantees that their actions comply with predefined safety constraints.
  • Dual-stage architecture comprising an offline policy generation and verification stage followed by a lightweight online action monitoring stage.
  • An iterative refinement loop spanning user intent clarification, policy synthesis, code testing, and formal verification ensures the provable correctness of agent actions.
  • Empirical validation demonstrating effective reduction in attack success rates (ASR) to near-zero while maintaining high task success rates (TSR) across multiple datasets.
  • Evaluation against state-of-the-art baselines shows superior performance in both security and operational utility, demonstrating an effective balance of safety and functionality.

💡 Why This Paper Matters

The VeriGuard framework represents a significant advance in AI safety, offering a structured approach to ensuring that autonomous LLM agents act within predefined constraints. By leveraging formal verification techniques, the paper addresses critical vulnerabilities in current AI systems and charts a path toward more secure and reliable deployment of these technologies in sensitive domains.

🎯 Why It's Interesting for AI Security Researchers

This paper is of particular interest to AI security researchers because it tackles fundamental safety issues raised by increasingly autonomous AI systems. As LLMs are deployed in sensitive areas such as healthcare and finance, enforcing safety constraints through formal methods becomes critical. VeriGuard's results point toward AI agents that can be trusted to operate securely in real-world applications, making the work a notable contribution to the field.

📚 Read the Full Paper