
Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking

Authors: Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei

Published: 2025-08-01

arXiv ID: 2508.00500v1

Added to Library: 2025-08-04 04:01 UTC

Safety

📄 Abstract

Large Language Model (LLM) agents exhibit powerful autonomous capabilities across domains such as robotics, virtual assistants, and web automation. However, their stochastic behavior introduces significant safety risks that are difficult to anticipate. Existing rule-based enforcement systems, such as AgentSpec, focus on developing reactive safety rules, which typically respond only when unsafe behavior is imminent or has already occurred. These systems lack foresight and struggle with long-horizon dependencies and distribution shifts. To address these limitations, we propose Pro2Guard, a proactive runtime enforcement framework grounded in probabilistic reachability analysis. Pro2Guard abstracts agent behaviors into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces. At runtime, it anticipates future risks by estimating the probability of reaching unsafe states, triggering interventions before violations occur whenever the predicted risk exceeds a user-defined threshold. By incorporating semantic validity checks and leveraging PAC bounds, Pro2Guard ensures statistical reliability while approximating the underlying ground-truth model. We evaluate Pro2Guard extensively across two safety-critical domains: embodied household agents and autonomous vehicles. In embodied agent tasks, Pro2Guard intervenes early in up to 93.6% of unsafe tasks when low thresholds are used, while configurable modes (e.g., reflect) allow balancing safety with task success, maintaining up to 80.4% task completion. In autonomous driving scenarios, Pro2Guard achieves 100% prediction accuracy for traffic law violations and collisions, anticipating risks up to 38.66 seconds ahead.
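To make the pipeline the abstract describes more concrete, below is a minimal Python sketch: learn a DTMC from symbolic execution traces by counting transitions, then estimate bounded-horizon reachability of unsafe states at runtime and intervene when the risk crosses a threshold. All names (`learn_dtmc`, `unsafe_risk`, `should_intervene`), the horizon, and the default threshold are illustrative assumptions, not the authors' implementation; the paper presumably delegates reachability analysis to a probabilistic model checker, for which the value iteration here is a simplified stand-in.

```python
from collections import defaultdict

def learn_dtmc(traces):
    """Maximum-likelihood DTMC from symbolic traces: each trace is a
    sequence of abstract states; transition probabilities are
    normalized transition counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for s, t in zip(trace, trace[1:]):
            counts[s][t] += 1
    return {
        s: {t: n / sum(succ.values()) for t, n in succ.items()}
        for s, succ in counts.items()
    }

def unsafe_risk(dtmc, state, unsafe, horizon=20):
    """Probability of reaching an unsafe state within `horizon` steps
    from `state`, computed by value iteration over the learned chain."""
    states = set(dtmc) | {t for succ in dtmc.values() for t in succ}
    risk = {s: 1.0 if s in unsafe else 0.0 for s in states}
    for _ in range(horizon):
        risk = {
            s: 1.0 if s in unsafe
            else sum(p * risk[t] for t, p in dtmc.get(s, {}).items())
            for s in states
        }
    return risk.get(state, 0.0)

def should_intervene(dtmc, current_state, unsafe, threshold=0.2):
    """Proactive enforcement: trigger an intervention (e.g., halt or
    reflect) before any violation occurs, as soon as predicted risk
    meets the user-defined threshold."""
    return unsafe_risk(dtmc, current_state, unsafe) >= threshold
```

As a hypothetical usage example, with traces like `[["idle", "grasp", "near_stove", "burn"], ...]` and `unsafe = {"burn"}`, `should_intervene` would fire once the chain places enough probability mass on reaching `"burn"` within the horizon, before the agent actually acts unsafely. A sketch of the PAC-style sample bound mentioned in the abstract follows the key points below.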

🔍 Key Points

  • Introduction of Pro2Guard, a proactive runtime enforcement framework using probabilistic model checking to ensure safety in LLM agents.
  • Development of a four-stage method using Discrete-Time Markov Chains (DTMCs) for abstracting agent behavior and performing risk assessments at runtime.
  • In evaluation, Pro2Guard reduced unsafe outcomes to as low as 2.60% and achieved 100% prediction accuracy for traffic law violations in autonomous vehicles.
  • The framework balances safety enforcement against task completion, letting it adapt to different operational contexts with little loss of functionality.
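The abstract also states that Pro2Guard leverages PAC bounds so that the learned DTMC statistically approximates the ground-truth model. A standard bound of this kind in statistical model checking is the Hoeffding-style sample-size bound sketched below; whether the paper uses this exact form is an assumption, and `pac_sample_size` is a hypothetical helper, not the authors' API.

```python
import math

def pac_sample_size(epsilon: float, delta: float) -> int:
    """Hoeffding-style PAC bound: the number of i.i.d. sampled traces
    needed so that an empirical probability estimate p_hat of a trace
    property satisfies P(|p_hat - p| >= epsilon) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# e.g., estimating unsafe-reachability to within ±0.05 at 99% confidence:
print(pac_sample_size(epsilon=0.05, delta=0.01))  # -> 1060
```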

💡 Why This Paper Matters

This paper presents a significant advance in the safety of LLM-powered agents, which is crucial as these technologies become more pervasive in safety-critical applications. By enabling proactive risk prediction and enforcement, Pro2Guard addresses the limitations of existing reactive systems, promising more reliable and trustworthy autonomous agents.

🎯 Why It's Interesting for AI Security Researchers

The results and methodologies proposed in this paper are particularly relevant to AI security researchers focused on ensuring the safety and reliability of AI systems. With LLM agents increasingly integrated into many applications, understanding and mitigating risks associated with their use is paramount. The developments in Pro2Guard provide a new framework for evaluating and enhancing the safety measures necessary for deployment in real-world scenarios.

📚 Read the Full Paper: https://arxiv.org/abs/2508.00500v1