
PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents

Authors: Yaozu Wu, Jizhou Guo, Dongyuan Li, Henry Peng Zou, Wei-Chieh Huang, Yankai Chen, Zhen Wang, Weizhi Zhang, Yangning Li, Meng Zhang, Renhe Jiang, Philip S. Yu

Published: 2025-09-28

arXiv ID: 2509.23614v1

Added to Library: 2025-09-30 04:06 UTC

Safety

📄 Abstract

Effective guardrails are essential for safely deploying LLM-based agents in critical applications. Despite recent advances, existing guardrails suffer from two fundamental limitations: (i) they apply uniform guardrail policies to all users, ignoring that the same agent behavior can harm some users while being safe for others; (ii) they check each response in isolation, missing how risks evolve and accumulate across multiple interactions. To solve these issues, we propose PSG-Agent, a personalized and dynamic system for LLM-based agents. First, PSG-Agent creates personalized guardrails by mining the interaction history for stable traits and capturing real-time states from current queries, generating user-specific risk thresholds and protection strategies. Second, PSG-Agent implements continuous monitoring across the agent pipeline with specialized guards, including Plan Monitor, Tool Firewall, Response Guard, and Memory Guardian, that track cross-turn risk accumulation and issue verifiable verdicts. Finally, we validate PSG-Agent in multiple scenarios, including healthcare, finance, and daily life automation, with diverse user profiles. It significantly outperforms existing agent guardrails, including LlamaGuard3 and AGrail, providing an executable and auditable path toward personalized safety for LLM-based agents.

🔍 Key Points

  • PSG-Agent introduces a personalized guardrail system that tailors risk thresholds and protection strategies to each user, combining stable traits mined from interaction history with real-time states inferred from the current query.
  • The approach addresses two shortcomings of existing guardrails: uniform policies that ignore differences between users, and per-response checks that miss how risk accumulates across multi-turn interactions.
  • PSG-Agent has a modular architecture with specialized components, including Profile Miner, Input Guard, Plan Monitor, Tool Firewall, Response Guard, and Memory Guardian, that assess risk and intervene at each stage of the agent pipeline.
  • In evaluations spanning healthcare, finance, and daily life automation with diverse user profiles, PSG-Agent is reported to significantly outperform existing agent guardrails, including LlamaGuard3 and AGrail, in accuracy and recall.
  • The work also establishes a benchmark for evaluating personalized safety in LLM-based agents, giving future guardrail research a common point of comparison.
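
To make the two core ideas above concrete (user-specific risk thresholds and cross-turn risk accumulation), here is a minimal Python sketch. It is not the authors' implementation: the class names `UserProfile`, `Verdict`, and `SessionGuard`, the 0 to 1 risk scales, the threshold formula, and the decay factor are all assumptions made purely for illustration, and the per-turn risk scores are supplied by hand rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserProfile:
    """Hypothetical profile: a stable trait plus a real-time state (assumed 0..1 scales)."""
    vulnerability: float      # stable trait, as if mined from interaction history
    distress: float = 0.0     # real-time state, as if inferred from the current query

    def risk_threshold(self, base: float = 0.8) -> float:
        # Assumed rule: more vulnerable or distressed users get a stricter (lower) threshold.
        penalty = 0.4 * self.vulnerability + 0.2 * self.distress
        return max(0.1, base - penalty)


@dataclass
class Verdict:
    """Record of one guard decision, kept so that decisions can be audited later."""
    stage: str
    turn_risk: float
    cumulative_risk: float
    threshold: float
    allowed: bool


@dataclass
class SessionGuard:
    """Tracks risk across turns instead of judging each response in isolation."""
    profile: UserProfile
    decay: float = 0.5                      # assumed: older risk counts for less
    cumulative_risk: float = 0.0
    log: List[Verdict] = field(default_factory=list)

    def check(self, stage: str, turn_risk: float) -> Verdict:
        # Carry over decayed risk from earlier turns, then add the current turn's risk.
        self.cumulative_risk = min(1.0, self.decay * self.cumulative_risk + turn_risk)
        threshold = self.profile.risk_threshold()
        verdict = Verdict(stage, turn_risk, self.cumulative_risk, threshold,
                          allowed=self.cumulative_risk < threshold)
        self.log.append(verdict)
        return verdict


def run_session(profile: UserProfile, turn_risks: List[float]) -> List[bool]:
    """Feed the same sequence of per-turn risks through a guard for one user."""
    guard = SessionGuard(profile)
    return [guard.check("response", risk).allowed for risk in turn_risks]


if __name__ == "__main__":
    risks = [0.3, 0.3, 0.3]
    print(run_session(UserProfile(vulnerability=0.0), risks))
    print(run_session(UserProfile(vulnerability=0.9, distress=0.5), risks))
```

Under these assumed numbers, the same three moderately risky turns are all allowed for the low-vulnerability user, while the high-vulnerability user is blocked from the second turn onward once accumulated risk crosses their lower threshold. A guard that scored each turn in isolation would see only 0.3 every time and would not register the accumulation at all.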

💡 Why This Paper Matters

PSG-Agent replaces uniform, per-response safety checks with guardrails that are personalized to each user and applied continuously across the agent pipeline. By addressing both limitations of existing guardrail approaches, it supports safer deployment of LLM-based agents in high-stakes settings such as healthcare and finance, and it points toward more adaptive, user-sensitive agent applications.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant because it tackles fundamental issues in safeguarding LLM-based agents across applications. Customizing safety measures to individual user profiles moves away from one-size-fits-all guardrails, and tracking risk across turns targets harms that per-response checks can miss. The evaluation and benchmarking methods also offer a way to compare risk management strategies for agents deployed in real-world scenarios.
