
SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

Authors: Haochang Hao, Yifan Xu, Xinzhuo Li, Yingqiang Ge, Lu Cheng

Published: 2026-03-03

arXiv ID: 2603.03536v1

Added to Library: 2026-03-05 04:01 UTC

Safety

📄 Abstract

Current LLM-based conversational recommender systems (CRS) primarily optimize recommendation accuracy and user satisfaction. We identify an underexplored vulnerability: recommendation outputs may harm users by violating personalized safety constraints when individualized safety sensitivities -- such as trauma triggers, self-harm history, or phobias -- are implicitly inferred from the conversation but not respected during recommendation. We formalize this challenge as personalized CRS safety and introduce SafeRec, a new benchmark dataset designed to systematically evaluate safety risks in LLM-based CRS under user-specific constraints. To address this problem, we propose SafeCRS, a safety-aware training framework that integrates Safe Supervised Fine-Tuning (Safe-SFT) with Safe Group reward-Decoupled Normalization Policy Optimization (Safe-GDPO) to jointly optimize recommendation quality and personalized safety alignment. Extensive experiments on SafeRec demonstrate that SafeCRS reduces safety violation rates by up to 96.5% relative to the strongest recommendation-quality baseline while maintaining competitive recommendation quality. Warning: This paper contains potentially harmful and offensive content.

🔍 Key Points

  • Identification of personalized safety alignment as a critical vulnerability in LLM-based conversational recommender systems (CRS).
  • Introduction of SafeRec, a benchmark dataset for evaluating safety risks in conversational recommendations based on user-specific constraints.
  • Development of SafeCRS, a novel safety-aware training framework that combines Safe Supervised Fine-Tuning (Safe-SFT) and Safe Group reward-Decoupled Normalization Policy Optimization (Safe-GDPO).
  • Demonstration of large reductions in safety violation rates (up to 96.5%) while maintaining competitive recommendation quality across domains such as movies and games.
  • A two-stage training approach that jointly optimizes safety and recommendation relevance, demonstrating the efficacy of per-reward normalization.
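The summary does not include implementation details, but the "per-reward normalization" idea behind Safe-GDPO can be sketched as follows: in a GRPO-style setup, each sampled response in a group receives multiple reward signals (e.g. one for safety, one for recommendation quality), and each reward channel is z-normalized across the group separately before being combined, so that no single high-variance channel dominates the advantage. This is an illustrative sketch, not the paper's code; the function name, array shapes, and the simple summation of channels are assumptions.

```python
import numpy as np

def decoupled_group_advantages(rewards, eps=1e-8):
    """Illustrative per-reward (decoupled) group normalization.

    rewards: shape (group_size, num_rewards), e.g. column 0 = safety
    reward, column 1 = recommendation reward for each sampled response.
    Each channel is z-normalized across the group before summing, so a
    high-variance channel cannot swamp the combined advantage.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=0, keepdims=True)   # per-channel mean
    std = rewards.std(axis=0, keepdims=True)     # per-channel std
    normalized = (rewards - mean) / (std + eps)  # per-channel z-score
    return normalized.sum(axis=1)  # combined advantage per response

# Example: a group of 4 responses with (safety, relevance) rewards.
advantages = decoupled_group_advantages(
    [[1.0, 0.2], [0.0, 0.9], [0.5, 0.5], [0.2, 0.1]]
)
```

Because each channel is z-scored over the group, the advantages sum to roughly zero across the group, matching the usual group-relative baseline behavior.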

💡 Why This Paper Matters

This paper advances the understanding and mitigation of safety issues in LLM-based conversational recommender systems. By formalizing personalized safety constraints and introducing the SafeRec benchmark and the SafeCRS training framework, the work addresses a dimension of user well-being that recommender system design has largely overlooked. These contributions lay a foundation for recommendation systems that respect individual user sensitivities rather than optimizing accuracy alone.

🎯 Why It's Interesting for AI Security Researchers

The work is relevant to AI security researchers because it exposes a concrete failure mode of LLM-based systems: constraints the model can infer from conversation but fails to enforce in its outputs. The personalized safety benchmark and training framework offer a template for evaluating and hardening other user-facing LLM applications, and the reported gap between accuracy-optimized baselines and safety-aligned training underscores that safety must be an explicit training objective rather than an assumed byproduct.

📚 Read the Full Paper