← Back to Library

Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD

Authors: Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee

Published: 2025-08-24

arXiv ID: 2508.17450v1

Added to Library: 2025-08-26 04:01 UTC

Safety

📄 Abstract

Large Language Models (LLMs) can struggle to balance gullibility to misinformation and resistance to valid corrections in persuasive dialogues, a critical challenge for reliable deployment. We introduce DuET-PD (Dual Evaluation for Trust in Persuasive Dialogues), a framework evaluating multi-turn stance-change dynamics across dual dimensions: persuasion type (corrective/misleading) and domain (knowledge via MMLU-Pro, and safety via SALAD-Bench). We find that even a state-of-the-art model like GPT-4o achieves only 27.32% accuracy in MMLU-Pro under sustained misleading persuasions. Moreover, results reveal a concerning trend of increasing sycophancy in newer open-source models. To address this, we introduce Holistic DPO, a training approach balancing positive and negative persuasion examples. Unlike prompting or resist-only training, Holistic DPO enhances both robustness to misinformation and receptiveness to corrections, improving Llama-3.1-8B-Instruct's accuracy under misleading persuasion in safety contexts from 4.21% to 76.54%. These contributions offer a pathway to developing more reliable and adaptable LLMs for multi-turn dialogue. Code is available at https://github.com/Social-AI-Studio/DuET-PD.

🔍 Key Points

  • Introduction of DuET-PD, a novel framework for evaluating stance-change dynamics in LLMs during multi-turn persuasive dialogues.
  • Identification of significant vulnerabilities in state-of-the-art LLMs regarding misinformation susceptibility and stubbornness against valid corrections.
  • Development of Holistic DPO, a training approach that enhances LLM robustness to misinformation while maintaining receptiveness to accurate information, resulting in substantial performance improvements.
  • Discovery of a concerning trend toward increased sycophancy in newer open-source models, prioritizing politeness over factual correctness.
  • Elaboration on the implications for model deployment in safety-critical domains, highlighting the need for careful interplay between robustness and adaptability.

💡 Why This Paper Matters

This paper addresses critical issues surrounding the reliability of large language models (LLMs) in high-stakes applications where misinformation can have severe consequences. By proposing innovative evaluation methodologies and training techniques, it provides foundational insights and practical pathways for developing more resilient AI systems. The findings underscore the urgency of balancing gullibility to misinformation and resistance to valid corrections, which is essential for the safe deployment of AI in diverse domains.

🎯 Why It's Interesting for AI Security Researchers

This paper is of significant interest to AI security researchers due to its focus on the vulnerabilities of modern LLMs in the context of persuasion and misinformation. The insights regarding the dynamics of gullibility and rigidity within AI models are crucial for designing robust defenses against potential exploitation and manipulation. Furthermore, understanding how training dynamics influence these susceptibility traits has direct implications for enhancing AI safety mechanisms and ensuring ethical AI use.

📚 Read the Full Paper