
Promoting Online Safety by Simulating Unsafe Conversations with LLMs

Authors: Owen Hoffman, Kangze Peng, Zehua You, Sajid Kamal, Sukrit Venkatagiri

Published: 2025-07-29

arXiv ID: 2507.22267v1

Added to Library: 2025-07-31 04:01 UTC

Safety

📄 Abstract

Generative AI, including large language models (LLMs), has the potential -- and is already being used -- to increase the speed, scale, and types of unsafe conversations online. LLMs lower the barrier to entry for bad actors to create unsafe conversations, in particular because of their ability to generate persuasive and human-like text. In our current work, we explore ways to promote online safety by teaching people about unsafe conversations that can occur online with and without LLMs. We build on prior work showing that LLMs can successfully simulate scam conversations. We also leverage research in the learning sciences showing that providing feedback on one's hypothetical actions can promote learning. In particular, we focus on simulating scam conversations using LLMs. Our work incorporates two LLMs, a scammer LLM and a target LLM, that converse with each other to simulate realistic, unsafe conversations that people may encounter online; users of our system are asked to provide feedback to the target LLM.
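The paper does not include an implementation here, but a minimal sketch of the setup the abstract describes might look like the following: two LLMs alternate turns, one role-playing the scammer and one the target, and after each scammer turn the human learner's feedback is folded into the target LLM's context before it replies. The provider, model name, and persona prompts below are assumptions for illustration, not the authors' actual system.

```python
# Minimal sketch of a two-LLM scam-conversation simulation with user feedback.
# Assumptions (not from the paper): OpenAI chat API, model name, persona prompts.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumed model; the paper does not specify one here

SCAMMER_PERSONA = "You are role-playing a scammer in a safety-training simulation."
TARGET_PERSONA = "You are role-playing a potential scam target; follow the coach's feedback."

def chat(system_prompt, history):
    """One turn: send a persona prompt plus the conversation so far, return the reply."""
    messages = [{"role": "system", "content": system_prompt}] + history
    response = client.chat.completions.create(model=MODEL, messages=messages)
    return response.choices[0].message.content

def run_simulation(rounds=3):
    transcript = []  # shared log of (speaker, text) pairs
    for _ in range(rounds):
        # Scammer speaks; from its point of view, the target's lines are the "user".
        scammer_view = [{"role": "assistant" if s == "scammer" else "user", "content": t}
                        for s, t in transcript]
        scam_msg = chat(SCAMMER_PERSONA,
                        scammer_view or [{"role": "user", "content": "Start the conversation."}])
        transcript.append(("scammer", scam_msg))
        print(f"\nScammer: {scam_msg}")

        # The human learner coaches the target LLM on how it should respond.
        feedback = input("Your feedback for the target: ")

        # Target speaks; the scammer's lines are the "user", and the learner's
        # feedback is appended as an extra instruction before the target replies.
        target_view = [{"role": "assistant" if s == "target" else "user", "content": t}
                       for s, t in transcript]
        target_view.append({"role": "user", "content": f"(Coach feedback: {feedback})"})
        reply = chat(TARGET_PERSONA, target_view)
        transcript.append(("target", reply))
        print(f"Target: {reply}")

if __name__ == "__main__":
    run_simulation()
```

In this sketch the learner never talks to the scammer directly; their feedback only shapes the target LLM's next turn, mirroring the paper's framing of learning by critiquing hypothetical actions rather than by being targeted themselves.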

🔍 Key Points

  • The paper presents a novel approach to online safety by using LLMs to simulate scam conversations, allowing users to practice identifying and resisting such interactions.
  • It emphasizes the educational aspects of the simulation, leveraging feedback mechanisms to enhance learning about unsafe online conversations.
  • Two different LLMs were used, one simulating the scammer and the other acting as the target, with distinct personalities that affect user interaction and learning outcomes.
  • The authors recognized and addressed challenges related to model safety constraints, developing strategies to bypass some limitations imposed by LLM providers in order to produce realistic interactions.
  • The paper discusses both the opportunities and potential emotional risks of simulating unsafe conversations, highlighting the need for careful design and evaluation.

💡 Why This Paper Matters

This research advances the conversation around AI safety by focusing on the role of generative models in both the creation of unsafe online interactions and in user education regarding those interactions. The unique combination of simulation and feedback promotes user resilience against scams, marking a critical step in leveraging AI for positive societal impact.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant to AI security researchers as it addresses a significant duality in the use of LLMs: their potential to facilitate unsafe behaviors and their capability to educate users against such behaviors. The insights into how generative models can be manipulated to create realistic scams will inform defenses against AI-facilitated scams and guide the development of more robust safety mechanisms for LLMs.

📚 Read the Full Paper