← Back to Library

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios

Authors: Kimberly T. Mai, Anna Gausen, Magda Dubois, Mona Murad, Bessie O'Dell, Nadine Staes-Polet, Christopher Summerfield, Andrew Strait

Published: 2026-02-25

arXiv ID: 2602.21831v2

Added to Library: 2026-03-04 04:00 UTC

Red Teaming

📄 Abstract

AI is increasingly being used to assist fraud and cybercrime. However, it is unclear the extent to which current large language models can provide useful information for complex criminal activity. Working with law enforcement and policy experts, we developed multi-turn evaluations for three fraud and cybercrime scenarios (romance scams, CEO impersonation, and identity theft). Our evaluations focus on text-to-text interactions. In each scenario, we evaluate whether models provide actionable assistance beyond information typically available on the web, as assessed by domain experts. We do so in ways designed to resemble real-world misuse, such as breaking down requests for fraud into a sequence of seemingly benign queries. We found that (1) current large language models provide minimal actionable information for fraud and cybercrime without the use of advanced jailbreaking techniques, (2) model safeguards have significant impact on the provision of information, with the two open-weight large language models fine-tuned to remove safety guardrails providing the most actionable and useful responses, and (3) decomposing requests into benign-seeming queries elicited more assistance than explicitly malicious framing or basic system-level jailbreaks. Overall, the results suggest that current text-generation models provide relatively minimal uplift for fraud and cybercrime through information provision, without extensive effort to circumvent safeguards. This work contributes a reproducible, expert-grounded framework for tracking how these risks may evolve with time as models grow more capable and adversaries adapt.

🔍 Key Points

  • Developed a reproducible, expert-grounded framework for assessing AI misuse in fraud and cybercrime scenarios using multi-turn evaluations.
  • Evaluated large language models (LLMs) across three specific fraud scenarios: romance scams, CEO impersonation, and identity theft, demonstrating their limitations in providing actionable assistance.
  • Findings indicate that while model safeguards significantly impact information provision, advanced jailbreaking techniques lead to higher rates of useful responses from uncensored models.
  • Observed that decomposing malicious requests into benign-seeming queries elicited more nuanced assistance, highlighting the subtleties in AI interaction dynamics.
  • Results emphasize the need for ongoing research and adaptive safeguards as AI capabilities evolve and adversaries might exploit these tools.

💡 Why This Paper Matters

This paper is pivotal as it systematically evaluates AI's role in facilitating fraud and cybercrime, revealing significant limitations in current models while also presenting a foundational methodology for ongoing assessments. These insights are crucial for informing both model development and protective strategies, ensuring that AI deployment does not inadvertently enable malicious activities.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper particularly relevant as it not only charts the landscape of AI misuse in real-world crime scenarios but also highlights crucial areas where AI safeguards are tested and can be improved. The rigorous methodology and findings on interaction dynamics provide a valuable framework for future studies aimed at enhancing model resilience against exploitation.

📚 Read the Full Paper