← Back to Library

Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation

Authors: Fred Heiding, Simon Lermen

Published: 2025-11-13

arXiv ID: 2511.11759v1

Added to Library: 2025-11-18 03:03 UTC

Red Teaming

📄 Abstract

We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated safety guardrails across six frontier LLMs spanning four attack categories, revealing critical failures where several models exhibited near-complete susceptibility to certain attack vectors. In a human validation study with 108 senior volunteers, AI-generated phishing emails successfully compromised 11\% of participants. Our work uniquely demonstrates the complete attack pipeline targeting elderly populations, highlighting that current AI safety measures fail to protect those most vulnerable to fraud. Beyond generating phishing content, LLMs enable attackers to overcome language barriers and conduct multi-turn trust-building conversations at scale, fundamentally transforming fraud economics. While some providers report voluntary counter-abuse efforts, we argue these remain insufficient.

🔍 Key Points

  • Demonstrated vulnerabilities in AI safety guardrails across six frontier LLMs, revealing that many models are highly susceptible to phishing prompts, with critical failures identified particularly in generating malicious content targeting the elderly.
  • Conducted a human validation study showing that 11% of senior participants fell for LLM-generated phishing emails, highlighting the real-world threat posed by AI-generated content to vulnerable populations.
  • Outlined a comprehensive attack pipeline where attackers can jailbreak models and deploy AI-generated phishing messages, transforming the economics of online fraud by enabling highly personalized and convincing scams that can bypass traditional security measures.
  • Identified the limitations of current AI safety mechanisms and proposed the need for standardized safety measures across the AI industry to better protect vulnerable demographics, specifically emphasizing the protection of senior citizens against increasingly sophisticated threats.
  • Discussed potential countermeasures, including legal liability frameworks for AI providers and the need for enhanced digital identity verification systems to reduce the risks of AI-generated phishing.

💡 Why This Paper Matters

This paper underscores the critical vulnerability of elderly populations to AI-generated phishing attacks, revealing alarming gaps in AI safety guardrails that fail to prevent the generation of malicious content. By demonstrating a complete phishing attack pipeline and the susceptibility of various LLMs, this research highlights the pressing need for improved safety measures and regulatory frameworks to protect vulnerable groups. It serves as a wake-up call for AI developers, policymakers, and security professionals to prioritize the security of AI systems and their applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant to AI security researchers as it provides empirical evidence of the vulnerabilities in LLMs that can be exploited for phishing attacks. It highlights the effectiveness of advanced adversarial strategies, such as jailbreaking, and offers insights into the model's operational weaknesses. Furthermore, this research opens discussions on necessary safety improvements, countermeasures, and the legal implications of AI misuse, making it a critical contribution to the ongoing efforts to secure AI technologies against misapplication.

📚 Read the Full Paper