Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models

Authors: Ye Yu, Haibo Jin, Yaoning Yu, Jun Zhuang, Haohan Wang

Published: 2026-01-30

arXiv ID: 2601.23255v1

Added to Library: 2026-02-03 08:06 UTC

Red Teaming

📄 Abstract

Large audio-language models increasingly operate on raw speech inputs, enabling more seamless integration across domains such as voice assistants, education, and clinical triage. This transition, however, introduces a distinct class of vulnerabilities that remain largely uncharacterized. We examine the security implications of this modality shift by designing a text-to-audio jailbreak that embeds disallowed directives within a narrative-style audio stream. The attack leverages an advanced instruction-following text-to-speech (TTS) model to exploit structural and acoustic properties, thereby circumventing safety mechanisms primarily calibrated for text. When delivered through synthetic speech, the narrative format elicits restricted outputs from state-of-the-art models, including Gemini 2.0 Flash, achieving a 98.26% success rate that substantially exceeds text-only baselines. These results highlight the need for safety frameworks that jointly reason over linguistic and paralinguistic representations, particularly as speech-based interfaces become more prevalent.

🔍 Key Points

  • Introduces a novel attack vector exploiting paralinguistic features of speech to induce compliance in audio-language models.
  • Demonstrates that stylized vocal delivery substantially raises jailbreak success rates relative to traditional text-based prompts, reaching up to 98.26%.
  • Proposes a heuristic design of delivery styles based on psychological principles, creating a framework for adversarial speech generation that manipulates the model's internal decision-making.
  • Conducts rigorous empirical evaluation on multiple state-of-the-art audio-language models, revealing vulnerabilities that traditional alignment safeguards fail to cover.
  • Highlights the importance of integrating multimodal safety defenses that consider both linguistic content and delivery style for robust model alignment; a minimal sketch of such a joint check follows this list.
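
The paper itself does not publish defense code; the snippet below is only a minimal sketch of the "jointly reason over linguistic and paralinguistic representations" idea. Everything in it is an illustrative assumption rather than the authors' method: `text_risk` stands in for any text-only moderation score computed on the ASR transcript, the prosody features are coarse stand-ins extracted with `librosa`, and the fusion weights and threshold are untuned examples.

```python
"""Minimal sketch of a joint (linguistic + paralinguistic) safety gate.

Assumptions, not from the paper: `text_risk` is a moderation score in
[0, 1] from any text classifier run on the transcript; the prosody
features, scaling constants, and linear fusion rule are illustrative.
"""
import numpy as np
import librosa


def prosody_features(y: np.ndarray, sr: int) -> dict:
    """Coarse delivery-style cues: pitch variability and loudness
    dynamics, both of which stylized narration tends to inflate."""
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # frame-wise pitch (Hz)
    rms = librosa.feature.rms(y=y)[0]               # frame-wise loudness
    return {
        "pitch_std": float(np.std(f0)),             # expressive pitch swings
        "rms_range": float(rms.max() - rms.min()),  # loudness dynamics
    }


def audio_safety_gate(y: np.ndarray, sr: int, text_risk: float,
                      weights=(0.6, 0.2, 0.2), threshold=0.5):
    """Fuse a transcript-level moderation score with delivery-style cues,
    so the gate sees signals a text-only filter never observes."""
    feats = prosody_features(y, sr)
    pitch_term = min(feats["pitch_std"] / 50.0, 1.0)  # rough [0, 1] scaling
    rms_term = min(feats["rms_range"] / 0.3, 1.0)
    risk = (weights[0] * text_risk
            + weights[1] * pitch_term
            + weights[2] * rms_term)
    return ("block" if risk >= threshold else "allow"), risk


if __name__ == "__main__":
    # Toy "utterance": a pitch-modulated tone standing in for speech.
    sr = 22050
    t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
    y = 0.3 * np.sin(2 * np.pi * (150 * t + 5 * np.sin(2 * np.pi * 1.5 * t)))
    decision, risk = audio_safety_gate(y, sr, text_risk=0.4)
    print(decision, round(risk, 3))
```

In a real system the hand-picked linear fusion would be replaced by a classifier trained jointly on transcripts and acoustic features; the sketch only makes concrete why a gate that observes delivery style can catch narrative-audio attacks that pass a transcript-only filter.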

💡 Why This Paper Matters

This paper is important because it uncovers significant vulnerabilities in large audio-language models by exploiting the nuances of human speech. By showing how delivery style can bypass safety measures, it calls for a reevaluation of security protocols in audio-processing systems and demonstrates that traditional text-based defenses are insufficient. The findings underscore the need for integrated approaches to AI safety that account for both content and delivery style, particularly as voice interfaces become more prevalent.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper particularly relevant because it challenges existing paradigms of model alignment, showing how psychological principles can be weaponized to exploit AI systems. Its insights into audio-based attacks on language models open a new frontier in adversarial machine learning and prompt the community to rethink the safeguards and defenses currently in use. Understanding these vulnerabilities is essential for developing more robust models and ensuring safe interactions with AI technologies.