
VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription

Authors: Sumit Ranjan, Sugandha Sharma, Ubaid Abbas, Puneeth N Ail

Published: 2026-03-08

arXiv ID: 2603.07708v1

Added to Library: 2026-03-10 03:01 UTC

📄 Abstract

Voice interfaces are rapidly becoming a common way for people to interact with AI systems, but they also bring new security risks such as prompt injection, social engineering, and harmful voice commands. Traditional defenses convert speech to text and then filter the text, which adds latency and can discard important audio cues. This paper introduces VoiceSHIELD-Small, a lightweight model that transcribes speech and classifies it as safe or harmful in real time, in a single step. Built on OpenAI's Whisper-small encoder, VoiceSHIELD adds a mean-pooling layer and a simple classification head. Classification takes just 90-120 milliseconds on mid-tier GPUs, with transcription running in parallel. On a balanced set of 947 audio clips, the model achieved 99.16 percent accuracy and an F1 score of 0.9865; at the default decision threshold, it missed 2.33 percent of harmful inputs. Cross-validation showed consistent performance (F1 standard deviation = 0.0026). The paper also covers the model's design, training data, performance trade-offs, and responsible-use guidelines. VoiceSHIELD is released under the MIT license to encourage further research and adoption in voice AI security.
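The abstract describes the classification path as a mean-pooling layer plus a simple classification head on top of the Whisper-small encoder. The sketch below illustrates that idea in NumPy; the dimensions, variable names, and untrained weights are assumptions for illustration, not the paper's released code (only the 768-dimensional hidden size and roughly 1500 encoder frames per 30-second clip come from the public Whisper-small architecture).

```python
import numpy as np

D_MODEL = 768     # Whisper-small encoder hidden size
N_CLASSES = 2     # hypothetical labels: safe vs. harmful

rng = np.random.default_rng(0)
# Untrained stand-in weights for the "simple classification head"
W = rng.normal(0.0, 0.02, size=(D_MODEL, N_CLASSES))
b = np.zeros(N_CLASSES)

def classify(encoder_states: np.ndarray) -> np.ndarray:
    """encoder_states: (frames, D_MODEL) hidden states from the speech encoder.

    Returns class probabilities after mean pooling and a linear head.
    """
    pooled = encoder_states.mean(axis=0)   # mean-pool over time frames
    logits = pooled @ W + b                # linear classification head
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

# Dummy features standing in for Whisper-small encoder output
# (~30 s of audio maps to 1500 encoder frames in Whisper).
states = rng.normal(size=(1500, D_MODEL))
probs = classify(states)
print(probs.shape, round(float(probs.sum()), 6))
```

Because pooling collapses the time axis before the head runs, the classifier's cost is a single matrix-vector product regardless of clip length, which is consistent with the low per-clip classification latency the abstract reports.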

🔍 Key Points

  • Identification of a video-specific vulnerability in text-to-video (T2V) models: given fragmented prompts that specify only sparse boundary conditions, the models can infill harmful intermediate frames.
  • Development of a two-step framework, titled TFM (Two Frame Matter), which uses Temporal Boundary Prompting (TBP) and Covert Substitution Mechanism (CSM) to enhance the effectiveness of jailbreak attacks on T2V models.
  • Empirical validation across multiple open-source and commercial T2V models demonstrating that TFM achieves up to a 12% increase in the attack success rate compared to existing methods.
  • The introduction of a novel threat model for T2V systems, enabling a stringent black-box context for evaluating prompt vulnerabilities against safety filters.
  • Significant implications for the design of safer and more robust T2V models, highlighting the importance of temporally aware safety mechanisms.

💡 Why This Paper Matters

This paper presents significant advancements in understanding the vulnerabilities of text-to-video models, specifically how they can be manipulated through temporal prompt engineering. By demonstrating the effectiveness of the TFM framework in executing successful jailbreak attacks, the authors emphasize the urgent need for improved safety mechanisms, particularly as T2V technology becomes more prevalent and potent in real-world applications. The findings underscore the imperative for ongoing research into AI safety, ensuring responsible deployment of advanced generative systems.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of interest to AI security researchers as it unveils sophisticated attack vectors specific to the rapidly evolving text-to-video generative systems. It emphasizes the interplay between model structure and prompt engineering, serving as a reference for developing countermeasures against existing vulnerabilities. Furthermore, it provides empirical evidence demonstrating the effectiveness of novel attack methodologies, critical for anticipating and mitigating potential risks associated with AI-generated content.
