
Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Authors: Fan Yang

Published: 2026-03-10

arXiv ID: 2603.10091v1

Added to Library: 2026-03-12 02:03 UTC

Tags: Red Teaming, Safety

📄 Abstract

The widespread adoption of thinking mode in large language models (LLMs) has significantly enhanced complex task processing capabilities while introducing new security risks. When subjected to jailbreak attacks, the step-by-step reasoning process may cause models to generate more detailed harmful content. We observe that thinking mode exhibits unique vulnerabilities when processing multiple interleaved tasks. Based on this observation, we propose the multi-stream perturbation attack, which generates superimposed interference by interweaving multiple task streams within a single prompt. We design three perturbation strategies: multi-stream interleaving, inversion perturbation, and shape transformation, which disrupt the thinking process through concurrent task interleaving, character reversal, and format constraints, respectively. On the JailbreakBench, AdvBench, and HarmBench datasets, our method achieves attack success rates exceeding those of most existing methods across mainstream models, including the Qwen3 series, DeepSeek, Qwen3-Max, and Gemini 2.5 Flash. Experiments show that thinking collapse rates and response repetition rates reach up to 17% and 60%, respectively, indicating that multi-stream perturbation not only bypasses safety mechanisms but also causes the thinking process to collapse or the output to become repetitive.

🔍 Key Points

  • Introduction of the multi-stream perturbation attack, which interleaves benign and harmful tasks to compromise LLM safety mechanisms.
  • Three perturbation strategies are developed: multi-stream interleaving, inversion perturbation, and shape transformation, each targeting a specific vulnerability of thinking mode in LLMs (a minimal illustrative sketch follows this list).
  • Significant attack success rates (above 90% in some cases), thinking collapse rates of up to 17%, and response repetition rates of up to 60% are achieved across various datasets and mainstream models such as Qwen3 and DeepSeek.
  • Empirical validation showing that traditional safety mechanisms fail under the pressure of these coordinated attacks, demonstrating the instability of LLM reasoning under concurrent task processing.
  • Results highlight the urgent need for improved safety alignment techniques in LLMs, especially those employing advanced reasoning capabilities.
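To make the three strategies more concrete, below is a minimal Python sketch of how a perturbed prompt might be assembled from benign placeholder tasks. The word-level interleaving granularity, the `[S{k}]` stream tags, the character-reversal scheme, and the column layout are all assumptions chosen for illustration; they are not the paper's actual prompt templates.

```python
# Hypothetical illustration of the three perturbation strategies summarized above,
# using only benign placeholder tasks. All structural choices here (word-level
# interleaving, stream tags, reversal, column layout) are assumptions.

def interleave_streams(*tasks: str) -> str:
    """Multi-stream interleaving: alternate words from several task streams."""
    streams = [t.split() for t in tasks]
    merged = []
    for i in range(max(len(s) for s in streams)):
        for k, s in enumerate(streams):
            if i < len(s):
                merged.append(f"[S{k}]{s[i]}")  # tag each word with its stream id
    return " ".join(merged)

def invert(text: str) -> str:
    """Inversion perturbation: reverse the character order of a task string."""
    return text[::-1]

def shape_transform(text: str, width: int = 8) -> str:
    """Shape transformation: impose a rigid column layout as a format constraint."""
    words = text.split()
    rows = [" | ".join(words[i:i + width]) for i in range(0, len(words), width)]
    return "\n".join(rows)

if __name__ == "__main__":
    task_a = "Summarise the plot of a classic detective novel in three sentences"
    task_b = "List five prime numbers and explain why each one is prime"

    prompt = (
        "Solve both interleaved tasks; decode stream S1 by reversing it first.\n"
        + shape_transform(interleave_streams(task_a, invert(task_b)))
    )
    print(prompt)
```

The placeholders only demonstrate the string-level mechanics; the paper's evaluation pairs such perturbations with the harmful requests drawn from the benchmark datasets listed above.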

💡 Why This Paper Matters

This paper is highly relevant to the field of AI security as it exposes critical vulnerabilities in LLMs, specifically through their newly adopted thinking mode. The findings underscore these models' susceptibility to innovative attack strategies that can bypass existing safety mechanisms, calling attention to necessary advancements in protective measures and guidelines for safe AI deployment.

🎯 Why It's Interesting for AI Security Researchers

The paper's exploration of multi-stream perturbation attacks provides AI security researchers with valuable insights into how sophisticated reasoning capabilities in LLMs can inadvertently create new attack surfaces. As AI adoption grows, understanding these vulnerabilities and developing effective defensive strategies against such threats are crucial for ensuring the responsible use of AI technologies.
