ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack

Authors: Xingwei Lin, Wenhao Lin, Sicong Cao, Jiahao Yu, Renke Huang, Lei Xue, Chunming Wu

Published: 2026-01-28

arXiv ID: 2601.20903v1

Added to Library: 2026-01-30 03:01 UTC

Red Teaming

📄 Abstract

Multi-turn jailbreak attacks have emerged as a critical threat to Large Language Models (LLMs), bypassing safety mechanisms by progressively constructing adversarial contexts from scratch and incrementally refining prompts. However, existing methods suffer from the inefficiency of incremental context construction that requires step-by-step LLM interaction, and often stagnate in suboptimal regions due to surface-level optimization. In this paper, we characterize the Intent-Context Coupling phenomenon, revealing that LLM safety constraints are significantly relaxed when a malicious intent is coupled with a semantically congruent context pattern. Driven by this insight, we propose ICON, an automated multi-turn jailbreak framework that efficiently constructs an authoritative-style context via prior-guided semantic routing. Specifically, ICON first routes the malicious intent to a congruent context pattern (e.g., Scientific Research) and instantiates it into an attack prompt sequence. This sequence progressively builds the authoritative-style context and ultimately elicits prohibited content. In addition, ICON incorporates a Hierarchical Optimization Strategy that combines local prompt refinement with global context switching, preventing the attack from stagnating in ineffective contexts. Experimental results across eight SOTA LLMs demonstrate the effectiveness of ICON, achieving a state-of-the-art average Attack Success Rate (ASR) of 97.1%. Code is available at https://github.com/xwlin-roy/ICON.

🔍 Key Points

  • Proposes ICON, an automated framework for multi-turn jailbreak attacks that leverages Intent-Context Coupling to efficiently construct authoritative-style contexts for eliciting prohibited content from LLMs.
  • Demonstrates that ICON achieves a state-of-the-art average Attack Success Rate (ASR) of 97.1% across eight state-of-the-art LLMs, significantly outperforming existing methods.
  • Incorporates a Hierarchical Optimization Strategy that combines local prompt refinement with global context switching, preventing the attack from stagnating in ineffective contexts during multi-turn interactions.

💡 Why This Paper Matters

The ICON framework addresses key inefficiencies in existing multi-turn jailbreak methods: rather than building adversarial context from scratch through step-by-step interaction, it routes a malicious intent directly to a semantically congruent context pattern. The paper shows how the coupling between intent and context can be exploited to raise attack success rates, underscoring the need for safety mechanisms that account for contextual framing rather than surface-level intent alone.

🎯 Why It's Interesting for AI Security Researchers

This paper is relevant for AI security researchers because it offers both an empirical analysis of a concrete LLM vulnerability and a systematic methodology for bypassing existing safety measures. Understanding the Intent-Context Coupling phenomenon can inform the design of defenses that remain robust when harmful intents are embedded in authoritative or benign-seeming contexts.

📚 Read the Full Paper