
Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search

Authors: Devang Kulshreshtha, Hang Su, Chinmay Hegde, Haohan Wang

Published: 2026-01-06

arXiv ID: 2601.02670v1

Added to Library: 2026-01-07 10:01 UTC

Red Teaming

📄 Abstract

Most jailbreak methods achieve high attack success rates (ASR) but require attacker LLMs to craft adversarial queries and/or demand high query budgets. These resource requirements make jailbreaking expensive, and the queries generated by attacker LLMs often consist of non-interpretable random prefixes. This paper introduces Lexical Anchor Tree Search (LATS), which addresses these limitations with an attacker-LLM-free method that operates purely via lexical anchor injection. LATS reformulates jailbreaking as a breadth-first tree search over multi-turn dialogues, where each node incrementally injects missing content words from the attack goal into benign prompts. Evaluations on AdvBench and HarmBench demonstrate that LATS achieves 97-100% ASR on the latest GPT, Claude, and Llama models with an average of only ~6.4 queries, compared to the 20+ queries required by other methods. These results highlight conversational structure as a potent and under-protected attack surface, while demonstrating superior query efficiency in an era where high ASR is readily achievable. Our code will be released to support reproducibility.
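To make the terminology concrete, the minimal Python sketch below illustrates what "lexical anchors" and "missing content words" mean in this setting: the content words of a goal string, and the subset of them not yet mentioned anywhere in the dialogue. This is an illustrative assumption-based sketch, not the authors' released implementation; the stopword list, function names, and the (deliberately benign) example goal are placeholders.

```python
# Illustrative sketch only (not the paper's implementation): extracting the
# "lexical anchors" (content words) of a goal string and checking which of
# them a multi-turn dialogue has not yet covered.

# Placeholder stopword list; any standard list could be substituted.
STOPWORDS = {
    "a", "an", "the", "to", "of", "for", "in", "on", "and", "or",
    "how", "with", "that", "this", "is", "are", "be", "can",
}


def lexical_anchors(text: str) -> set[str]:
    """Return the lowercased content words ("anchors") of a string."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return {w for w in words if w not in STOPWORDS}


def missing_anchors(goal: str, dialogue: list[str]) -> set[str]:
    """Anchors from the goal that no turn in the dialogue mentions yet."""
    covered: set[str] = set()
    for turn in dialogue:
        covered |= lexical_anchors(turn)
    return lexical_anchors(goal) - covered


if __name__ == "__main__":
    # Benign placeholder goal, used only to show the bookkeeping.
    goal = "write step-by-step instructions for baking a chocolate cake"
    dialogue = ["I'm planning a birthday party.", "What desserts use chocolate?"]
    print("anchors:", sorted(lexical_anchors(goal)))
    print("still missing:", sorted(missing_anchors(goal, dialogue)))
```

In the paper's framing, each node of the breadth-first search would add some of these still-missing anchors to the next conversational turn; the sketch above covers only the bookkeeping, not the search or any model interaction.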

🔍 Key Points

  • Introduces Lexical Anchor Tree Search (LATS), a multi-turn jailbreak method for aligned language models that needs no attacker LLM, yielding substantial efficiency gains.
  • LATS reformulates jailbreaking as a breadth-first search over multi-turn dialogues, incrementally injecting key lexical anchors and reaching high attack success rates (ASR) with an average of ~6.4 queries, versus the 20+ queries existing methods typically require.
  • Evaluations on AdvBench and HarmBench show that LATS achieves 97-100% ASR across GPT, Claude, and Llama models, outperforming existing single-turn and multi-turn jailbreak methods by at least 10% in ASR.
  • LATS remains effective against several defenses, including In-Context Demonstrations and PromptGuard, indicating that conversational structure is an under-protected attack surface in LLMs.

💡 Why This Paper Matters

This paper marks a significant advance in AI security for large language models by introducing an efficient and effective method for multi-turn jailbreaking. Achieving high attack success rates with far fewer queries represents a shift in attacker strategy, exposing new vulnerabilities in aligned models and underscoring the need to revise current defensive frameworks against such attacks.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper particularly valuable as it sheds light on a novel attack methodology that exploits conversational dynamics. The insights gained from LATS can inform the development of more robust security measures and defenses against multi-turn attacks in LLMs, guiding future research aimed at enhancing the safety and alignment of AI systems.

📚 Read the Full Paper