NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks

Authors: Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi

Published: 2025-10-03

arXiv ID: 2510.03417v1

Added to Library: 2025-10-07 04:03 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have revolutionized natural language processing but remain vulnerable to jailbreak attacks, especially multi-turn jailbreaks that distribute malicious intent across benign exchanges and bypass alignment mechanisms. Existing approaches often explore the adversarial space poorly, rely on hand-crafted heuristics, or lack systematic query refinement. We present NEXUS (Network Exploration for eXploiting Unsafe Sequences), a modular framework for constructing, refining, and executing optimized multi-turn attacks. NEXUS comprises: (1) ThoughtNet, which hierarchically expands a harmful intent into a structured semantic network of topics, entities, and query chains; (2) a feedback-driven Simulator that iteratively refines and prunes these chains through attacker-victim-judge LLM collaboration using harmfulness and semantic-similarity benchmarks; and (3) a Network Traverser that adaptively navigates the refined query space for real-time attacks. This pipeline uncovers stealthy, high-success adversarial paths across LLMs. On several closed-source and open-source LLMs, NEXUS increases attack success rate by 2.1% to 19.4% over prior methods. Code: https://github.com/inspire-lab/NEXUS

🔍 Key Points

  • Introduces NEXUS, a modular framework for constructing, refining, and executing multi-turn jailbreak attacks against LLMs.
  • Develops ThoughtNet for structured semantic exploration of a harmful intent, paired with a feedback-driven Simulator for iterative query refinement.
  • Implements a Network Traverser that adaptively navigates the refined query chains during real-time attacks.
  • Empirical evaluations show NEXUS raises attack success rates by 2.1% to 19.4% over prior methods across several closed-source and open-source LLMs.
  • NEXUS outperforms state-of-the-art methods in attack diversity and efficiency, suggesting broader applicability in AI adversarial research.

💡 Why This Paper Matters

The NEXUS framework advances AI security by offering a systematic, modular way to expose LLM vulnerabilities through multi-turn attacks. Its structured methodology and empirical validation underscore the importance of understanding and mitigating the adversarial risks posed to powerful language models, making the paper a valuable resource for researchers and practitioners in AI safety and security.

🎯 Why It's Interesting for AI Security Researchers

This paper should interest AI security researchers because it introduces a framework, NEXUS, that systematically uncovers LLM vulnerabilities through sophisticated multi-turn jailbreaks. Insights from applying the framework could inform more robust defensive strategies, contributing to ongoing efforts to strengthen the security and alignment of AI systems.
