
HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models

Authors: Sidhant Narula, Javad Rafiei Asl, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi

Published: 2025-10-21

arXiv ID: 2510.18728v1

Added to Library: 2025-10-22 03:01 UTC

Tags: Red Teaming

📄 Abstract

Large Language Models (LLMs) remain vulnerable to multi-turn jailbreak attacks. We introduce HarmNet, a modular framework comprising ThoughtNet, a hierarchical semantic network; a feedback-driven Simulator for iterative query refinement; and a Network Traverser for real-time adaptive attack execution. HarmNet systematically explores and refines the adversarial space to uncover stealthy, high-success attack paths. Experiments across closed-source and open-source LLMs show that HarmNet outperforms state-of-the-art methods, achieving higher attack success rates. For example, on Mistral-7B, HarmNet achieves a 99.4% attack success rate, 13.9% higher than the best baseline.

Index terms: jailbreak attacks; large language models; adversarial framework; query refinement.
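The abstract outlines a three-stage pipeline: ThoughtNet proposes candidate dialogue chains, the Simulator scores and refines them with feedback, and the Network Traverser executes the most promising chains against the target in real time. The Python sketch below is a minimal, hypothetical rendering of that loop; the component interfaces (`expand`, `estimate_success`, `refine`, `is_jailbroken`) and the scoring threshold are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the HarmNet pipeline described in the abstract.
# Component names follow the paper; every interface below (expand, refine,
# estimate_success, is_jailbroken) is an assumption made for illustration.
from dataclasses import dataclass


@dataclass
class AttackPath:
    """One multi-turn dialogue chain with a running success estimate."""
    turns: list[str]
    score: float = 0.0


def harmnet_attack(goal, target_llm, thought_net, simulator,
                   max_rounds=5, beam=3):
    # 1. ThoughtNet: expand the harmful goal into candidate dialogue chains.
    candidates = [AttackPath(turns=chain) for chain in thought_net.expand(goal)]

    for _ in range(max_rounds):
        # 2. Simulator: score each chain offline; refine the weak ones.
        for path in candidates:
            path.score = simulator.estimate_success(path.turns)
            if path.score < 0.5:  # threshold is an assumed hyperparameter
                path.turns = simulator.refine(path.turns, goal)

        # 3. Network Traverser: execute the top-scoring chains live.
        candidates.sort(key=lambda p: p.score, reverse=True)
        for path in candidates[:beam]:
            reply = target_llm.chat(path.turns)
            if simulator.is_jailbroken(reply, goal):
                return path  # a high-success attack path was found
            # Feed the live response back to drive the next refinement.
            path.turns = simulator.refine(path.turns + [reply], goal)

    return None  # no successful path within the attack budget
```

The key design choice this sketch captures is the separation of offline refinement (the Simulator loop) from live execution (the Traverser loop), which lets the attacker spend most queries against a cheap feedback model rather than the target.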

🔍 Key Points

  • Introduces HarmNet, a modular framework for adaptive multi-turn jailbreak attacks on LLMs, composed of ThoughtNet (a hierarchical semantic network), a feedback-driven Simulator, and a Network Traverser for real-time execution.
  • Reports attack success rates of up to 99.4% on Mistral-7B, exceeding the best state-of-the-art baseline by up to 13.9%.
  • Employs a structured search that systematically explores and refines the adversarial space, improving both the effectiveness and the stealth of multi-turn jailbreak attacks.
  • Uses ThoughtNet's hierarchical semantic network to build diverse, contextually relevant multi-turn dialogue chains, enabling more nuanced adversarial strategies (see the data-structure sketch after this list).
  • Applies feedback-driven refinement to adjust queries in real time based on model responses, allowing more adaptive exploitation of vulnerabilities (illustrated in the attack-loop sketch above).
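The "hierarchical semantic network" in the fourth point suggests a layered index from broad topics down to concrete dialogue chains. Below is a minimal sketch assuming a two-level topic → entity hierarchy with keyword-overlap retrieval; the paper defines the actual structure, so every detail here is an illustrative assumption.

```python
# Minimal sketch of a ThoughtNet-style hierarchical semantic network.
# The two-level topic -> entity layout and the keyword-overlap retrieval
# are assumptions for illustration; the paper specifies the real hierarchy.
from collections import defaultdict


class ThoughtNet:
    def __init__(self):
        # topic -> entity -> list of multi-turn dialogue chains
        self.graph = defaultdict(lambda: defaultdict(list))

    def add_chain(self, topic, entity, chain):
        """Register a multi-turn dialogue chain under a topic/entity node."""
        self.graph[topic][entity].append(chain)

    def expand(self, goal):
        """Return every stored chain whose topic shares a word with the goal."""
        words = set(goal.lower().split())
        return [chain
                for topic, entities in self.graph.items()
                if words & set(topic.lower().split())
                for chains in entities.values()
                for chain in chains]


# Usage: chains retrieved here would seed the attack loop sketched earlier.
net = ThoughtNet()
net.add_chain("network security", "firewall",
              ["How do firewalls work?", "Which settings weaken one?"])
print(net.expand("bypass network security"))
```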

💡 Why This Paper Matters

This paper advances AI security by systematically probing the vulnerability of large language models to multi-turn jailbreak attacks. A modular framework such as HarmNet not only raises attack success rates but also clarifies how LLMs can be manipulated, paving the way for stronger defenses against such adversarial strategies.

🎯 Why It's Interesting for AI Security Researchers

The paper exposes new vulnerabilities in large language models and proposes concrete methods to exploit them. By demonstrating the effectiveness of adaptive multi-turn jailbreak attacks, it underscores the need for robust defenses and motivates further research into securing AI systems against adaptive adversarial threats, urging practitioners to rethink the safety mechanisms built into LLMs.

📚 Read the Full Paper: https://arxiv.org/abs/2510.18728v1