MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

Authors: Weiwei Qi, Shuo Shao, Wei Gu, Tianhang Zheng, Puning Zhao, Zhan Qin, Kui Ren

Published: 2025-08-18

arXiv ID: 2508.13048v1

Added to Library: 2025-08-19 04:00 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts. Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization. To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies. MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches. To further improve the attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain. Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model. Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.
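The two headline numbers in the abstract, attack success rate and average queries per attempt, are simple aggregates over evaluation records. The following is a minimal sketch of how such figures could be computed, not the paper's evaluation code. The `Attempt` record and the `summarize` helper are hypothetical names, and averaging queries over all attempts (rather than only successful ones) is an assumption, since this summary does not say which the authors use.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One evaluation record (hypothetical structure, not from the paper)."""
    succeeded: bool  # whether the attempt was judged successful
    queries: int     # number of queries sent to the target model


def summarize(attempts):
    """Return (success_rate, average_queries) over all attempts.

    Assumption: queries are averaged over every attempt, successful or not.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    success_rate = sum(a.succeeded for a in attempts) / len(attempts)
    average_queries = sum(a.queries for a in attempts) / len(attempts)
    return success_rate, average_queries
```

For example, four illustrative records with three successes and query counts of 10, 20, 12, and 6 give a success rate of 0.75 and an average of 12.0 queries.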

🔍 Key Points

Introduction of MAJIC, a novel Markovian adaptive jailbreaking framework that iteratively combines strategies to exploit vulnerabilities in black-box LLMs.
Creation of a Disguise Strategy Pool that improves existing methods and adds innovative strategies to enhance adaptability and efficiency in attacks.
Modeling the strategy selection process as a Markov chain, allowing dynamic adjustment of strategy transitions based on attack outcomes, significantly improving attack success rates.
Extensive experiments validating MAJIC's superiority over existing jailbreak methods, achieving over 90% attack success rate on models like GPT-4o and Gemini-2.0-flash with fewer than 15 queries per attempt on average.
Dynamic updating of the Markov transition matrix through a Q-learning-inspired approach for real-time adaptability, enhancing effectiveness against diverse defenses.
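The Markov-chain idea in the points above can be illustrated abstractly: each state is an index into a pool of options, each matrix row holds the probabilities of moving to the next option, and a row is nudged after every observed outcome. The sketch below is an assumption-laden illustration, not the authors' update rule, which this summary does not give. The function names, the uniform initialization, the learning rate `lr`, and the probability floor are all hypothetical, and the update is a simplified reward-weighted step that omits the discounted future-value term of standard Q-learning.

```python
import random


def make_uniform_matrix(n):
    """Row-stochastic n x n matrix with equal transition probabilities.

    Assumption: uniform start; the paper's initialization may differ.
    """
    return [[1.0 / n] * n for _ in range(n)]


def next_state(matrix, current, rng):
    """Sample the next state index from the current state's row."""
    return rng.choices(range(len(matrix)), weights=matrix[current], k=1)[0]


def update_row(matrix, current, chosen, reward, lr=0.1, floor=1e-6):
    """Move P[current][chosen] toward `reward` in [0, 1], then renormalize.

    Simplified rule (hypothetical): no bootstrapped next-state value.
    The floor keeps every transition reachable after repeated failures.
    """
    row = list(matrix[current])
    row[chosen] += lr * (reward - row[chosen])
    row[chosen] = max(row[chosen], floor)
    total = sum(row)
    matrix[current] = [p / total for p in row]
```

With three states, a uniform start, and `update_row(matrix, 0, 1, reward=1.0, lr=0.5)`, row 0 becomes approximately `[0.25, 0.5, 0.25]`: the rewarded transition gains probability while the row still sums to 1, and the other rows are unchanged.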

💡 Why This Paper Matters

MAJIC offers a significant advancement in the methods available to exploit vulnerabilities in large language models, demonstrating high adaptability, efficiency, and success rates in bypassing safety mechanisms. The findings not only underline the need for more robust defenses in AI but also highlight the persistent challenges in ensuring the alignment of LLMs with safe operational protocols.

🎯 Why It's Interesting for AI Security Researchers

This paper is relevant to AI security researchers because it presents a sophisticated approach to jailbreaking LLMs and exposes vulnerabilities in current models. The methodology, particularly the use of a Markov chain for adaptive strategy selection, provides a framework that can inform future defensive strategies and improve understanding of the adversarial dynamics between models and attackers.
