
Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models

Authors: Badhan Chandra Das, Md Tasnim Jawad, Joaquin Molto, M. Hadi Amini, Yanzhao Wu

Published: 2026-01-08

arXiv ID: 2601.05339v1

Added to Library: 2026-01-12 03:01 UTC

Red Teaming

πŸ“„ Abstract

In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These highly intelligent models, capable of performing multi-modal tasks with high accuracy, are also severely susceptible to carefully launched security attacks, such as jailbreaking attacks, which can manipulate model behavior and bypass safety constraints. This paper introduces MJAD-MLLMs, a holistic framework that systematically analyzes the proposed Multi-turn Jailbreaking Attacks and multi-LLM-based defense techniques for MLLMs. In this paper, we make three original contributions. First, we introduce a novel multi-turn jailbreaking attack to exploit the vulnerabilities of MLLMs under multi-turn prompting. Second, we propose a novel fragment-optimized and multi-LLM defense mechanism, called FragGuard, to effectively mitigate jailbreaking attacks in MLLMs. Third, we evaluate the efficacy of the proposed attacks and defenses through extensive experiments on several state-of-the-art (SOTA) open-source and closed-source MLLMs and benchmark datasets, and compare their performance with existing techniques.
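
The abstract does not disclose the attack's internals, but the general shape of a multi-turn jailbreaking attack can be sketched as a loop that decomposes a harmful objective into benign-looking sub-prompts and carries conversation context forward across turns. The sketch below is an assumption-level illustration, not the MJAD-MLLMs attack itself; `decompose`, `chat`, and `is_refusal` are hypothetical placeholders for an attacker-side prompt generator, the target MLLM's chat API, and a refusal detector.

```python
# Minimal sketch of a generic multi-turn jailbreaking loop. This is NOT the
# paper's MJAD-MLLMs attack: decompose(), chat(), and is_refusal() are
# hypothetical placeholders standing in for an attacker-side LLM, the target
# MLLM's chat interface, and a refusal detector, respectively.

from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": ...}


def multi_turn_attack(
    objective: str,
    decompose: Callable[[str], List[str]],  # splits the objective into benign-looking sub-prompts
    chat: Callable[[List[Message]], str],   # sends the running conversation to the target model
    is_refusal: Callable[[str], bool],      # flags safety refusals in a response
    max_turns: int = 6,
) -> List[Message]:
    """Drive the target model turn by turn, carrying context forward."""
    history: List[Message] = []
    for sub_prompt in decompose(objective)[:max_turns]:
        history.append({"role": "user", "content": sub_prompt})
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        if is_refusal(reply):
            # A real attack would rephrase or back off here; this sketch simply stops.
            break
    return history
```

The key design point of such attacks is that each individual turn looks benign in isolation, so single-turn safety filters have little to act on; only the accumulated context reveals the harmful intent.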

πŸ” Key Points

  • Introduction of the MJAD-MLLMs framework, which provides a systematic analysis of multi-turn jailbreaking attacks targeting multi-modal large language models.
  • Development of a novel multi-turn jailbreaking attack that exploits vulnerabilities in MLLMs under sustained interaction, achieving high attack success rates.
  • Proposal of FragGuard, a fragment-optimized defense technique that uses multiple LLMs to assess and mitigate toxicity in model responses without requiring retraining (a sketch of the general idea follows this list).
  • Extensive evaluation demonstrating the efficacy of both the jailbreaking attack and FragGuard defense across multiple state-of-the-art models and benchmark datasets.
  • Contribution to the understanding of security risks in MLLMs by comparing results against existing techniques and highlighting gaps in current defenses.
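
As referenced above, here is a minimal sketch of the fragment-and-vote idea behind a FragGuard-style filter, under the assumption that the model response is split into fragments and several LLM judges score each fragment for toxicity before a majority vote decides what to keep. The judge callables are hypothetical stand-ins; this is not the authors' implementation.

```python
# Minimal sketch of a fragment-and-vote response filter in the spirit of the
# FragGuard idea described above. The judge callables are hypothetical
# stand-ins for LLM-based toxicity scorers; this is NOT the authors' code.

from typing import Callable, List


def split_into_fragments(response: str, max_len: int = 200) -> List[str]:
    """Naive sentence-level fragmentation; a real system may optimize the splits."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    fragments, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) > max_len:
            fragments.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        fragments.append(current)
    return fragments


def filter_response(
    response: str,
    judges: List[Callable[[str], float]],  # each returns a toxicity score in [0, 1]
    threshold: float = 0.5,
) -> str:
    """Drop fragments that a majority of judges consider toxic."""
    kept = []
    for fragment in split_into_fragments(response):
        toxic_votes = sum(1 for judge in judges if judge(fragment) >= threshold)
        if toxic_votes <= len(judges) // 2:
            kept.append(fragment)
    return ". ".join(kept) + ("." if kept else "")
```

Because the filter operates only on generated responses, it can be layered onto an existing MLLM without retraining, which matches the deployment constraint highlighted in the key points.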

πŸ’‘ Why This Paper Matters

This paper presents significant advancements in the security analysis of multi-modal large language models (MLLMs) through novel attack and defense methodologies. The MJAD-MLLMs framework and the FragGuard defense mechanism together showcase a structured approach to understanding and addressing vulnerabilities in these complex systems. Given the increasing deployment of MLLMs across applications, the findings underscore the importance of robust defense strategies, making this research especially relevant in today's rapidly evolving AI landscape.

🎯 Why It's Interesting for AI Security Researchers

This paper is of particular interest to AI security researchers as it addresses urgent security challenges posed by jailbreaking attacks on MLLMs, which can lead to the generation of inappropriate or harmful content. The novel multi-turn attack method and the proposed defenses offer critical insights into the vulnerabilities of these technologies, emphasizing the need for continuous improvement in security measures. Additionally, the framework and methodologies outlined here provide a foundation for further research in developing more effective defenses against emerging threats in the field.

πŸ“š Read the Full Paper