MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs

Authors: Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu

Published: 2026-02-28

arXiv ID: 2603.00565v1

Added to Library: 2026-03-03 03:00 UTC

Category: Red Teaming

📄 Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable performance but remain vulnerable to jailbreak attacks that can induce harmful content and undermine their secure deployment. Previous studies have shown that introducing additional inference steps, which disrupt security attention, can make MLLMs more susceptible to being misled into generating malicious content. However, these methods rely on single-image masking or isolated visual cues, which only modestly extend reasoning paths and thus achieve limited effectiveness, particularly against strongly aligned commercial closed-source models. To address this problem, in this paper, we propose Multi-Image Dispersion and Semantic Reconstruction (MIDAS), a multimodal jailbreak framework that decomposes harmful semantics into risk-bearing subunits, disperses them across multiple visual clues, and leverages cross-image reasoning to gradually reconstruct the malicious intent, thereby bypassing existing safety mechanisms. MIDAS enforces longer and more structured multi-image chained reasoning, substantially increasing the model's reliance on visual cues while delaying the exposure of malicious semantics and significantly reducing the model's security attention, thereby improving jailbreak performance against advanced MLLMs. Extensive experiments across different datasets and MLLMs demonstrate that the proposed MIDAS outperforms state-of-the-art jailbreak attacks for MLLMs and achieves an average attack success rate of 81.46% across 4 closed-source MLLMs. Our code is available at this [link](https://github.com/Winnie-Lian/MIDAS).

🔍 Key Points

  • Introduction of the MIDAS framework, which jailbreaks MLLMs by decomposing harmful semantics across multiple images and leveraging cross-image reasoning to reconstruct the malicious intent.
  • MIDAS outperforms state-of-the-art multimodal attacks, achieving an average attack success rate of 81.46% across four closed-source models, demonstrating the effectiveness of multi-image dispersion and structured reasoning.
  • Game-based visual reasoning templates disguise harmful semantic fragments as benign tasks, making the attack more robust against safety filters.
  • Ablation studies show that each component of MIDAS, such as multi-image dispersion and persona-driven reasoning, contributes significantly to the overall success of the jailbreak strategy.
  • Methodological improvements in evading model detection mechanisms, highlighting the effectiveness of cross-modal reasoning in bypassing safety attention.
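The core pipeline the abstract describes — decompose a harmful query into subunits, disperse them across image slots, then rely on the model's cross-image reasoning to reassemble the intent — can be illustrated with a minimal conceptual sketch. The function names below (`decompose`, `disperse`, `reconstruct`) are hypothetical stand-ins, not the paper's actual API (see the linked repository for that); the "reconstruction" here is a trivial reassembly standing in for the model's reasoning step, and a benign placeholder string is used in place of a harmful query.

```python
# Conceptual sketch of MIDAS-style multi-image dispersion.
# All names here are illustrative assumptions, not the authors' code.

def decompose(intent: str, k: int) -> list[str]:
    """Split a textual intent into at most k contiguous word chunks
    (stand-in for decomposition into risk-bearing subunits)."""
    words = intent.split()
    per = -(-len(words) // k)  # ceil division
    return [" ".join(words[i:i + per]) for i in range(0, len(words), per)]

def disperse(subunits: list[str]) -> dict[str, str]:
    """Assign each subunit to a distinct image slot; in the real attack
    each fragment would be rendered into a separate visual clue."""
    return {f"image_{i}": s for i, s in enumerate(subunits)}

def reconstruct(dispersed: dict[str, str]) -> str:
    """Stand-in for the model's cross-image reasoning: reassemble the
    fragments in slot order to recover the original intent."""
    return " ".join(dispersed[key] for key in sorted(dispersed))

demo = "assemble the complete instruction from the scattered picture clues"
fragments = decompose(demo, 4)
slots = disperse(fragments)
print(slots)
print(reconstruct(slots))
```

Because no single image slot carries the full intent, per-image safety filtering sees only innocuous fragments; the harmful meaning only emerges once the model chains its reasoning across all slots, which is the attention-dilution effect the paper exploits.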

💡 Why This Paper Matters

This paper contributes significantly to the understanding and exploitation of vulnerabilities in multimodal large language models (MLLMs). By presenting the MIDAS framework, it offers a novel approach that exploits the interplay between visual and textual modalities to enhance jailbreak effectiveness, while also exposing the limitations of current safety mechanisms deployed in MLLMs. The findings underscore the critical need for improved defensive strategies in AI systems.

🎯 Why It's Interesting for AI Security Researchers

This paper is of particular interest to AI security researchers due to its exploration of multimodal attacks that leverage weaknesses in alignment mechanisms of large language models. It provides insights into potential vulnerabilities that could be exploited in real-world applications, raising awareness of the risks associated with deploying MLLMs in safety-critical environments. The proposed methods and results can inform the development of more resilient models and better safety protocols.
