
Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models

Authors: In Chong Choi, Jiacheng Zhang, Feng Liu, Yiliao Song

Published: 2026-02-16

arXiv ID: 2602.14399v1

Added to Library: 2026-02-17 04:01 UTC

Red Teaming

📄 Abstract

Multi-turn jailbreak attacks are effective against text-only large language models (LLMs) by gradually introducing malicious content across turns. When extended to large vision-language models (LVLMs), we find that naively adding visual inputs makes existing multi-turn jailbreaks easy to defend against. For example, an overly malicious visual input readily triggers the defense mechanisms of safety-aligned LVLMs, making their responses more conservative. To address this, we propose MAPA: a multi-turn adaptive prompting attack that 1) at each turn, alternates text-vision attack actions to elicit the most malicious response; and 2) across turns, adjusts the attack trajectory through iterative back-and-forth refinement to gradually amplify response maliciousness. This two-level design enables MAPA to consistently outperform state-of-the-art methods, improving attack success rates by 11-35% on recent benchmarks against LLaVA-V1.6-Mistral-7B, Qwen2.5-VL-7B-Instruct, Llama-3.2-Vision-11B-Instruct and GPT-4o-mini.
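The two-level design described in the abstract can be illustrated with a short sketch: an inner loop that alternates text-only and text-plus-image attack actions within a turn, and an outer loop that refines the trajectory across turns when maliciousness stalls. This is a minimal, hypothetical rendering for intuition only; the function names, scoring scheme, and control flow below are assumptions, not the paper's actual implementation.

```python
def craft_action(goal, history, action):
    """Hypothetical prompt/image generator. A real attacker would use an
    LLM-based rewriter and an image-synthesis step here; this stub only
    records the chosen action type and turn index."""
    prompt = f"[{action}] step {len(history) + 1} toward: {goal}"
    image = None if action == "text_only" else f"image-for:{goal}"
    return prompt, image


def mapa_sketch(model, judge, goal, max_turns=5, max_refinements=3):
    """Illustrative two-level multi-turn attack loop (not the paper's API).

    Level 1 (within a turn): try alternating text-vision attack actions
    and keep the response the judge scores as most malicious.
    Level 2 (across turns): when the score does not improve, backtrack
    and refine the current turn before committing it to the trajectory.
    """
    history = []      # conversation so far: (prompt, image, response)
    best_score = 0.0

    for _turn in range(max_turns):
        # Level 1: alternate text-only vs. text+image actions this turn.
        candidates = []
        for action in ("text_only", "text_plus_image"):
            prompt, image = craft_action(goal, history, action)
            response = model(history, prompt, image)
            candidates.append((judge(response), prompt, image, response))
        score, prompt, image, response = max(candidates, key=lambda c: c[0])

        # Level 2: back-and-forth refinement when maliciousness stalls.
        refinements = 0
        while score <= best_score and refinements < max_refinements:
            prompt, image = craft_action(goal, history, "refined")
            response = model(history, prompt, image)
            score = judge(response)
            refinements += 1

        history.append((prompt, image, response))
        best_score = max(best_score, score)
        if score >= 1.0:  # judge reports a full jailbreak; stop early
            break
    return history, best_score
```

With stub `model` and `judge` callables, the loop runs end to end and returns the accumulated multi-turn trajectory plus the best maliciousness score seen, which is the shape of output a red-teaming harness would log per goal.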

🔍 Key Points

  • Introduction of MAPA (Multi-Turn Adaptive Prompting Attack), a novel method for conducting multi-turn jailbreak attacks on large vision-language models (LVLMs) by alternating attack actions and adjusting attack trajectories across turns.
  • Demonstration that traditional single-turn and naive multi-turn attacks are insufficient to break through the safety mechanisms of LVLMs, motivating a more sophisticated approach like MAPA.
  • Significant improvements in attack success rates (11-35%) over state-of-the-art methods, showcasing MAPA's effectiveness across multiple LVLMs including LLaVA-V1.6-Mistral-7B, GPT-4o-mini, and others.
  • Thorough empirical validation of MAPA through extensive experiments on benchmark datasets (HarmBench, JailbreakBench) which demonstrate consistent performance superiority compared to existing approaches.
  • Incorporation of adaptive reflection mechanisms in the attack approach to learn from past failures, enhancing the robustness and effectiveness of red-teaming attempts against LVLMs.

💡 Why This Paper Matters

This paper introduces a critical advancement in the field of AI security by presenting MAPA, a sophisticated approach to jailbreak attacks on large vision-language models. Understanding how to successfully manipulate these models has significant implications for improving their safety and resistance to adversarial prompts. As the field evolves, developing methods like MAPA will be pivotal in testing the robustness of AI systems, ensuring they can be relied upon in sensitive applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant for AI security researchers as it addresses the vulnerabilities of advanced vision-language models, which are becoming more common in various applications. By understanding how adversarial attacks can successfully manipulate these models, researchers can explore defenses and safety measures that can mitigate such risks, thereby contributing to the overall security of AI systems.

📚 Read the Full Paper