
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models

Authors: Jakub Hoscilowicz, Artur Janicki

Published: 2025-11-25

arXiv ID: 2511.20494v2

Added to Library: 2025-12-01 04:00 UTC

Red Teaming

📄 Abstract

We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Applications include embedding adversarial images into websites to prevent MLLM-powered agents from operating reliably. The proposed attack maximizes next-token entropy using a small ensemble of open-source MLLMs. In the white-box setting, we show that a single adversarial image can disrupt all models in the ensemble, both in the full-image and adversarial CAPTCHA settings. Despite relying on a basic adversarial technique (PGD), the attack generates perturbations that transfer to both unseen open-source (e.g., Qwen3-VL) and proprietary (e.g., GPT-5.1) models.
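The abstract describes the objective as finding one bounded perturbation that maximizes the average next-token entropy of a small ensemble. A plausible formalization is given below; the notation is illustrative (x the clean image, δ the perturbation, ε the L∞ budget, t the text prompt, M the ensemble, V the vocabulary), and the paper's exact constraint and weighting may differ.

```latex
\max_{\|\delta\|_\infty \le \epsilon} \;
\frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}}
H\!\left(p_m(\cdot \mid x + \delta,\, t)\right),
\qquad
H(p) = -\sum_{v \in \mathcal{V}} p(v)\,\log p(v)
```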

🔍 Key Points

  • Introduction of the Adversarial Confusion Attack, targeting the systematic disruption of multimodal large language models (MLLMs) by maximizing next-token entropy (a PGD-style sketch of this objective follows the list).
  • Demonstration that a single adversarial image can induce the confusion effect across all models in the white-box ensemble simultaneously, highlighting how vulnerable MLLMs are to this type of attack.
  • Characterization of five distinct confusion modes experienced by models under attack, outlining the spectrum from blindness to complete semantic collapse.
  • Evaluation of transferability of the adversarial attack to both unseen open-source and proprietary models, showcasing the broad applicability of the method.
  • Discussion of practical implications, including the potential use of adversarial images embedded in websites to impede the functionality of MLLM-powered AI agents.
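As a concrete illustration of the mechanism, below is a minimal PGD sketch that performs gradient ascent on mean next-token entropy across an ensemble. The helper names, hyperparameters, and the abstraction of each MLLM as a callable returning next-token logits are assumptions made for this sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def next_token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution, averaged over the batch."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()


def confusion_attack(image, models, eps=8 / 255, alpha=1 / 255, steps=100):
    """PGD that *maximizes* mean next-token entropy across an ensemble.

    image  : clean image tensor in [0, 1], shape (1, 3, H, W)
    models : list of callables mapping an image tensor to next-token logits
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv = (image + delta).clamp(0.0, 1.0)
        # Average entropy over the ensemble so one perturbation disrupts all models.
        loss = torch.stack([next_token_entropy(m(adv)) for m in models]).mean()
        loss.backward()
        with torch.no_grad():
            # Gradient *ascent* on entropy, then project back into the L-infinity ball.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (image + delta).detach().clamp(0.0, 1.0)
```

In practice each callable would wrap a frozen MLLM's forward pass on a fixed prompt (image preprocessing and tokenization follow each model's own API), and the budget ε, step size α, and iteration count are placeholders to be tuned per the paper's setup.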

💡 Why This Paper Matters

This paper presents a novel class of threat to multimodal large language models, showing that even basic adversarial techniques can be used to induce confusion in model outputs. The Adversarial Confusion Attack is significant not only for understanding the vulnerabilities of current models but also for designing defenses against this kind of misuse, making it a crucial read for both AI developers and security researchers.

🎯 Why It's Interesting for AI Security Researchers

The findings are particularly relevant for AI security researchers because they expose critical vulnerabilities in multimodal large language models. Understanding these weaknesses can drive the development of stronger security protocols and preventive measures against adversarial attacks, which grow more pressing as MLLM-powered systems are integrated into ever more applications.
