
Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization

Authors: Jiwei Guan, Haibo Jin, Haohan Wang

Published: 2026-01-05

arXiv ID: 2601.01747v2

Added to Library: 2026-01-09 03:02 UTC

Red Teaming

📄 Abstract

Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these models remain vulnerable to adversarial jailbreak attacks, where adversaries craft subtle perturbations to bypass safety mechanisms and trigger harmful outputs. Existing white-box attack methods require full access to the model, incur high computational costs, and exhibit insufficient adversarial transferability, making them impractical for real-world, black-box settings. To address these limitations, we propose a black-box jailbreak attack on LVLMs via Zeroth-Order optimization using Simultaneous Perturbation Stochastic Approximation (ZO-SPSA). ZO-SPSA provides three key advantages: (i) gradient-free approximation through input-output interactions without requiring model knowledge, (ii) model-agnostic optimization without a surrogate model, and (iii) lower resource requirements with reduced GPU memory consumption. We evaluate ZO-SPSA on three LVLMs, including InstructBLIP, LLaVA and MiniGPT-4, achieving the highest jailbreak success rate of 83.0% on InstructBLIP, while maintaining imperceptible perturbations comparable to white-box methods. Moreover, adversarial examples generated from MiniGPT-4 exhibit strong transferability to other LVLMs, with an attack success rate (ASR) reaching 64.18%. These findings underscore the real-world feasibility of black-box jailbreaks and expose critical weaknesses in the safety mechanisms of current LVLMs.
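
At the heart of the method is the standard two-point SPSA estimator: the gradient of a black-box loss is approximated from paired queries at symmetrically perturbed inputs, so only the target model's input-output behavior is needed and the query cost per estimate is independent of the input dimension. The sketch below illustrates this estimator under stated assumptions; the names `loss_fn`, `c`, and `num_samples` are illustrative and not taken from the paper.

```python
import numpy as np

def spsa_gradient(loss_fn, x, c=0.01, num_samples=8, rng=None):
    """Two-point SPSA estimate of the gradient of a black-box scalar loss.

    loss_fn     : queries the target model and returns a scalar loss
    x           : flattened input (e.g., image pixels) as a float array
    c           : finite-difference perturbation magnitude
    num_samples : number of random directions averaged per estimate
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        # Rademacher (+/-1) direction perturbing every coordinate at once.
        delta = rng.choice([-1.0, 1.0], size=x.shape)
        # Two black-box queries per direction, regardless of dimensionality.
        diff = loss_fn(x + c * delta) - loss_fn(x - c * delta)
        # With +/-1 entries, dividing by delta equals multiplying by delta.
        grad += (diff / (2.0 * c)) * delta
    return grad / num_samples
```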

🔍 Key Points

  • Introduction of ZO-SPSA: The paper proposes a novel black-box adversarial jailbreak attack method named Zeroth-Order Simultaneous Perturbation Stochastic Approximation (ZO-SPSA), which crafts adversarial inputs without requiring gradient information from the target model (a minimal attack-loop sketch follows this list).
  • Model-Agnostic Optimization: ZO-SPSA is designed to be model-agnostic, allowing it to effectively target various large vision-language models (LVLMs) including InstructBLIP, LLaVA, and MiniGPT-4, without needing knowledge of their architectures or internal parameters.
  • High Attack Success Rate: The method achieves a jailbreak success rate of 83.0% on InstructBLIP, bypassing safety mechanisms while keeping perturbations as imperceptible as those produced by white-box methods.
  • Strong Transferability: Adversarial examples generated on MiniGPT-4 transfer to other LVLMs with an attack success rate (ASR) of 64.18%, highlighting the method's potential for real-world application across different models.
  • Reduced Resource Requirements: ZO-SPSA significantly lowers GPU memory consumption and computational efforts compared to white-box methods, making it a practical choice for researchers and attackers in black-box settings.
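
As a rough illustration of how such an estimator could drive the gradient-free attack described above, the sketch below reuses `spsa_gradient` from the earlier snippet inside a query-only perturbation loop with an L-infinity bound to keep the perturbation imperceptible. The helpers `query_lvlm` and `target_loss`, and all hyperparameter values, are hypothetical placeholders rather than the paper's actual pipeline.

```python
import numpy as np

def black_box_jailbreak(image, prompt, query_lvlm, target_loss,
                        eps=8 / 255, step=1 / 255, iters=200, c=0.01, rng=None):
    """Sketch of an L_inf-bounded, query-only adversarial image search.

    image       : clean input image as a float array scaled to [0, 1]
    prompt      : harmful text prompt paired with the image
    query_lvlm  : hypothetical function sending (image, prompt) to the model
    target_loss : hypothetical scorer; lower means closer to the jailbreak goal
    """
    rng = np.random.default_rng() if rng is None else rng
    x_adv = image.astype(np.float64).copy()

    def loss_fn(x):
        # One model query per loss evaluation; no weights or gradients needed.
        return target_loss(query_lvlm(np.clip(x, 0.0, 1.0), prompt))

    for _ in range(iters):
        grad = spsa_gradient(loss_fn, x_adv, c=c, rng=rng)  # estimator above
        # Signed descent step, then project back into the eps-ball around the
        # clean image so the perturbation stays visually imperceptible.
        x_adv = x_adv - step * np.sign(grad)
        x_adv = np.clip(x_adv, image - eps, image + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

The projection step mirrors standard L-infinity-bounded attacks; in a genuine black-box setting, the per-iteration cost is dominated by the 2 × num_samples model queries issued inside the gradient estimate.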

💡 Why This Paper Matters

This paper introduces a significant advancement in black-box adversarial attacks on large vision-language models through the ZO-SPSA technique. By effectively bypassing safety mechanisms without needing model access, the method demonstrates a clear vulnerability in current LVLMs, making it a crucial contribution to AI security research. The high attack success rates and strong transferability of adversarial examples signal an urgent need for improved safety measures and defense mechanisms to protect against such attacks.

🎯 Why It's Interesting for AI Security Researchers

The proposed ZO-SPSA attack method presents a breakthrough in understanding the vulnerabilities of large vision-language models, making it highly pertinent for AI security researchers. As these models become increasingly integrated into applications with real-world consequences, the findings underscore the necessity for robust defenses against adversarial attacks. Additionally, the model-agnostic nature and practical efficiency of ZO-SPSA provide critical insights for those developing security protocols and AI safety mechanisms.
