Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Authors: Wei Zhao, Zhe Li, Yige Li, Jun Sun

Published: 2025-11-20

arXiv ID: 2511.16229v1

Added to Library: 2025-11-21 03:03 UTC

Red Teaming

📄 Abstract

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in cross-modal understanding, but remain vulnerable to adversarial attacks through visual inputs despite robust textual safety mechanisms. These vulnerabilities arise from two core weaknesses: the continuous nature of visual representations, which allows for gradient-based attacks, and the inadequate transfer of text-based safety mechanisms to visual content. We introduce Q-MLLM, a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. By discretizing visual representations at both pixel-patch and semantic levels, Q-MLLM blocks attack pathways and bridges the cross-modal safety alignment gap. Our two-stage training methodology ensures robust learning while maintaining model utility. Experiments demonstrate that Q-MLLM achieves a significantly better defense success rate against both jailbreak attacks and toxic image attacks than existing approaches. Notably, Q-MLLM achieves a perfect defense success rate (100%) against jailbreak attacks except in one arguable case, while maintaining competitive performance on multiple utility benchmarks with minimal inference overhead. This work establishes vector quantization as an effective defense mechanism for secure multimodal AI systems without requiring expensive safety-specific fine-tuning or detection overhead. Code is available at https://github.com/Amadeuszhao/QMLLM.
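
The core mechanism described in the abstract, mapping continuous visual embeddings to discrete codebook entries at two levels, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the class names (`VectorQuantizer`, `TwoLevelVQ`), the codebook sizes, and the embedding dimension are hypothetical choices for readability, and training details such as straight-through gradients, codebook losses, and the two-stage schedule are omitted.

```python
# Minimal sketch of a two-level vector-quantization bottleneck for visual tokens.
# Codebook sizes, dimensions, and class names are illustrative assumptions,
# not the paper's actual hyperparameters or architecture.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Maps continuous embeddings to their nearest codebook entries."""

    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        flat = x.reshape(-1, x.shape[-1])                    # (B*P, D)
        dists = torch.cdist(flat, self.codebook.weight)      # distance to every code
        indices = dists.argmin(dim=-1)                       # discrete code indices
        quantized = self.codebook(indices).reshape(x.shape)  # snap to nearest codes
        return quantized


class TwoLevelVQ(nn.Module):
    """Pixel-patch quantization followed by semantic-level quantization."""

    def __init__(self, dim: int = 768, patch_codes: int = 8192, semantic_codes: int = 1024):
        super().__init__()
        self.patch_vq = VectorQuantizer(patch_codes, dim)
        self.semantic_proj = nn.Linear(dim, dim)
        self.semantic_vq = VectorQuantizer(semantic_codes, dim)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # Level 1: discretize raw patch embeddings from the vision encoder.
        q_patch = self.patch_vq(patch_embeddings)
        # Level 2: discretize a projected, more semantic representation.
        return self.semantic_vq(self.semantic_proj(q_patch))


if __name__ == "__main__":
    vq = TwoLevelVQ()
    patches = torch.randn(1, 196, 768)   # e.g. 14x14 ViT patch embeddings
    tokens = vq(patches)                 # discretized visual tokens for the LLM
    print(tokens.shape)                  # torch.Size([1, 196, 768])
```

The point of the bottleneck is that whatever reaches the language model has passed through a finite set of codes, so there is no continuous embedding space left for an attacker to optimize over.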

🔍 Key Points

  • Introduction of Q-MLLM, a novel architecture utilizing two-level vector quantization to enhance the security of Multimodal Large Language Models (MLLMs) against adversarial attacks.
  • Demonstration that discretizing visual representations at both the pixel-patch and semantic levels significantly mitigates the vulnerabilities associated with continuous visual embeddings (see the sketch after this list).
  • Achieved a perfect defense success rate (100%, with one arguable exception) against jailbreak attacks and a high rate (75.9%) against toxic image attacks, showcasing the robustness of the proposed architecture.
  • Implementation of an innovative two-stage training methodology that ensures robust learning without compromising utility, making this approach practical for real-world applications.
  • Comprehensive experimental evaluations showing that Q-MLLM outperforms existing defenses while maintaining competitive performance on utility benchmarks.
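
To make the discrete-bottleneck claim concrete, the toy snippet below (an illustration, not taken from the paper; the codebook is random and the perturbation scale is arbitrary) shows why snapping embeddings to their nearest codebook entry tends to absorb small continuous perturbations: the perturbed embedding usually maps to the same discrete code, so the language model sees an unchanged visual token.

```python
# Toy demonstration (not from the paper) of the discrete-bottleneck effect:
# a small perturbation of a continuous embedding usually snaps to the same
# nearest codebook entry, so the downstream LLM receives an identical token.
import torch

torch.manual_seed(0)

codebook = torch.randn(1024, 768)            # assumed codebook: 1024 codes, 768-dim
clean = torch.randn(768)                     # a clean visual patch embedding
perturbed = clean + 0.01 * torch.randn(768)  # small adversarial-style perturbation

clean_code = torch.cdist(clean.unsqueeze(0), codebook).argmin().item()
perturbed_code = torch.cdist(perturbed.unsqueeze(0), codebook).argmin().item()

# With high probability the two indices match: the perturbation is absorbed
# by the quantization step instead of propagating into the language model.
print(clean_code, perturbed_code, clean_code == perturbed_code)
```

This captures only the intuition; the defense rates reported in the paper come from the full two-level design and its two-stage training procedure.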

💡 Why This Paper Matters

Q-MLLM addresses a critical vulnerability in multimodal large language models: safety alignment that holds for text can be bypassed through the continuous visual input channel. By showing that vector quantization can serve as an effective, low-overhead defense against such adversarial attacks, the paper offers a practical path toward safer deployment of systems that fuse visual and textual data.

🎯 Why It's Interesting for AI Security Researchers

This paper should interest AI security researchers because it tackles adversarial attacks delivered through the visual inputs of multimodal models, a channel that text-only safety mechanisms do not cover. Given the increasing deployment of such models in sensitive applications, an architectural defense that requires neither safety-specific fine-tuning nor a separate detection step is directly relevant to hardening deployed AI systems.