
CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution

Authors: Baoliang Tian, Yuxuan Si, Jilong Wang, Lingyao Li, Zhongyuan Bao, Zineng Zhou, Tao Wang, Sixu Li, Ziyao Xu, Mingze Wang, Zhouzhuo Zhang, Zhihao Wang, Yike Yun, Ke Tian, Ning Yang, Minghui Qiu

Published: 2025-11-19

arXiv ID: 2511.21717v1

Added to Library: 2025-12-01 03:02 UTC

📄 Abstract

Multimodal Large Language Models are primarily trained and evaluated on aligned image-text pairs, which leaves their ability to detect and resolve real-world inconsistencies largely unexplored. In open-domain applications, visual and textual cues often conflict, requiring models to perform structured reasoning beyond surface-level alignment. We introduce CrossCheck-Bench, a diagnostic benchmark for evaluating contradiction detection in multimodal inputs. The benchmark adopts a hierarchical task framework covering three levels of reasoning complexity and defines seven atomic capabilities essential for resolving cross-modal inconsistencies. CrossCheck-Bench includes 15k question-answer pairs sourced from real-world artifacts with synthetically injected contradictions. The dataset is constructed through a multi-stage annotation pipeline involving more than 450 expert hours to ensure semantic validity and calibrated difficulty across perception, integration, and reasoning. We evaluate 13 state-of-the-art vision-language models and observe a consistent performance drop as tasks shift from perceptual matching to logical contradiction detection. Most models perform well on isolated entity recognition but fail when multiple clues must be synthesized for conflict reasoning. Capability-level analysis further reveals uneven skill acquisition, especially in tasks requiring multi-step inference or rule-based validation. Additional probing shows that conventional prompting strategies such as Chain-of-Thought and Set-of-Mark yield only marginal gains. By contrast, methods that interleave symbolic reasoning with grounded visual processing achieve more stable improvements. These results highlight a persistent bottleneck in multimodal reasoning and suggest new directions for building models capable of robust cross-modal verification.
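
The abstract does not specify the dataset format or the injection pipeline in detail, but the core construction idea (take an aligned real-world image-text artifact, synthetically inject a contradiction, then ask the model to detect and explain the conflict) can be illustrated with a small sketch. Everything below is hypothetical: the field names, the single-word attribute swaps, and the QA phrasing are illustrative stand-ins for the paper's multi-stage, expert-verified pipeline.

```python
# Hypothetical sketch of synthetic contradiction injection in the spirit of
# CrossCheck-Bench: perturb one attribute in an aligned caption so that the
# text conflicts with the image, and emit a QA item about the conflict.
from dataclasses import dataclass
import random


@dataclass
class ArtifactPair:
    image_path: str  # path to the real-world image artifact
    caption: str     # text originally aligned with the image


@dataclass
class ContradictionItem:
    image_path: str
    claim: str       # caption with one attribute swapped to conflict with the image
    question: str
    answer: str      # gold answer: which detail conflicts and why


# Hypothetical attribute-swap table; the real benchmark injects contradictions
# at calibrated difficulty across perception, integration, and reasoning.
SWAPS = {"red": "blue", "left": "right", "two": "five", "open": "closed"}


def inject_contradiction(pair: ArtifactPair, rng: random.Random) -> ContradictionItem | None:
    """Swap a single attribute word in the caption to create a cross-modal conflict."""
    words = pair.caption.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SWAPS]
    if not candidates:
        return None  # nothing to perturb; the real pipeline uses richer edit types
    i = rng.choice(candidates)
    original = words[i]
    words[i] = SWAPS[original.lower()]
    return ContradictionItem(
        image_path=pair.image_path,
        claim=" ".join(words),
        question="Does the text agree with the image? If not, identify the conflicting detail.",
        answer=f"No: the text says '{words[i]}' but the image shows '{original}'.",
    )


if __name__ == "__main__":
    rng = random.Random(0)
    pair = ArtifactPair("tickets/stub_001.jpg",
                        "A red ticket stub lying on the left side of the table")
    item = inject_contradiction(pair, rng)
    if item is not None:
        print(item.claim)
        print(item.answer)
```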

🔍 Key Points

  • Introduction of CrossCheck-Bench, a diagnostic benchmark for contradiction detection in multimodal inputs, organized as a hierarchical task framework spanning three levels of reasoning complexity and seven atomic capabilities needed to resolve cross-modal inconsistencies.
  • A dataset of 15k question-answer pairs built from real-world artifacts with synthetically injected contradictions, produced through a multi-stage annotation pipeline involving more than 450 expert hours to ensure semantic validity and calibrated difficulty across perception, integration, and reasoning.
  • Evaluation of 13 state-of-the-art vision-language models, showing a consistent performance drop as tasks shift from perceptual matching to logical contradiction detection: most models handle isolated entity recognition well but fail when multiple clues must be synthesized for conflict reasoning.
  • Capability-level analysis revealing uneven skill acquisition, particularly on tasks requiring multi-step inference or rule-based validation (a minimal scoring sketch follows this list).
  • Probing of prompting strategies showing that Chain-of-Thought and Set-of-Mark yield only marginal gains, whereas methods that interleave symbolic reasoning with grounded visual processing achieve more stable improvements.
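
To make the capability-level analysis concrete, here is a minimal sketch of how per-level and per-capability accuracy could be aggregated over a CrossCheck-Bench-style item file. The JSONL schema, the field names, and the `query_model` callable are assumptions; only the idea of scoring along reasoning levels and atomic capabilities comes from the paper's description.

```python
# Hypothetical evaluation loop: aggregate accuracy per reasoning level and per
# atomic capability for a contradiction-detection QA file in JSONL format.
import json
from collections import defaultdict
from typing import Callable


def exact_match(pred: str, gold: str) -> bool:
    """Crude normalized string comparison; the real benchmark likely uses a richer metric."""
    return pred.strip().lower() == gold.strip().lower()


def evaluate(jsonl_path: str, query_model: Callable[[str, str], str]) -> dict:
    """Score each item and group accuracy by (axis, value) for level and capability."""
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)  # hypothetical fields: image_path, question, answer, level, capability
            pred = query_model(item["image_path"], item["question"])
            hit = exact_match(pred, item["answer"])
            for key in (("level", item["level"]), ("capability", item["capability"])):
                correct[key] += int(hit)
                total[key] += 1
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    # Stand-in model that always predicts "no conflict"; replace with a real VLM call.
    scores = evaluate("crosscheck_items.jsonl", lambda image, question: "no conflict")
    for (axis, name), acc in sorted(scores.items()):
        print(f"{axis:>10} | {name:<20} | acc={acc:.3f}")
```

Reporting along both axes, rather than a single aggregate accuracy, is what would surface the uneven skill acquisition the paper describes.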

💡 Why This Paper Matters

Multimodal large language models are mostly trained and evaluated on aligned image-text pairs, yet open-domain inputs routinely contain conflicting visual and textual cues. CrossCheck-Bench isolates this gap with a capability-level diagnostic and shows that current models, while strong at surface-level matching, consistently break down when contradictions must be detected and resolved through multi-step reasoning. The benchmark thus provides both a measurement tool and a concrete research target for building models capable of robust cross-modal verification.

🎯 Why It's Interesting for AI Security Researchers

Cross-modal contradiction detection sits close to several security-relevant problems: manipulated or miscaptioned media, fraudulent documents, and settings where an adversary controls one modality to mislead a model that trusts the other. CrossCheck-Bench offers a structured way to measure whether a vision-language model can notice and resolve such conflicts, and its finding that conventional prompting strategies such as Chain-of-Thought and Set-of-Mark yield only marginal gains suggests that robust verification will require grounded, symbolically informed reasoning rather than prompt engineering alone.

📚 Read the Full Paper