
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life

Authors: Xinyue Lou, Jinan Xu, Jingyi Yin, Xiaolong Wang, Zhaolu Kang, Youwei Liao, Yixuan Wang, Xiangyu Shi, Fengran Mo, Su Yao, Kaiyu Huang

Published: 2026-01-07

arXiv ID: 2601.04043v1

Added to Library: 2026-01-08 04:00 UTC

Safety

📄 Abstract

As Multimodal Large Language Models (MLLMs) become indispensable assistants in human life, the unsafe content they generate poses a danger to human behavior, hanging over human society like a sword of Damocles. To investigate and evaluate how MLLM responses affect human behavior in daily life, we introduce SaLAD, a multimodal safety benchmark that contains 2,013 real-world image-text samples across 10 common categories, with a balanced design covering both unsafe scenarios and cases of oversensitivity. It emphasizes realistic risk exposure, authentic visual inputs, and fine-grained cross-modal reasoning, ensuring that safety risks cannot be inferred from text alone. We further propose a safety-warning-based evaluation framework that encourages models to provide clear and informative safety warnings rather than generic refusals. Results on 18 MLLMs demonstrate that the top-performing models achieve a safe response rate of only 57.2% on unsafe queries. Moreover, even popular safety alignment methods show limited effectiveness in our scenario, revealing the vulnerabilities of current MLLMs in identifying dangerous behaviors in daily life. Our dataset is available at https://github.com/xinyuelou/SaLAD.
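As a rough mental model of what one benchmark sample contains, the sketch below encodes the facts stated in the abstract (image-text pairs, 10 categories, an unsafe vs. oversensitivity split). The field names, category name, and file path are placeholders invented for illustration, not the actual SaLAD release schema; consult the linked repository for the real format.

```python
from dataclasses import dataclass
from enum import Enum


class Subset(str, Enum):
    # The benchmark balances unsafe scenarios with oversensitivity (benign) controls.
    UNSAFE = "unsafe"
    OVERSENSITIVE = "oversensitive"


@dataclass
class SaladSample:
    """One image-text sample; field names are placeholders, not the release schema."""
    image_path: str   # authentic real-world photo carrying the visual risk cue
    query: str        # user text; the risk should not be inferable from text alone
    category: str     # one of the 10 common daily-life categories
    subset: Subset    # unsafe scenario vs. oversensitivity control


# Hypothetical example sample
sample = SaladSample(
    image_path="images/kitchen_001.jpg",
    query="Is it fine to leave this running while I step out for an hour?",
    category="home_safety",
    subset=Subset.UNSAFE,
)
```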

🔍 Key Points

  • Introduces SaLAD, a multimodal safety benchmark for evaluating the responses of Multimodal Large Language Models (MLLMs) in everyday scenarios, consisting of 2,013 image-text pairs across 10 categories and covering both unsafe behaviors and oversensitivity.
  • Proposes a safety-warning-based evaluation framework that requires models not merely to refuse unsafe queries but to provide informative safety warnings that guide the user (a scoring sketch follows this list).
  • Evaluation of 18 MLLMs shows that even the top-performing models achieve a safe response rate of only 57.2% on unsafe queries, highlighting the vulnerabilities of these models in recognizing dangers in daily life.
  • Demonstrates that popular safety alignment methods yield limited gains in this setting, particularly for fine-grained risk detection in multimodal contexts.
  • The findings emphasize the critical need for improved multimodal safety mechanisms in AI systems to ensure reliable interactions in real-world applications.
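To make the two-sided scoring concrete, here is a minimal sketch of how per-subset rates could be aggregated once a judge has labeled each model response: models should issue an informative warning on unsafe queries and remain helpful on benign (oversensitivity) queries. The record fields and judge labels are hypothetical placeholders, not the paper's actual evaluation code or schema.

```python
from collections import defaultdict

# Hypothetical judged records; field names and labels are illustrative only.
records = [
    {"subset": "unsafe", "judge_label": "safety_warning"},      # warned about the risk
    {"subset": "unsafe", "judge_label": "unsafe_compliance"},    # answered with no warning
    {"subset": "oversensitive", "judge_label": "helpful"},       # benign query answered
    {"subset": "oversensitive", "judge_label": "refusal"},       # benign query over-refused
]


def score(records):
    """Compute per-subset success rates mirroring the benchmark's balanced design:
    warnings on unsafe queries, helpfulness (no over-refusal) on benign ones."""
    counts = defaultdict(lambda: {"total": 0, "good": 0})
    for r in records:
        subset = r["subset"]
        counts[subset]["total"] += 1
        if subset == "unsafe" and r["judge_label"] == "safety_warning":
            counts[subset]["good"] += 1   # informative warning, not a bare refusal
        elif subset == "oversensitive" and r["judge_label"] == "helpful":
            counts[subset]["good"] += 1   # benign query handled without refusal
    return {s: c["good"] / c["total"] for s, c in counts.items()}


print(score(records))  # e.g. {'unsafe': 0.5, 'oversensitive': 0.5}
```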

💡 Why This Paper Matters

The paper's contributions address pressing safety concerns in AI-powered applications as MLLMs are increasingly integrated into everyday human activities. By establishing a comprehensive benchmark and evaluation framework, the authors expose the deficiencies of current models and pave the way for future research into stronger safety mechanisms. This is crucial for fostering user trust and ensuring responsible AI deployment in sensitive domains.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant as it addresses a critical intersection of AI safety and security, focusing specifically on the risks posed by MLLMs in daily human interactions. The SaLAD benchmark provides a new tool for evaluating model vulnerabilities, which can inform security practices and policy formulation in AI development. The work not only highlights existing risks but also motivates advances in safety alignment strategies, which are vital for protecting users from hazards arising from AI-generated content.

📚 Read the Full Paper