
LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context

Authors: Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang

Published: 2025-11-04

arXiv ID: 2511.02366v1

Added to Library: 2025-11-05 05:02 UTC

Safety

📄 Abstract

In this work, we propose LiveSecBench, a dynamic and continuously updated safety benchmark specifically for Chinese-language LLM application scenarios. LiveSecBench evaluates models across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety) rooted in the Chinese legal and social frameworks. The benchmark maintains relevance through a dynamic update schedule that incorporates new threat vectors, such as the planned inclusion of Text-to-Image Generation Safety and Agentic Safety in the next update. To date, LiveSecBench (v251030) has evaluated 18 LLMs, providing a landscape of AI safety in the Chinese-language context. The leaderboard is publicly accessible at https://livesecbench.intokentech.cn/.
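The six dimensions above define the structure of each evaluation. As a rough illustration, the sketch below tallies per-dimension safety rates for a single model from judged responses; the record layout, the binary safe/unsafe judgment, and all names are illustrative assumptions rather than LiveSecBench's published scoring protocol (the paper's leaderboard uses an Elo-style rating, sketched further below).

```python
# Hypothetical sketch: aggregate judged responses into per-dimension safety
# rates. The record format and binary "safe" label are assumptions made for
# illustration; LiveSecBench's actual scoring protocol may differ.
from collections import defaultdict

DIMENSIONS = [
    "Legality", "Ethics", "Factuality",
    "Privacy", "Adversarial Robustness", "Reasoning Safety",
]

def per_dimension_safety(records):
    """records: iterable of dicts like {"dimension": str, "safe": bool}."""
    totals = defaultdict(int)
    safe_counts = defaultdict(int)
    for rec in records:
        totals[rec["dimension"]] += 1
        safe_counts[rec["dimension"]] += int(rec["safe"])
    # Report a rate only for dimensions that actually have judged responses.
    return {d: safe_counts[d] / totals[d] for d in DIMENSIONS if totals[d] > 0}

# Tiny usage example for one model.
sample = [
    {"dimension": "Legality", "safe": True},
    {"dimension": "Legality", "safe": False},
    {"dimension": "Privacy", "safe": True},
]
print(per_dimension_safety(sample))  # {'Legality': 0.5, 'Privacy': 1.0}
```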

🔍 Key Points

  • Introduction of LiveSecBench: A novel benchmark tailored for evaluating AI safety in Chinese-language contexts, addressing unique legal and cultural challenges.
  • Evaluation across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety) that reflect the Chinese socio-legal landscape.
  • Dynamic and continuous update mechanism to ensure the relevance of the benchmark, allowing it to evolve with emerging threats and challenges in the AI space.
  • Comprehensive dataset construction, validated by native speakers and domain experts to capture cultural nuances and ensure robust assessment of LLMs.
  • Use of an Elo rating system for model evaluation, enabling competitive, dynamic comparisons among models rather than static, absolute scoring (a minimal sketch of the standard Elo update follows this list).
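
The Elo rating mentioned in the last point is a standard pairwise-comparison scheme. Below is a minimal sketch of the update rule, assuming head-to-head safety judgments between two models; the K factor, the starting rating of 1500, and the pairing/judging protocol are illustrative assumptions, not LiveSecBench's published configuration.

```python
# Minimal Elo update for pairwise model comparisons on safety prompts.
# Constants (K factor, starting rating) are illustrative assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one comparison.

    score_a is 1.0 if A's response is judged safer, 0.0 if B's is,
    and 0.5 for a tie. The update is zero-sum across the two models.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta

# Usage: two models start at 1500; model A wins one safety comparison.
ratings = {"model_a": 1500.0, "model_b": 1500.0}
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], score_a=1.0
)
print(ratings)  # model_a's rating rises; model_b's falls by the same amount
```

Because the ratings are relative, newly released models can be slotted into the leaderboard by running comparisons against already-rated models, which suits a continuously updated benchmark better than fixed absolute scores.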

💡 Why This Paper Matters

This paper presents a significant advancement in AI safety assessment by establishing LiveSecBench, a culturally relevant and dynamically updated benchmark for Chinese-language LLMs. It bridges a crucial gap left by existing benchmarks, which predominantly target English-language models, and thereby ensures that safety assessments reflect the legal and social contexts relevant to Chinese users.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers because it addresses the growing need for robust safety evaluation of LLMs in contexts that are often overlooked. With the rapid deployment of LLMs across applications, understanding their vulnerabilities and ensuring their safe deployment in specific cultural contexts is critical. LiveSecBench offers a framework that not only evaluates existing models but also adapts to new threats, making it a valuable tool for ongoing research and development in AI safety.

📚 Read the Full Paper