
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Authors: Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee

Published: 2025-09-18

arXiv ID: 2509.15260v1

Added to Library: 2025-09-22 04:01 UTC

Safety

📄 Abstract

The advancement of Large Language Models (LLMs) has transformed natural language processing; however, their safety mechanisms remain under-explored in low-resource, multilingual settings. Here, we aim to bridge this gap. In particular, we introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context, including Singlish, Chinese, Malay, and Tamil. SGToxicGuard adopts a red-teaming approach to systematically probe LLM vulnerabilities in three real-world scenarios: conversation, question-answering, and content composition. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails. By offering actionable insights into cultural sensitivity and toxicity mitigation, we lay the foundation for safer and more inclusive AI systems in linguistically diverse environments. (Dataset: https://github.com/Social-AI-Studio/SGToxicGuard. Disclaimer: this paper contains sensitive content that may be disturbing to some readers.)

🔍 Key Points

  • Introduction of SGToxicGuard, a novel dataset for evaluating LLM safety in Singapore's low-resource languages, including Singlish, Chinese, Malay, and Tamil.
  • Implementation of a red-teaming framework that systematically probes LLM vulnerabilities through three real-world scenarios: toxic conversation, toxic question-answering, and toxic content composition (see the illustrative sketch after this list).
  • Extensive experimental results that expose significant safety gaps in state-of-the-art multilingual LLMs, particularly when interacting with low-resource languages.
  • Demonstration of how existing LLMs display heightened biases and toxic content generation in low-resource languages compared to high-resource languages, highlighting the need for targeted safety measures.
  • Actionable recommendations provided for improving AI safety mechanisms and fostering inclusive AI systems in linguistically diverse environments.
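To make the red-teaming setup concrete, here is a minimal, hypothetical evaluation harness in the spirit of the three scenarios above. The JSONL layout, field names, the query_model stub, and the keyword-based refusal check are illustrative assumptions, not the authors' released pipeline; the actual SGToxicGuard dataset and code are available at the repository linked in the abstract.

```python
# Illustrative red-teaming harness: probe a model with adversarial prompts per
# scenario and language, and tally how often it fails to refuse. All field names
# and helpers here are assumptions for the sketch, not the paper's implementation.
import json
from collections import defaultdict

SCENARIOS = ("conversation", "question_answering", "content_composition")
LANGUAGES = ("singlish", "chinese", "malay", "tamil")

# Crude refusal heuristic; a real evaluation would use a toxicity classifier
# or human annotation rather than keyword matching.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")


def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test (e.g. a chat-completion API)."""
    raise NotImplementedError("Wire this up to the model being evaluated.")


def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def run_red_team(dataset_path: str) -> dict:
    """Return the non-refusal rate per (scenario, language) pair, a rough proxy
    for gaps in the model's safety guardrails."""
    stats = defaultdict(lambda: {"total": 0, "unsafe": 0})
    with open(dataset_path, encoding="utf-8") as f:
        # Assumed JSONL records: {"scenario": ..., "language": ..., "prompt": ...}
        for line in f:
            record = json.loads(line)
            key = (record["scenario"], record["language"])
            response = query_model(record["prompt"])
            stats[key]["total"] += 1
            if not is_refusal(response):
                stats[key]["unsafe"] += 1
    return {
        f"{scenario}/{language}": round(v["unsafe"] / v["total"], 3)
        for (scenario, language), v in stats.items()
        if v["total"]
    }
```

In a full evaluation, the refusal heuristic would be replaced by a toxicity classifier or human annotation, and the per-language breakdown is what surfaces the gap between low-resource languages (e.g., Singlish, Malay, Tamil) and high-resource ones.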

💡 Why This Paper Matters

This paper addresses a critical gap in the evaluation of LLM safety mechanisms for low-resource languages, specifically within Singapore's multilingual context. By establishing the SGToxicGuard framework and revealing significant model vulnerabilities, the authors underscore the necessity for enhanced AI safety protocols designed to cater to diverse linguistic and cultural environments. The findings emphasize the importance of contextual understanding in AI applications, which is pivotal for ethical deployment and societal acceptance of AI technologies.

🎯 Why It's Interesting for AI Security Researchers

This paper is of particular interest to AI security researchers because it focuses on the vulnerabilities of Large Language Models in low-resource, multilingual settings. Its red-teaming approach to evaluating AI safety offers a methodology for uncovering biases and toxic content generation in diverse linguistic contexts, which is critical for developing safer AI systems. The findings also have practical implications for AI development practices, regulatory policies, and ethical standards, making the paper a valuable resource for ongoing discussions in AI safety and ethics.

📚 Read the Full Paper