
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Authors: Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee

Published: 2025-09-18

arXiv ID: 2509.15260v2

Added to Library: 2025-09-24 01:01 UTC

Safety

📄 Abstract

The advancement of Large Language Models (LLMs) has transformed natural language processing; however, their safety mechanisms remain under-explored in low-resource, multilingual settings. Here, we aim to bridge this gap. In particular, we introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context, including Singlish, Chinese, Malay, and Tamil. SGToxicGuard adopts a red-teaming approach to systematically probe LLM vulnerabilities in three real-world scenarios: conversation, question-answering, and content composition. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails. By offering actionable insights into cultural sensitivity and toxicity mitigation, we lay the foundation for safer and more inclusive AI systems in linguistically diverse environments. Link to the dataset: https://github.com/Social-AI-Studio/SGToxicGuard. Disclaimer: This paper contains sensitive content that may be disturbing to some readers.

🔍 Key Points

  • Introduction of SGToxicGuard, a novel dataset and evaluation framework tailored for assessing LLM safety in Singapore's multilingual context, specifically targeting low-resource languages such as Singlish, Malay, and Tamil.
  • Adoption of a red-teaming methodology to systematically probe LLM vulnerabilities across three real-world inspired tasks: Toxic Conversation, Toxic Question-Answering, and Toxic Tweet Composition (a minimal evaluation sketch follows this list).
  • Extensive evaluation revealing significant gaps in the safety mechanisms of existing LLMs when handling low-resource languages, with higher rates of toxic content generation than in English.
  • The research highlights the importance of contextual and cultural sensitivity in AI safety evaluations and suggests actionable insights for mitigating biases and toxicity in multilingual AI deployments.
  • Contributions emphasize the necessity for tailored methodologies and datasets to address the unique challenges posed by low-resource languages in AI systems.
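To make the red-teaming setup described above concrete, here is a minimal sketch of how such an evaluation loop could be wired up. The dataset file name, the record fields, the query_model() stub, and the keyword-based refusal heuristic are illustrative assumptions for this sketch, not the paper's actual pipeline; the authors' dataset and protocol are available at the GitHub link in the abstract.

```python
# Minimal sketch of a red-teaming evaluation loop in the spirit of SGToxicGuard.
# The file name, record fields, query_model() stub, and refusal heuristic are
# illustrative assumptions; they are NOT the paper's actual pipeline or format.
import json
from collections import defaultdict

LANGUAGES = {"singlish", "chinese", "malay", "tamil"}
TASKS = {"toxic_conversation", "toxic_qa", "toxic_tweet_composition"}


def query_model(prompt: str) -> str:
    """Stand-in for a call to the LLM under test; replace with a real API call."""
    return "I'm sorry, I can't help with that."  # this stub always refuses


def looks_toxic(response: str) -> bool:
    """Crude proxy: treat any non-refusal as unsafe. A real evaluation would use
    a toxicity classifier or human annotation instead of keyword matching."""
    refusal_markers = ("i can't", "i cannot", "i'm sorry", "cannot assist")
    return not any(marker in response.lower() for marker in refusal_markers)


def evaluate(dataset_path: str) -> dict:
    """Compute per-(language, task) toxic-generation rates from a JSONL file
    whose records look like {"language": ..., "task": ..., "prompt": ...}."""
    totals, unsafe = defaultdict(int), defaultdict(int)
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            key = (record["language"], record["task"])
            if key[0] not in LANGUAGES or key[1] not in TASKS:
                continue  # ignore records outside the four languages / three tasks
            response = query_model(record["prompt"])
            totals[key] += 1
            unsafe[key] += int(looks_toxic(response))
    return {key: unsafe[key] / totals[key] for key in totals}


if __name__ == "__main__":
    rates = evaluate("sgtoxicguard_prompts.jsonl")
    for (language, task), rate in sorted(rates.items()):
        print(f"{language:10s} {task:25s} toxic-generation rate = {rate:.2%}")
```

In an actual evaluation, the refusal heuristic would be replaced by the paper's toxicity judgment (e.g. a classifier or human review), and query_model() would call each multilingual LLM being benchmarked.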

💡 Why This Paper Matters

This paper is crucial for advancing the understanding of how large language models (LLMs) perform in low-resource, multilingual settings. By revealing vulnerabilities in LLM safety mechanisms, particularly in Singapore's diverse linguistic landscape, it contributes to the ongoing discourse on AI ethics and responsible deployment. Through the introduction of the SGToxicGuard framework, the research lays the groundwork for developing safer and more inclusive AI systems in linguistically diverse environments, ultimately benefiting users across different cultural contexts.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper presents significant insights into the safety and robustness of LLMs in handling toxic content in low-resource languages. The novel methodology of red-teaming provides a practical framework for evaluating and improving the safety of AI models, emphasizing the need for context-aware safeguards. Additionally, the demonstrated shortcomings of existing models in managing toxic generation highlight critical areas for further research and development, making it a pertinent resource for advancing AI security practices.

📚 Read the Full Paper