
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Authors: Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai

Published: 2026-02-18

arXiv ID: 2602.16660v1


Safety

📄 Abstract

The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the target language or through pairwise alignment with high-resource languages, which limits scalability. In this work, we propose a resource-efficient method for improving multilingual safety alignment. We introduce a plug-and-play Multi-Lingual Consistency (MLC) loss that can be integrated into existing monolingual alignment pipelines. By improving collinearity between multilingual representation vectors, our method encourages directional consistency at the multilingual semantic level in a single update. This allows simultaneous alignment across multiple languages using only multilingual prompt variants without requiring additional response-level supervision in low-resource languages. We validate the proposed method across different model architectures and alignment paradigms, and demonstrate its effectiveness in enhancing multilingual safety with limited impact on general model utility. Further evaluation across languages and tasks indicates improved cross-lingual generalization, suggesting the proposed approach as a practical solution for multilingual consistency alignment under limited supervision.

🔍 Key Points

  • Introduction of a Multi-Lingual Consistency (MLC) loss to enhance multilingual safety alignment in large language models (LLMs) using minimal resources.
  • MLC facilitates simultaneous alignment of multiple languages without requiring additional response-level supervision, making it a scalable method for low-resource settings.
  • Extensive experiments demonstrate that MLC significantly improves safety performance across languages, particularly for low-resource languages, while maintaining general model utility.
  • The approach encourages representations of semantically equivalent inputs in different languages to be directionally consistent, by promoting collinearity between their representation vectors.
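The key points above describe a consistency penalty that pushes representations of translated prompt variants toward a common direction. As a rough illustration only: the paper's exact formulation is not given in this summary, so the sketch below is a generic collinearity loss over pooled hidden states, with the function name, pooling choice, and weighting term (`lambda_mlc`) all hypothetical.

```python
import torch
import torch.nn.functional as F

def mlc_loss(hidden_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical multilingual-consistency penalty.

    hidden_states: (num_languages, hidden_dim) -- one pooled representation
    per language variant of the *same* underlying prompt. Rewarding pairwise
    collinearity pulls all variants toward a shared semantic direction.
    """
    # Normalize so only direction matters (collinearity, not magnitude).
    h = F.normalize(hidden_states, dim=-1)
    # Pairwise cosine similarities between all language variants.
    sim = h @ h.T  # shape (L, L)
    num_langs = h.shape[0]
    # Drop the diagonal: self-similarity is always 1 after normalization.
    off_diag = sim[~torch.eye(num_langs, dtype=torch.bool)]
    # Penalize deviation from perfect directional agreement (cosine = 1).
    return (1.0 - off_diag).mean()

# Hypothetical usage inside an existing monolingual alignment step,
# where `alignment_loss` and `lambda_mlc` come from the base pipeline:
#   total_loss = alignment_loss + lambda_mlc * mlc_loss(pooled_states)
```

This matches the summary's claim of being plug-and-play: the penalty only needs the prompt variants' representations, not response-level labels in each language, so it can be added to an existing loss without new supervision.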

💡 Why This Paper Matters

This paper matters because it tackles the challenge of ensuring safety across multilingual LLMs without exhaustive per-language data or supervision. The proposed MLC loss offers an efficient alternative to traditional approaches, which often fall short for low-resource languages, and the findings support its practical applicability in real-world settings where equitable access to safe language technology is increasingly necessary.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper presents a cutting-edge approach to safety in multilingual contexts, highlighting a framework that mitigates risks associated with language models in diverse linguistic settings. The MLC methodology offers a pathway to enhance safety alignment broadly and sustainably, making it relevant for those focused on the ethical deployment and safety of AI systems.
