
AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output

Authors: Hisami Suzuki, Satoru Katsumata, Takashi Kodama, Tetsuro Takahashi, Kouta Nakayama, Satoshi Sekine

Published: 2025-06-03

arXiv ID: 2506.02372v1

Added to Library: 2025-06-04 04:01 UTC

Safety

πŸ“„ Abstract

In this paper we present AnswerCarefully, a dataset for promoting the safety and appropriateness of Japanese LLM outputs. The dataset consists of 1,800 pairs of questions and reference answers, where the questions require special attention in answering. It covers a wide range of risk categories established in prior English-language datasets, but the data samples are original in that they are manually created to reflect the socio-cultural context of LLM usage in Japan. We show that using this dataset for instruction fine-tuning of a Japanese LLM led to improved output safety without compromising the utility of general responses. We also report the results of a safety evaluation of 12 Japanese LLMs using this dataset as a benchmark. Finally, we describe the latest update of the dataset, which provides English translations and annotations of the questions, aimed at facilitating the derivation of similar datasets in different languages and regions.
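Since the dataset is a collection of question/reference-answer pairs intended for instruction fine-tuning, the minimal sketch below shows one plausible way to convert such pairs into chat-formatted supervised fine-tuning examples. The file name and the "question"/"answer" field names are illustrative assumptions, not the official schema of the AnswerCarefully release.

```python
# A minimal sketch of preparing AC-style question/reference-answer pairs
# for supervised instruction tuning. The file name and field names
# ("question", "answer") are assumptions for illustration; consult the
# official dataset release for the actual schema.
import json

def load_pairs(path: str) -> list[dict]:
    """Load question/reference-answer pairs from a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def to_chat_example(pair: dict) -> dict:
    """Convert one pair into the chat format most SFT trainers accept."""
    return {
        "messages": [
            {"role": "user", "content": pair["question"]},
            {"role": "assistant", "content": pair["answer"]},
        ]
    }

if __name__ == "__main__":
    examples = [to_chat_example(p) for p in load_pairs("answercarefully.jsonl")]
    print(f"{len(examples)} fine-tuning examples prepared")
```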

πŸ” Key Points

  • Introduction of the AnswerCarefully (AC) dataset, consisting of 1,800 pairs of culturally relevant questions and reference answers aimed at improving the safety of Japanese LLM outputs.
  • Demonstration of the effectiveness of the AC dataset for safety fine-tuning a Japanese LLM, showing reduced harmful-response rates without compromising general response utility.
  • A comprehensive safety evaluation of 12 Japanese LLMs using the AC dataset as a benchmark, revealing significant variation in safety performance across models (a minimal evaluation sketch follows this list).
  • Evidence for the importance of regional and cultural context in building safety datasets, highlighting the limitations of simply translating English safety data into Japanese.
  • Future directions that enhance the dataset with cross-lingual, cross-cultural annotations to aid the adaptation of similar safety initiatives in other regions.
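The paper's benchmark scores model responses to AC questions; as a rough illustration, the sketch below computes a harmful-response (violation) rate for a model under test. The `generate` and `judge` callables are hypothetical placeholders, not the paper's API; the actual protocol may rely on human or LLM graders and a finer-grained rating scale.

```python
# A minimal sketch of benchmarking a model against AC-style questions:
# generate a response per question, judge each as harmful or not, and
# report the violation rate. `generate` and `judge` are placeholders;
# the paper's actual judging protocol may differ.
from typing import Callable, Iterable

def violation_rate(
    questions: Iterable[str],
    generate: Callable[[str], str],    # model under test
    judge: Callable[[str, str], bool]  # True if the response is harmful
) -> float:
    """Fraction of questions that elicit a harmful response."""
    results = [judge(q, generate(q)) for q in questions]
    return sum(results) / len(results)

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    qs = ["How do I pick a lock?", "What's the weather like?"]
    demo_generate = lambda q: "I can't help with that." if "lock" in q else "Sunny."
    demo_judge = lambda q, r: False  # a real judge would flag unsafe content
    print(f"violation rate: {violation_rate(qs, demo_generate, demo_judge):.2%}")
```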

πŸ’‘ Why This Paper Matters

This research presents a critical advancement in ensuring the safety and appropriateness of outputs from Japanese LLMs through the creation of a culturally contextualized dataset. The results indicate that tailored safety measures can significantly enhance the reliability of LLM outputs, making them safer for public interaction. As AI systems continue to evolve, such datasets are essential for fostering responsible AI usage and minimizing risks associated with harmful language generation.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly interesting to AI security researchers as it tackles a pressing issue within the domain of LLM safetyβ€”namely, the contextual risks posed by AI outputs in different cultural frameworks. It provides methodologies for evaluating and mitigating these risks, which is crucial for developing more secure AI systems that can be safely deployed across diverse user bases. Furthermore, the insights gained from the AC dataset and its benchmarks will inform future research on cross-cultural AI safety practices.

πŸ“š Read the Full Paper