Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

Authors: Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang, Zhen Xiang, Chaowei Xiao, Kristina Gligorić, Muhao Chen

Published: 2026-04-01

arXiv ID: 2604.01444v1

Added to Library: 2026-04-03 02:02 UTC

Red Teaming

📄 Abstract

Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food safety remains a high-stakes domain where inaccurate or misleading information can cause severe real-world harm. Despite these risks, current LLMs and safety guardrails lack rigorous alignment tailored to domain-specific food hazards. To address this gap, we introduce FoodGuardBench, the first comprehensive benchmark comprising 3,339 queries grounded in FDA guidelines, designed to evaluate the safety and robustness of LLMs. By constructing a taxonomy of food safety principles and employing representative jailbreak attacks (e.g., AutoDAN and PAP), we systematically evaluate existing LLMs and guardrails. Our evaluation results reveal three critical vulnerabilities: First, current LLMs exhibit sparse safety alignment in the food-related domain, easily succumbing to a few canonical jailbreak strategies. Second, when compromised, LLMs frequently generate actionable yet harmful instructions, inadvertently empowering malicious actors and posing tangible risks. Third, existing LLM-based guardrails systematically overlook these domain-specific threats, failing to detect a substantial volume of malicious inputs. To mitigate these vulnerabilities, we introduce FoodGuard-4B, a specialized guardrail model fine-tuned on our datasets to safeguard LLMs within food-related domains.

🔍 Key Points

Introduction of FoodGuardBench, a comprehensive benchmark for evaluating LLM safety in food-related contexts, comprising 3,339 queries grounded in FDA guidelines.
Identification of critical vulnerabilities in existing LLMs: sparse safety alignment, generation of harmful guidance when compromised, and failures in current guardrails to detect malicious inputs.
Development of FoodGuard-4B, a specialized guardrail model fine-tuned to enhance protection for LLMs in food-related applications.
Demonstration of the effectiveness of jailbreak attacks (AutoDAN and PAP) in exploiting LLM vulnerabilities, revealing the inadequacy of current safety mechanisms in the food domain.
Evidence that existing LLMs pose real-world risks by generating unsafe food-related guidance, thereby necessitating rigorous safety measures.

💡 Why This Paper Matters

This paper is relevant as it addresses an urgent need to enhance the safety and reliability of large language models in high-stakes domains such as food safety. By developing a dedicated benchmark and guardrail, it provides the foundation for future research and practical applications that aim to reduce risks associated with AI-generated food-related advice, which can have serious implications for public health.

🎯 Why It's Interesting for AI Security Researchers

This paper is of significant interest to AI security researchers as it highlights the vulnerabilities of large language models in a specific, high-stakes application area—food safety. The findings provide insight into the challenges of maintaining safety in AI systems and emphasize the need for targeted approaches to security that consider domain-specific risks. Additionally, the introduction of systematic methods for evaluating and improving LLM safety can guide future research and development efforts in AI alignment and adversarial robustness.

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper