← Back to Library

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

Authors: Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang, Zhen Xiang, Chaowei Xiao, Kristina Gligorić, Muhao Chen

Published: 2026-04-01

arXiv ID: 2604.01444v2

Added to Library: 2026-04-06 02:06 UTC

Red Teaming

πŸ“„ Abstract

Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food safety remains a high-stakes domain where inaccurate or misleading information can cause severe real-world harm. Despite these risks, current LLMs and safety guardrails lack rigorous alignment tailored to domain-specific food hazards. To address this gap, we introduce FoodGuardBench, the first comprehensive benchmark comprising 3,339 queries grounded in FDA guidelines, designed to evaluate the safety and robustness of LLMs. By constructing a taxonomy of food safety principles and employing representative jailbreak attacks (e.g., AutoDAN and PAP), we systematically evaluate existing LLMs and guardrails. Our evaluation results reveal three critical vulnerabilities: First, current LLMs exhibit sparse safety alignment in the food-related domain, easily succumbing to a few canonical jailbreak strategies. Second, when compromised, LLMs frequently generate actionable yet harmful instructions, inadvertently empowering malicious actors and posing tangible risks. Third, existing LLM-based guardrails systematically overlook these domain-specific threats, failing to detect a substantial volume of malicious inputs. To mitigate these vulnerabilities, we introduce FoodGuard-4B, a specialized guardrail model fine-tuned on our datasets to safeguard LLMs within food-related domains.

πŸ” Key Points

  • Introduction of FoodGuardBench, the first comprehensive benchmark for evaluating the safety of LLMs in the food safety domain, containing 3,339 queries based on FDA guidelines.
  • Identification of critical vulnerabilities in existing LLMs, including inadequate safety alignment, susceptibility to common jailbreak attacks, and generation of harmful instructions.
  • Development of FoodGuard-4B, a specialized guardrail model that significantly enhances threat detection against malicious input in food-related contexts, achieving high accuracy and low false positive rates.
  • Empirical evaluations showing that current LLM guardrails often fail to detect significant risks specific to food safety, necessitating targeted safety solutions.
  • Comprehensive methodology for evaluating food safety risks in LLMs, including a robust adversarial query generation pipeline.

πŸ’‘ Why This Paper Matters

This paper presents significant advancements in the evaluation and enhancement of food safety protocols within large language models. By creating tools like FoodGuardBench and FoodGuard-4B, it addresses a critical gap in the safety alignment of LLMs used in high-stakes domains such as food preparation and health guidance. The systematic evaluation of vulnerabilities and the proposed solutions provide a foundational framework for improving LLM deployment in sensitive applications, thus enhancing user safety and trust.

🎯 Why It's Interesting for AI Security Researchers

The paper is highly relevant to AI security researchers as it tackles the pressing issue of safety and robustness in AI systems operating in critical areas like food safety. By exposing vulnerabilities and proposing a novel benchmark and guardrail model, it sets the stage for future research focusing on aligning AI systems with specific safety requirements. This work not only highlights the risks associated with LLM deployment but also offers practical tools to mitigate such risks, paving the way for safer AI applications.

πŸ“š Read the Full Paper