← Back to Library

Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs

Authors: Andrew MaranhΓ£o Ventura D'addario

Published: 2025-11-24

arXiv ID: 2511.21757v1

Added to Library: 2025-12-01 03:02 UTC

Safety

πŸ“„ Abstract

The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in \textit{primum non nocere}. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven taxonomies, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the ethical design of releasing these "vulnerability signatures" to correct the information asymmetry between malicious actors and AI developers. Ultimately, this work advocates for a shift from universal to context-aware safety, providing the necessary resources to immunize healthcare AI against the nuanced, systemic threats inherent to high-stakes medical environments -- vulnerabilities that represent the paramount risk to patient safety and the successful integration of AI in healthcare systems.

πŸ” Key Points

  • Introduction of the Medical Malice dataset, comprising 214,219 adversarial prompts specifically designed for the ethical complexities of the Brazilian Unified Health System (SUS).
  • Highlighting the inadequacy of generic safety datasets for training AI in healthcare, emphasizing the need for context-aware safety measures.
  • Utilization of an adversarial agent, Grok-4, for generating high-fidelity malicious prompts that capture context-specific ethical violations in healthcare.
  • Discussion of ethical considerations involved in releasing a dataset that includes malicious intents, proposing a framework to reduce information asymmetry between malicious actors and AI developers.
  • Advocacy for a paradigm shift from universal safety approaches to context-aware safety mechanisms in healthcare AI systems.

πŸ’‘ Why This Paper Matters

This paper is essential as it addresses the critical gap in AI alignment methodologies specifically tailored for healthcare settings, ensuring that AI systems are robust against nuanced, context-dependent threats that could compromise patient safety. The Medical Malice dataset offers a structured approach for training AI models, empowering them to recognize and understand complex ethical violations inherent to specialized domains like healthcare rather than relying on generic definitions of harm.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper highly relevant as it provides insights into mitigating risks associated with the deployment of AI systems in high-stakes environments, like healthcare. By focusing on context-specific adversarial prompts, the paper effectively illustrates how traditional safety measures fall short in critical areas. Moreover, the methodology used to generate the dataset and the ethical discourse surrounding its release contributes to the broader conversation about AI safety, making it a significant resource for refining security protocols against targeted misuses of AI.

πŸ“š Read the Full Paper