When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital

Authors: Avni Kothari, Patrick Vossler, Jean Digitale, Mohammad Forouzannia, Elise Rosenberg, Michele Lee, Jennee Bryant, Melanie Molina, James Marks, Lucas Zier, Jean Feng

Published: 2025-08-11

arXiv ID: 2508.08504v1

Added to Library: 2025-08-14 23:08 UTC

Safety

📄 Abstract

Large language models (LLMs) have the potential to address social and behavioral determinants of health by transforming labor-intensive workflows in resource-constrained settings. Creating LLM-based applications that serve the needs of underserved communities requires a deep understanding of their local context, but it is often the case that neither LLMs nor their developers possess this local expertise, and the experts in these communities often face severe time/resource constraints. This creates a disconnect: how can one engage in meaningful co-design of an LLM-based application for an under-resourced community when the communication channel between the LLM developer and domain expert is constrained? We explored this question through a real-world case study, in which our data science team sought to partner with social workers at a safety-net hospital to build an LLM application that summarizes patients' social needs. Whereas prior works focus on the challenge of prompt tuning, we found that the most critical challenge in this setting is the careful and precise specification of what information to surface to providers so that the LLM application is accurate, comprehensive, and verifiable. Here we present a novel co-design framework for settings with limited access to domain experts, in which the summary generation task is first decomposed into individually-optimizable attributes and then each attribute is efficiently refined and validated through a multi-tier cascading approach.

🔍 Key Points

  • The paper presents a novel multi-tier co-design framework for developing LLM applications in resource-constrained healthcare settings, focusing on enhancing collaboration between LLM developers and domain experts who often lack availability or expertise.
  • It emphasizes decomposing complex summarization tasks into manageable, individually-optimizable attributes, facilitating the creation of structured outputs while reducing the burden on overtaxed experts.
  • Experimental results demonstrate a significant improvement in LLM extraction accuracy and concordance with human annotations, validating the effectiveness of the proposed framework in a real-world case study with social workers at a safety-net hospital.
  • The research identifies key obstacles in LLM application development, such as resource constraints, underspecified requirements, and lack of gold-standard summaries, highlighting the necessity for new strategies in low-resource environments.
  • The findings underscore the potential of LLMs to improve workflows in healthcare, specifically in pre-chart summarization by social workers, thereby accelerating processes critical to patient care.
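The decomposition-and-cascade idea above can be illustrated with a minimal sketch. Everything here is hypothetical: the attribute names, the keyword-based stand-in for an LLM call, and the two-tier escalation rule are invented for illustration and do not come from the paper itself. The sketch only shows the shape of the approach: each attribute is extracted independently with supporting evidence (for verifiability), and extractions that cannot be grounded in the note are escalated to a domain expert rather than guessed.

```python
from dataclasses import dataclass

# Hypothetical attribute decomposition: the paper decomposes summary
# generation into individually-optimizable attributes; these names are
# illustrative, not the paper's actual schema.
ATTRIBUTES = ["housing_status", "food_security", "transportation_access"]

@dataclass
class AttributeResult:
    name: str
    value: str
    evidence: str            # source text supporting the value, for verifiability
    needs_expert_review: bool = False

def extract_attribute(note: str, attribute: str) -> AttributeResult:
    """Extract one attribute from a clinical note.

    A trivial keyword rule stands in for the real LLM call so the
    sketch is self-contained and runnable.
    """
    keywords = {
        "housing_status": "homeless",
        "food_security": "food",
        "transportation_access": "bus",
    }
    kw = keywords[attribute]
    if kw in note.lower():
        # Tier 1: automated extraction, grounded in inline evidence.
        return AttributeResult(attribute, "flagged", evidence=kw)
    # Tier 2: no supporting evidence found -> escalate to the domain
    # expert instead of letting the model guess.
    return AttributeResult(attribute, "unknown", evidence="",
                           needs_expert_review=True)

def summarize(note: str) -> list[AttributeResult]:
    """Build the structured summary one attribute at a time."""
    return [extract_attribute(note, a) for a in ATTRIBUTES]

if __name__ == "__main__":
    results = summarize("Patient reports being homeless and relies on the bus.")
    for r in results:
        print(r.name, r.value, r.needs_expert_review)
```

Because each attribute is extracted and validated independently, a developer can refine one attribute's prompt or rules without touching the others, and only the unresolved cases consume scarce expert time.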

💡 Why This Paper Matters

This paper addresses a critical gap in the design and deployment of AI applications in healthcare, particularly for underserved communities. By providing a practical co-design framework that accounts for the constraints faced by both LLM developers and domain experts, it contributes to the academic discourse on human-AI collaboration and has significant implications for improving the efficiency and effectiveness of healthcare services in resource-limited settings.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers may find this paper of interest as it discusses potential vulnerabilities associated with LLMs when deployed in sensitive environments such as healthcare. The framework presented emphasizes accuracy, validation, and performance measurement, which are critical aspects in ensuring that AI systems are implemented safely and effectively in real-world applications. Additionally, the methodologies for creating structured and verifiable outputs could inform approaches to enhancing the security and reliability of AI systems against biases and inaccuracies.

📚 Read the Full Paper