
From Understanding to Engagement: Personalized Pharmacy Video Clips via Vision Language Models (VLMs)

Authors: Suyash Mishra, Qiang Li, Srikanth Patil, Anubhav Girdhar

Published: 2026-01-08

arXiv ID: 2601.05059v1

Added to Library: 2026-01-09 03:00 UTC

📄 Abstract

Vision Language Models (VLMs) are poised to revolutionize the digital transformation of the pharmaceutical industry by enabling intelligent, scalable, and automated multi-modal content processing. Traditional manual annotation of heterogeneous data modalities (text, images, video, audio, and web links) is prone to inconsistencies, quality degradation, and inefficient content utilization. The sheer volume of long video and audio data (e.g., lengthy clinical trial interviews and educational seminars) further exacerbates these challenges. Here, we introduce a domain-adapted Video-to-Video-Clip Generation framework that integrates Audio Language Models (ALMs) and Vision Language Models (VLMs) to produce highlight clips. Our contributions are threefold: (i) a reproducible Cut & Merge algorithm with fade-in/out transitions and timestamp normalization, ensuring smooth transitions and audio/visual alignment; (ii) a personalization mechanism based on role definition and prompt injection for tailored outputs (marketing, training, regulatory); (iii) a cost-efficient end-to-end pipeline strategy that balances ALM- and VLM-enhanced processing. Evaluations on the Video-MME benchmark (900 videos) and our proprietary dataset of 16,159 pharmacy videos across 14 disease areas demonstrate a 3 to 4 times speedup, a 4 times cost reduction, and competitive clip quality. Beyond efficiency gains, our method improves clip coherence (0.348) and informativeness (0.721) scores over state-of-the-art VLM baselines (e.g., Gemini 2.5 Pro), highlighting the potential of transparent, custom-extractive, and compliance-supporting video summarization for the life sciences.
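
To make the Cut & Merge step concrete, below is a minimal sketch of the timestamp-normalization and segment-merging logic the abstract implies: candidate highlight segments proposed by the ALM/VLM stage are clamped to the video's bounds, sorted, and fused when they overlap or sit close together. This is an illustration under stated assumptions, not the authors' implementation; the function name, segment format, and merge-gap threshold are all hypothetical.

```python
# Hypothetical sketch of a Cut & Merge normalization step (not the paper's code).
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)

def cut_and_merge(
    segments: List[Segment],
    video_duration: float,
    merge_gap: float = 1.0,  # assumed threshold: fuse cuts closer than 1 s
) -> List[Segment]:
    """Clamp, sort, and merge candidate segments into clean clip boundaries."""
    # Timestamp normalization: clamp to [0, duration] and drop empty spans.
    clean = []
    for start, end in segments:
        start = max(0.0, min(start, video_duration))
        end = max(0.0, min(end, video_duration))
        if end - start > 0:
            clean.append((start, end))
    clean.sort()

    # Merge overlapping or near-adjacent segments so transitions don't stutter.
    merged: List[Segment] = []
    for start, end in clean:
        if merged and start - merged[-1][1] <= merge_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

if __name__ == "__main__":
    # Overlapping and out-of-range proposals collapse into two clean clips.
    proposals = [(12.0, 20.5), (19.8, 24.0), (90.0, 130.0)]
    print(cut_and_merge(proposals, video_duration=120.0))
    # -> [(12.0, 24.0), (90.0, 120.0)]
```

Rendering the merged segments with the fade-in/out transitions the paper describes could then be delegated to a standard video library; for example, with moviepy 1.x, each segment can be cut via `clip.subclip(start, end)`, faded via `clip.fx(vfx.fadein, 0.3)` and `clip.fx(vfx.fadeout, 0.3)`, and stitched with `concatenate_videoclips`.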

🔍 Key Points

  • A domain-adapted Video-to-Video-Clip Generation framework that combines Audio Language Models (ALMs) and Vision Language Models (VLMs) to turn long pharmaceutical videos, such as clinical trial interviews and educational seminars, into highlight clips.
  • A reproducible Cut & Merge algorithm with fade-in/out transitions and timestamp normalization, ensuring smooth cuts and audio/visual alignment.
  • A personalization mechanism based on role definition and prompt injection that tailors outputs to marketing, training, or regulatory audiences (see the sketch after this list).
  • A cost-efficient end-to-end pipeline strategy balancing ALM- and VLM-enhanced processing, yielding a 3 to 4 times speedup and a 4 times cost reduction.
  • Evaluation on the Video-MME benchmark (900 videos) and a proprietary dataset of 16,159 pharmacy videos across 14 disease areas, with improved clip coherence (0.348) and informativeness (0.721) scores over state-of-the-art VLM baselines such as Gemini 2.5 Pro.
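
A minimal sketch of what role definition plus prompt injection could look like in practice follows. The role names (marketing, training, regulatory) come from the abstract; the template wording, function name, and JSON output format are assumptions for illustration, not the authors' actual templates.

```python
# Hypothetical sketch of role-conditioned prompt injection (not the paper's templates).
ROLE_DEFINITIONS = {
    "marketing": "Select moments that showcase patient benefit and key messaging.",
    "training": "Select moments that explain mechanisms, dosing, and procedures step by step.",
    "regulatory": "Select moments containing safety statements, disclaimers, and trial endpoints.",
}

def build_clip_prompt(role: str, transcript: str) -> str:
    """Compose a role-conditioned instruction for the ALM/VLM clip-selection stage."""
    role_def = ROLE_DEFINITIONS[role]
    return (
        f"You are a video editor preparing highlight clips for a {role} audience. "
        f"{role_def}\n"
        'Return a JSON list of {"start": sec, "end": sec} segments.\n'
        f"Transcript with timestamps:\n{transcript}"
    )

print(build_clip_prompt("regulatory", "[00:12] The primary endpoint was ..."))
```

The appeal of this design is that a single extraction pipeline serves multiple audiences: only the injected role definition changes, while the downstream Cut & Merge and rendering stages stay identical.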

💡 Why This Paper Matters

This paper matters because manual annotation of heterogeneous pharmaceutical content (text, images, video, audio, and web links) is slow, inconsistent, and increasingly untenable at the scale of long-form clinical trial interviews and educational seminars. The proposed ALM/VLM pipeline offers a transparent, extractive, and compliance-supporting route to automated video summarization, and its reported 3 to 4 times speedup and 4 times cost reduction suggest it is practical for production use in regulated life-sciences settings.

🎯 Why It's Interesting for AI Security Researchers

Although this is not an adversarial-ML paper, several aspects are relevant to researchers focused on safe and trustworthy AI deployment. The pipeline is extractive rather than generative, which keeps output clips traceable to the source footage and supports compliance review in a heavily regulated domain. Its personalization mechanism is built on explicit role definitions and prompt injection, a reminder that prompt-level conditioning is a first-class interface in production VLM systems and therefore a surface worth auditing. Finally, the reported cost and quality trade-offs between ALM- and VLM-based processing illustrate the kinds of deployment decisions that governance and evaluation efforts must account for.

📚 Read the Full Paper

https://arxiv.org/abs/2601.05059v1