
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation

Authors: Francesco Marchiori, Rohan Sinha, Christopher Agia, Alexander Robey, George J. Pappas, Mauro Conti, Marco Pavone

Published: 2025-09-27

arXiv ID: 2509.23281v1

Added to Library: 2025-09-30 04:04 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly deployed in robotic environments but remain vulnerable to jailbreaking attacks that bypass safety mechanisms and drive unsafe or physically harmful behaviors in the real world. Data-driven defenses such as jailbreak classifiers show promise, yet they struggle to generalize in domains where specialized datasets are scarce, limiting their effectiveness in robotics and other safety-critical contexts. To address this gap, we introduce J-DAPT, a lightweight framework for multimodal jailbreak detection through attention-based fusion and domain adaptation. J-DAPT integrates textual and visual embeddings to capture both semantic intent and environmental grounding, while aligning general-purpose jailbreak datasets with domain-specific reference data. Evaluations across autonomous driving, maritime robotics, and quadruped navigation show that J-DAPT boosts detection accuracy to nearly 100% with minimal overhead. These results demonstrate that J-DAPT provides a practical defense for securing VLMs in robotic applications. Additional materials are made available at: https://j-dapt.github.io.
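The abstract says J-DAPT fuses textual and visual embeddings with attention. The sketch below shows what a single-query cross-attention fusion step can look like. It is an illustration only, not the authors' implementation. It makes three assumptions. First, the text embedding is the query. Second, the visual token embeddings serve as both keys and values, with no learned projection matrices. Third, the fused feature is the text embedding concatenated with the attended visual context. The names `cross_attention_fuse`, `softmax`, and `dot` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(text_emb: Vector, image_tokens: List[Vector]) -> Vector:
    """Hypothetical fusion step: the text query attends over visual tokens.

    Assumption: the visual tokens are used directly as keys and values
    (no learned projections). Returns [text ; attended visual context].
    """
    d = len(text_emb)
    # Scaled dot-product scores between the text query and each visual token.
    scores = [dot(text_emb, tok) / math.sqrt(d) for tok in image_tokens]
    weights = softmax(scores)
    # Weighted sum of visual tokens, dimension by dimension.
    context = [sum(w * tok[i] for w, tok in zip(weights, image_tokens)) for i in range(d)]
    # Concatenate text semantics with the visually grounded context.
    return text_emb + context


fused = cross_attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(fused)
```

In the usage line, the fused vector has length 4. Its first half is the unchanged text embedding. The second half gives more weight to the visual token most similar to the text. A downstream jailbreak classifier would take this vector as input.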

🔍 Key Points

  • Introduction of J-DAPT, a framework for multimodal jailbreak detection in robotics that integrates attention-based fusion of visual and textual embeddings.
  • Uses domain adaptation to bridge the gap between general-purpose jailbreak datasets and specialized robotics scenarios, reaching nearly 100% detection accuracy with minimal overhead.
  • Experiments in autonomous driving, maritime robotics, and quadruped navigation show that J-DAPT stays robust to novel jailbreak attacks and outperforms existing baselines.
  • A cross-attention fusion mechanism aligns textual and visual embeddings semantically, which improves detection of previously unseen attacks.
  • J-DAPT achieves high detection rates without needing domain-specific jailbreak examples, addressing the scarcity of such data in robotics.
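The key points describe aligning general-purpose jailbreak data with domain-specific reference data, so that no domain-specific jailbreak examples are needed. The summary does not say how J-DAPT performs this alignment. The sketch below therefore uses a simple, well-known stand-in, first-moment (mean) alignment. It translates the general-purpose embeddings so that their mean matches the mean of the domain reference embeddings. This substitution is an assumption made for illustration and should not be read as the paper's method. The names `mean_vector` and `align_to_domain` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Per-dimension mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def align_to_domain(general: List[Vector], domain_reference: List[Vector]) -> List[Vector]:
    """Assumed stand-in for domain adaptation: mean (first-moment) alignment.

    Every general-purpose embedding is shifted by the same offset, so
    distances between samples (and therefore the benign/jailbreak
    separation learned from the general data) are preserved. The data
    as a whole moves into the region occupied by the domain reference.
    """
    shift = [t - s for s, t in zip(mean_vector(general), mean_vector(domain_reference))]
    return [[x + d for x, d in zip(vec, shift)] for vec in general]


general = [[0.0, 0.0], [2.0, 2.0]]             # e.g. embeddings from a generic jailbreak dataset
domain_reference = [[10.0, 10.0], [12.0, 10.0]]  # e.g. embeddings of domain-specific reference data
aligned = align_to_domain(general, domain_reference)
print(aligned, mean_vector(aligned))
```

A pure translation is the simplest possible alignment. It corrects a global offset between domains but does not match variances or higher-order structure. Richer techniques, such as covariance alignment or adversarially learned domain-invariant features, address those gaps at higher cost.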

💡 Why This Paper Matters

The paper presents J-DAPT, a practical approach to improving the safety of robotic systems that rely on large language and vision-language models. It targets jailbreak vulnerabilities directly and reports high detection accuracy across diverse robotic domains, which makes it a meaningful contribution to AI security and safe autonomous systems. In practice, detectors of this kind can support safer, more reliable deployment of these models in robotic applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is relevant to AI security researchers because it tackles jailbreak attacks, which in robotic settings can translate into unsafe or physically harmful behavior. J-DAPT combines multimodal fusion with domain adaptation, which offers a concrete path toward robust defenses against adversarial inputs in domains where attack data is scarce. Studying such vulnerabilities and their defenses deepens the understanding of AI safety and supports the development of more secure systems, so the paper is a useful starting point for further work in AI security.
