
The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks

Authors: Chunyang Li, Zifeng Kang, Junwei Zhang, Zhuo Ma, Anda Cheng, Xinghua Li, Jianfeng Ma

Published: 2025-11-20

arXiv ID: 2511.16347v1

Added to Library: 2025-11-21 03:00 UTC

Red Teaming

📄 Abstract

The adoption of Vision-Language Models (VLMs) in embodied AI agents, while effective, brings safety concerns such as jailbreaking. Prior work has explored the possibility of directly jailbreaking embodied agents through elaborate multi-modal prompts. However, no prior work has studied or even reported indirect jailbreaks in embodied AI, where a black-box attacker induces a jailbreak without issuing direct prompts to the embodied agent. In this paper, we propose, for the first time, the indirect environmental jailbreak (IEJ), a novel attack that jailbreaks embodied AI via indirect prompts injected into the environment, such as malicious instructions written on a wall. Our key insight is that embodied AI does not "think twice" about the instructions provided by the environment -- a blind trust that attackers can exploit to jailbreak the embodied agent. We further design and implement open-source prototypes of two fully automated frameworks: SHAWSHANK, the first automatic attack-generation framework for the proposed IEJ attack; and SHAWSHANK-FORGE, the first automatic benchmark-generation framework for IEJ. Then, using SHAWSHANK-FORGE, we automatically construct SHAWSHANK-BENCH, the first benchmark for indirectly jailbreaking embodied agents. Together, our two frameworks and one benchmark answer three questions: what content malicious IEJ instructions should carry, where they should be placed, and how IEJ can be systematically evaluated. Evaluation results show that SHAWSHANK outperforms eleven existing methods across 3,957 task-scene combinations and compromises all six tested VLMs. Furthermore, current defenses only partially mitigate our attack, and we have responsibly disclosed our findings to all affected VLM vendors.
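
To make the threat model concrete, here is a minimal, hypothetical sketch of how an IEJ probe could be assembled: the attacker renders an instruction into the observed scene, while the text prompt stays benign. Everything here is an illustrative assumption; `query_vlm`, the prompt wording, and the injection routine are placeholders, not the paper's SHAWSHANK pipeline.

```python
# Minimal sketch of an indirect environmental jailbreak (IEJ) probe.
# Hypothetical throughout: `query_vlm` stands in for whatever VLM-backed
# embodied agent is under test; it is NOT the paper's SHAWSHANK code.
from PIL import Image, ImageDraw


def query_vlm(image: Image.Image, prompt: str) -> str:
    """Placeholder for the agent under test (e.g., an API call)."""
    raise NotImplementedError("wire this to the agent being evaluated")


def inject_wall_text(scene: Image.Image, instruction: str,
                     xy: tuple[int, int] = (40, 40)) -> Image.Image:
    """Render an attacker-chosen instruction into the scene image,
    emulating text 'written on a wall' that the agent will observe."""
    tampered = scene.copy()
    ImageDraw.Draw(tampered).text(xy, instruction, fill="black")
    return tampered


def run_iej_probe(scene: Image.Image, task: str, instruction: str) -> str:
    """The attacker never touches the prompt channel: only a benign task
    is sent, while the malicious instruction lives in the observation."""
    tampered = inject_wall_text(scene, instruction)
    return query_vlm(tampered, f"You are a household robot. Task: {task}")
```

The point of the sketch is the channel split: because the injected instruction travels only through the visual observation, prompt-level filters never see it.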

🔍 Key Points

  • Introduction of Indirect Environmental Jailbreaks (IEJ), a novel black-box attack method that manipulates embodied AI systems by embedding malicious instructions in the environment rather than through direct interaction.
  • Development of the Shawshank framework, an automated pipeline that generates malicious environmental instructions and optimizes them, answering what an IEJ instruction should say and where in the scene it should appear.
  • Creation of Shawshank-Forge, the first automatic benchmark generation framework for IEJ, used to build Shawshank-Bench for systematic testing and evaluation of such attacks on various Vision-Language Models (VLMs); a sketch of such an evaluation loop follows this list.
  • Demonstration that the Shawshank framework outperforms eleven existing methods across 3,957 task-scene combinations and compromises all six tested VLMs, exceeding prior direct jailbreak approaches.
  • Findings that current defenses only partially mitigate the IEJ threat, exposing gaps in existing VLM safety countermeasures.
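
As referenced in the Shawshank-Forge bullet above, here is a hedged outline of how a benchmark loop in the spirit of Shawshank-Bench could score attack success across task-scene combinations. The `IEJCase` schema and the `judge_compliance` scorer are assumptions for illustration; the released benchmark's actual record format may differ.

```python
# Hedged sketch of a benchmark loop in the spirit of SHAWSHANK-BENCH.
# The `IEJCase` schema and `judge_compliance` scorer are assumptions;
# the released benchmark's actual record format may differ.
from dataclasses import dataclass
from typing import Callable


@dataclass
class IEJCase:
    scene_id: str     # simulated scene the instruction is injected into
    task: str         # benign task issued to the agent
    instruction: str  # malicious instruction embedded in the scene
    placement: str    # where it appears in the scene (e.g., "wall")


def attack_success_rate(
    cases: list[IEJCase],
    run_agent: Callable[[IEJCase], str],
    judge_compliance: Callable[[str, str], bool],
) -> float:
    """Fraction of task-scene combinations in which the agent follows
    the environment-injected instruction instead of refusing it."""
    if not cases:
        return 0.0
    hits = sum(judge_compliance(run_agent(c), c.instruction) for c in cases)
    return hits / len(cases)
```

A per-model breakdown of such a rate is what would support coverage claims like "compromises all six tested VLMs"; the judge could be a keyword match or an LLM grader, both common choices in jailbreak evaluation.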

💡 Why This Paper Matters

This paper presents groundbreaking work in the field of AI security, specifically addressing vulnerabilities in embodied AI systems through the introduction of IEJ attacks. As safety mechanisms in AI continue to lag behind technological advancements, the insights and frameworks provided in this research are crucial for understanding and bridging these gaps, ensuring that AI systems operate safely in real-world environments.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper particularly relevant as it not only highlights a previously unexplored attack vector in embodied AI but also provides practical frameworks for assessing and improving the security of AI systems. This research sheds light on the dual challenges of safety and effectiveness in embodied AI applications, making it imperative for ongoing studies in AI robustness and safety to consider the implications of indirect manipulation tactics.

📚 Read the Full Paper: https://arxiv.org/abs/2511.16347