
SafeMobile: Chain-level Jailbreak Detection and Automated Evaluation for Multimodal Mobile Agents

Authors: Siyuan Liang, Tianmeng Fang, Zhe Liu, Aishan Liu, Yan Xiao, Jinyuan He, Ee-Chien Chang, Xiaochun Cao

Published: 2025-07-01

arXiv ID: 2507.00841v1

Added to Library: 2025-07-03 04:00 UTC

Red Teaming

📄 Abstract

With the wide application of multimodal foundation models in intelligent agent systems, scenarios such as mobile device control, intelligent assistant interaction, and multimodal task execution are gradually relying on such large model-driven agents. However, the related systems are also increasingly exposed to potential jailbreak risks. Attackers may induce the agents to bypass the original behavioral constraints through specific inputs, and then trigger certain risky and sensitive operations, such as modifying settings, executing unauthorized commands, or impersonating user identities, which brings new challenges to system security. Existing security measures for intelligent agents still have limitations when facing complex interactions, especially in detecting potentially risky behaviors across multiple rounds of conversations or sequences of tasks. In addition, an efficient and consistent automated methodology to assist in assessing and determining the impact of such risks is currently lacking. This work explores the security issues surrounding mobile multimodal agents, attempts to construct a risk discrimination mechanism by incorporating behavioral sequence information, and designs an automated assisted assessment scheme based on a large language model. Through preliminary validation in several representative high-risk tasks, the results show that the method can improve the recognition of risky behaviors to some extent and assist in reducing the probability of agents being jailbroken. We hope that this study can provide some valuable references for the security risk modeling and protection of multimodal intelligent agent systems.

🔍 Key Points

  • Introduces SafeMobile, a novel framework for detecting and preventing jailbreak attacks in multimodal mobile agents by integrating behavior-level defense mechanisms with automated evaluation tools.
  • Develops SafeTrajGuard, a risk-aware module that employs trajectory optimization and contextual understanding to evaluate actions, preventing risky behaviors without compromising task functionality.
  • Implements GPTJudge, an automated evaluation system that assigns risk scores to agent behaviors based on multimodal inputs, effectively replacing manual security assessments.
  • Reports a 78.4% reduction in jailbreak success rates across a range of high-risk tasks while maintaining high task completion rates.
  • Presents a comprehensive approach combining threat modeling, risk discrimination, and automated evaluation, filling critical gaps in mobile agent security research.
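
To make the idea of chain-level (as opposed to step-level) risk detection concrete, here is a minimal Python sketch. It is not the paper's SafeTrajGuard or GPTJudge implementation: the `Step` type, the risk weights, the decay factor, and the threshold are all hypothetical values chosen for illustration. It only shows why scoring a whole action sequence can flag a trajectory whose individual steps each look acceptable.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One agent action in a trajectory (hypothetical schema)."""
    action: str  # e.g. "open_app", "tap", "change_setting"
    target: str  # app or UI area the action touches


# Hypothetical per-action risk weights; not taken from the paper.
ACTION_RISK = {
    "open_app": 0.1,
    "tap": 0.1,
    "type_text": 0.2,
    "send_message": 0.5,
    "change_setting": 0.6,
    "run_command": 0.8,
}

# Hypothetical set of sensitive targets that raise a step's risk.
SENSITIVE_TARGETS = {"settings", "payments", "contacts", "shell"}


def step_risk(step: Step) -> float:
    """Risk of a single step viewed in isolation, capped at 1.0."""
    base = ACTION_RISK.get(step.action, 0.3)  # unknown actions get a middle value
    if step.target in SENSITIVE_TARGETS:
        base = min(1.0, base + 0.2)
    return base


def first_blocked_step(
    steps: Sequence[Step], threshold: float = 1.0, decay: float = 0.8
) -> Optional[int]:
    """Return the index of the first step at which the accumulated,
    decayed chain risk exceeds the threshold, or None if it never does."""
    score = 0.0
    for i, step in enumerate(steps):
        # Earlier risky steps keep contributing, with exponential decay.
        score = decay * score + step_risk(step)
        if score > threshold:
            return i
    return None


benign = [Step("open_app", "notes"), Step("tap", "new_note"), Step("type_text", "note_body")]
risky = [Step("open_app", "settings"), Step("change_setting", "settings"), Step("run_command", "shell")]

print(first_blocked_step(benign))
print(first_blocked_step(risky))
```

In this toy setup the second step of the `risky` trajectory stays under the threshold when scored alone, but the accumulated score crosses it once the preceding step is taken into account. A real system would presumably replace the lookup table with a learned or LLM-based scorer over screenshots and action history.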

💡 Why This Paper Matters

The study advances the security of multimodal intelligent agents by addressing the growing threat of jailbreak attacks, which manipulate these systems into performing unauthorized actions. The proposed SafeMobile framework aims to reduce jailbreak success while preserving task completion, offering a potentially scalable response to a rapidly evolving problem in AI applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is relevant to AI security researchers because it addresses a pressing concern in deploying intelligent agents: the security implications of multimodal interactions. Its methods for risk identification, behavior assessment, and automated evaluation contribute to ongoing work on securing AI systems and offer a starting point for future research and practical implementations.
