CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Authors: Hanna Foerster, Tom Blanchard, Kristina Nikolić, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, Yiren Zhao

Published: 2026-01-14

arXiv ID: 2601.09923v2

Added to Library: 2026-03-10 03:01 UTC

📄 Abstract

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly separates trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs) -- systems that automate tasks by viewing screens and executing actions -- presents a fundamental challenge: current agents require continuous observation of UI state to determine each action, conflicting with the isolation required for security. We resolve this tension by demonstrating that UI workflows, while dynamic, are structurally predictable. We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content, providing provable control flow integrity guarantees against arbitrary instruction injections. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks, which manipulate UI elements to trigger unintended valid paths within the plan. We evaluate our design on OSWorld, and retain up to 57% of the performance of frontier models while improving performance for smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist in CUAs.
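The core idea in the abstract — a trusted planner commits to a complete execution graph, with conditional branches, before the agent sees any untrusted screen content — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; all names (`PlanNode`, `execute`, the branch labels) are hypothetical. The key property shown is that an untrusted observation can only *select among pre-declared branches*, never inject new actions:

```python
# Minimal sketch of single-shot planning for a computer use agent.
# Assumption: the plan graph is fixed by a trusted planner before execution;
# the observation function is the only untrusted input.

from dataclasses import dataclass, field


@dataclass
class PlanNode:
    action: str  # trusted action, fixed before any observation
    # predicate label -> next node id; the only influence observations have
    branches: dict = field(default_factory=dict)


# The trusted planner produces the whole graph up front.
plan = {
    "start": PlanNode("open_settings",
                      {"dialog_visible": "confirm", "default": "done"}),
    "confirm": PlanNode("click_ok", {"default": "done"}),
    "done": PlanNode("stop"),
}


def execute(plan, observe):
    """Walk the fixed graph. `observe` returns an untrusted label after each
    action; labels outside the pre-declared branch set fall through to the
    'default' edge, so injected instructions cannot add actions."""
    node_id, trace = "start", []
    while True:
        node = plan[node_id]
        trace.append(node.action)
        if node.action == "stop":
            return trace
        label = observe(node.action)  # untrusted UI observation
        node_id = node.branches.get(label, node.branches["default"])


# Example: an observation stream claiming a confirmation dialog appeared.
trace = execute(plan, lambda a: "dialog_visible" if a == "open_settings" else "none")
print(trace)  # -> ['open_settings', 'click_ok', 'stop']
```

Note that this control flow integrity is exactly what a Branch Steering attack respects but abuses: a malicious page cannot add actions, but it can fake the `dialog_visible` condition to steer execution down a valid-but-unintended branch, which is why the paper argues additional measures are needed.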

πŸ” Key Points

  • Identification of a fundamental tension in Computer Use Agents (CUAs): the only known robust defense against prompt injection is architectural isolation of trusted planning from untrusted observations, yet current agents require continuous observation of UI state to choose each action.
  • Introduction of Single-Shot Planning, in which a trusted planner generates a complete execution graph with conditional branches before the agent observes any potentially malicious content, exploiting the structural predictability of UI workflows.
  • Provable control flow integrity guarantees against arbitrary instruction injections, since injected content can never add actions outside the pre-committed plan.
  • Identification of Branch Steering attacks, which manipulate UI elements to trigger unintended but valid paths within the plan, and additional measures required to mitigate them.
  • Evaluation on OSWorld showing the design retains up to 57% of frontier-model performance while improving smaller open-source models by up to 19%.

💡 Why This Paper Matters

This paper resolves a central tension in securing Computer Use Agents: the only known robust defense against prompt injection is strict separation of trusted planning from untrusted environment observations, yet CUAs appear to require continuous observation to act. By showing that UI workflows, while dynamic, are structurally predictable, and by introducing Single-Shot Planning with provable control flow integrity, the authors demonstrate that rigorous security and practical utility can coexist. This matters increasingly as agents that view screens and execute actions are entrusted with credentials and financially consequential tasks.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of interest to AI security researchers because it extends the isolation-based defense paradigm to a new, high-stakes agent class. Rather than relying on heuristic filtering, it offers provable control flow integrity against arbitrary instruction injections, and it introduces and analyzes Branch Steering, an attack class that survives the architectural defense by abusing valid branches of the plan. The OSWorld evaluation also quantifies the security-utility trade-off, providing a concrete reference point for anyone designing or red-teaming agentic systems.

📚 Read the Full Paper