CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Authors: Hanna Foerster, Robert Mullins, Tom Blanchard, Nicolas Papernot, Kristina Nikolić, Florian Tramèr, Ilia Shumailov, Cheng Zhang, Yiren Zhao

Published: 2026-01-14

arXiv ID: 2601.09923v1

Added to Library: 2026-01-16 03:03 UTC

πŸ“„ Abstract

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. The only known robust defense is architectural isolation that strictly separates trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs) -- systems that automate tasks by viewing screens and executing actions -- presents a fundamental challenge: current agents require continuous observation of UI state to determine each action, conflicting with the isolation required for security. We resolve this tension by demonstrating that UI workflows, while dynamic, are structurally predictable. We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content, providing provable control flow integrity guarantees against arbitrary instruction injections. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks, which manipulate UI elements to trigger unintended valid paths within the plan. We evaluate our design on OSWorld, and retain up to 57% of the performance of frontier models while improving performance for smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist in CUAs.
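
The core architectural idea can be made concrete with a short sketch. The code below is purely illustrative and assumes a simplified interface (the `PlanNode`, `execute`, `observe`, and `act` names are invented here, not taken from the paper): a trusted planner produces the whole execution graph before any screen content is read, and the executor's only use of untrusted observations is to choose among branches that were already written down, which is what yields control flow integrity against injected instructions.

```python
# Minimal sketch (not the paper's implementation) of single-shot planning for a CUA.
# A trusted planner emits the complete execution graph, including conditional
# branches, BEFORE the agent observes any potentially malicious screen content.

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PlanNode:
    """One predeclared step: a concrete UI action plus fixed branch predicates."""
    action: str                                            # e.g. "click:compose_button" (illustrative)
    branches: list[tuple[Callable[[dict], bool], str]] = field(default_factory=list)
    default_next: Optional[str] = None                     # node id to follow if no predicate fires


def execute(plan: dict[str, PlanNode], start: str,
            observe: Callable[[], dict], act: Callable[[str], None]) -> None:
    """Walk the pre-generated plan. Untrusted UI state is treated purely as data:
    it may only select among branches the planner already wrote down, never
    introduce or reorder actions (control flow integrity)."""
    node_id: Optional[str] = start
    while node_id is not None:
        node = plan[node_id]
        act(node.action)              # execute the trusted, planned action
        obs = observe()               # read untrusted UI state
        node_id = node.default_next
        for predicate, target in node.branches:
            if predicate(obs):        # predicates were fixed at planning time
                node_id = target
                break
```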

πŸ” Key Points

  • Prompt injection is the central threat to Computer Use Agents (CUAs): malicious on-screen content can hijack agent behavior to steal credentials or cause financial loss.
  • The only known robust defense, architectural isolation of trusted task planning from untrusted environment observations, appears incompatible with CUAs, which normally must observe UI state before deciding each action.
  • Key observation that UI workflows, while dynamic, are structurally predictable, so a complete plan can be generated before anything untrusted is observed.
  • Introduction of Single-Shot Planning for CUAs: a trusted planner emits a full execution graph with conditional branches up front, providing provable control flow integrity against arbitrary instruction injections.
  • Identification of Branch Steering attacks, in which manipulated UI elements push execution onto unintended but valid paths within the plan, showing that isolation alone is not sufficient (see the illustrative sketch after this list).
  • Evaluation on OSWorld retains up to 57% of frontier-model performance and improves smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist.

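The Branch Steering risk is easiest to see with a toy predicate. Everything below is hypothetical: the field names, URLs, and the hardened variant are invented for illustration and are not the paper's proposed countermeasure. The point is that a branch condition keyed off attacker-writable page text can be steered onto a valid but unintended path of an otherwise sound plan, while a condition scoped to trusted, structured fields resists that particular manipulation.

```python
# Illustrative (hypothetical) Branch Steering scenario; not code from the paper.
# Even with control flow integrity, a predicate that reads attacker-controllable
# UI text lets a malicious page steer execution onto a valid-but-unintended branch.

def looks_like_invoice(obs: dict) -> bool:
    # Fragile predicate: keys off free-form page text the attacker can write.
    return "invoice" in obs.get("page_text", "").lower()

def looks_like_invoice_strict(obs: dict) -> bool:
    # Hardened (still illustrative) predicate: only inspects structured,
    # harder-to-spoof fields such as the page's verified origin.
    return (obs.get("origin") == "https://billing.example.com"
            and obs.get("doc_type") == "invoice")

# A phishing page containing the word "invoice" satisfies the first predicate and
# steers the agent down the "pay invoice" branch of an otherwise valid plan;
# the second predicate resists this particular manipulation.
malicious_obs = {"page_text": "INVOICE: send gift cards now",
                 "origin": "https://evil.example.net"}
assert looks_like_invoice(malicious_obs)
assert not looks_like_invoice_strict(malicious_obs)
```
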
πŸ’‘ Why This Paper Matters

Computer Use Agents act on whatever appears on screen, so a single malicious page or message can redirect them toward credential theft or financial loss. The only defense known to be robust against prompt injection, strict architectural isolation between trusted planning and untrusted observations, was thought to be incompatible with agents that must watch the UI before every action. By showing that UI workflows are predictable enough to be planned in a single shot, this paper carries that isolation-based defense over to CUAs with provable control flow integrity, and its OSWorld results indicate that the cost in task performance is a trade-off rather than a dealbreaker.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper interesting because it offers an architectural guarantee rather than a heuristic filter: by construction, instruction injections cannot alter the control flow of a single-shot plan. It also surfaces a new attack class, Branch Steering, that survives this isolation by abusing valid branches within the plan, which maps out where the residual attack surface lies. The OSWorld evaluation provides concrete numbers on the security-utility trade-off, directly useful to anyone designing or red-teaming agentic systems.

πŸ“š Read the Full Paper