← Back to Library

Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance

Authors: Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan, Zhenyu Xu, Xiang Li, Guoxing Chen, Yan Meng, Haojin Zhu

Published: 2026-03-20

arXiv ID: 2603.19974v1

Added to Library: 2026-03-23 02:01 UTC

Red Teaming

📄 Abstract

Autonomous coding agents are increasingly integrated into software development workflows, offering capabilities that extend beyond code suggestion to active system interaction and environment management. OpenClaw, a representative platform in this emerging paradigm, introduces an extensible skill ecosystem that allows third-party developers to inject behavioral guidance through lifecycle hooks during agent initialization. While this design enhances automation and customization, it also opens a novel and unexplored attack surface. In this paper, we identify and systematically characterize guidance injection, a stealthy attack vector that embeds adversarial operational narratives into bootstrap guidance files. Unlike traditional prompt injection, which relies on explicit malicious instructions, guidance injection manipulates the agent's reasoning context by framing harmful actions as routine best practices. These narratives are automatically incorporated into the agent's interpretive framework and influence future task execution without raising suspicion.We construct 26 malicious skills spanning 13 attack categories including credential exfiltration, workspace destruction, privilege escalation, and persistent backdoor installation. We evaluate them using ORE-Bench, a realistic developer workspace benchmark we developed. Across 52 natural user prompts and six state-of-the-art LLM backends, our attacks achieve success rates from 16.0% to 64.2%, with the majority of malicious actions executed autonomously without user confirmation. Furthermore, 94% of our malicious skills evade detection by existing static and LLM-based scanners. Our findings reveal fundamental tensions in the design of autonomous agent ecosystems and underscore the urgent need for defenses based on capability isolation, runtime policy enforcement, and transparent guidance provenance.

🔍 Key Points

  • Introduction of 'Guidance Injection' as a new class of stealthy attacks targeting autonomous coding agents via malicious skill injections.
  • Development of 26 malicious skills across 13 attack categories, demonstrating the feasibility of these attacks and their potential impact on software development environments.
  • Implementation of ORE-Bench, a benchmark designed to evaluate the security risks associated with autonomous coding agents and the effectiveness of various attacks.
  • Experimental results showing high success rates (16.0% to 64.2%) for guidance injection attacks across multiple state-of-the-art LLM backends, with significant evasion of existing detection mechanisms.
  • Recommendations for structural defenses based on capability isolation, runtime policy enforcement, and enhanced monitoring protocols to mitigate the risks introduced by guidance injection.

💡 Why This Paper Matters

This paper is crucial as it highlights a novel attack vector within autonomous coding ecosystems, emphasizing the risks posed by guidance injection methods that could lead to serious consequences, like data breaches or unauthorized modifications to systems. The findings underscore the importance of redesigning security measures in AI agent architectures to ensure safer integration and functionality within software development workflows.

🎯 Why It's Interesting for AI Security Researchers

The paper's findings are significant for AI security researchers as it uncovers new attack methodologies that exploit the inherent design of autonomous coding agents. By illustrating how malicious narratives can manipulate agent reasoning and behavior without direct code execution, it provokes critical discussions around the limits of existing security paradigms and the need for innovative defenses against emerging threats. This research may drive future studies focusing on enhancing the security of LLM-integrated systems and developing robust mitigation strategies.

📚 Read the Full Paper