โ† Back to Library

When Skills Lie: Hidden-Comment Injection in LLM Agents

Authors: Qianli Wang, Boyang Ma, Minghui Xu, Yue Zhang

Published: 2026-02-11

arXiv ID: 2602.10498v1

Added to Library: 2026-02-12 03:00 UTC

Red Teaming

๐Ÿ“„ Abstract

LLM agents often rely on Skills to describe available tools and recommended procedures. We study a hidden-comment prompt injection risk in this documentation layer: when a Markdown Skill is rendered to HTML, HTML comment blocks can become invisible to human reviewers, yet the raw text may still be supplied verbatim to the model. In experiments, we find that DeepSeek-V3.2 and GLM-4.5-Air can be influenced by malicious instructions embedded in a hidden comment appended to an otherwise legitimate Skill, yielding outputs that contain sensitive tool intentions. A short defensive system prompt that treats Skills as untrusted and forbids sensitive actions prevents these malicious tool calls and instead surfaces the suspicious hidden instructions.

๐Ÿ” Key Points

  • Identification of hidden-comment prompt injection as a vulnerability in LLM Skills, enabling attackers to embed malicious instructions that could alter the model's tool-call intentions.
  • Demonstration through experiments shows that common LLMs are susceptible to these hidden-comment injections, which can lead to severe security implications.
  • Proposed defensive mechanisms, including a prompt-level guardrail that treats Skills as untrusted and forces the model to identify suspicious instructions, effectively prevent these attacks while maintaining legitimate functionalities.
  • Design implications highlight the need for clearer separation between user-visible documentation and model-consumed content to reduce the risk of user misinterpretation and improve overall system security.
  • Emphasis on the human factors in LLM interactions, showcasing how hidden instruction injections exploit the gap between user perceptions and model behaviors.

๐Ÿ’ก Why This Paper Matters

This paper is significant as it sheds light on a previously unaddressed vulnerability in LLM agents related to the use of Skills, demonstrating how hidden comments can be a vector for malicious prompt injection. It establishes a foundation for improving security protocols and enhancing user trust in the use of LLMs in sensitive environments. The proposed defensive strategies have important implications for system design to mitigate exploitation risks without compromising usability.

๐ŸŽฏ Why It's Interesting for AI Security Researchers

AI security researchers would find this paper relevant as it addresses a critical aspect of security concerning large language modelsโ€”specifically, the manipulation of LLM behavior through hidden comment injections. It uncovers a nuanced attack vector that could be exploited in various applications, from software development tools to conversational agents. The findings prompt further investigation into securing LLMs against similar threats, informing best practices for model development and deployment.

๐Ÿ“š Read the Full Paper