Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Authors: David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko

Published: 2025-10-30

arXiv ID: 2510.26328v1

Added to Library: 2025-11-14 23:06 UTC

Red Teaming

📄 Abstract

Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model capabilities, frontier LLMs remain vulnerable to very simple prompt injections in realistic scenarios. Our code is available at https://github.com/aisa-group/promptinject-agent-skills.

🔍 Key Points

Agent Skills are a newly introduced framework that allows agents to dynamically utilize knowledge based on markdown files, which presents a risk for prompt injections.
The authors demonstrate how malicious instructions can be hidden within Agent Skills to exfiltrate sensitive data, indicating a significant security vulnerability in such frameworks.
A key finding is the ability to bypass system-level guardrails by exploiting benign actions, which can be dangerous if users select options that allow actions without further prompts.
Experiments revealed that malicious scripts can be executed without user confirmation if the 'Don't ask again' feature is enabled, showcasing an exploitation pathway for attackers.
The paper emphasizes the importance of more robust defenses and alerts users against third-party Agent Skills that are not vetted for security.

💡 Why This Paper Matters

This paper is relevant as it exposes significant security vulnerabilities in the Agent Skills framework for LLMs, a critical aspect of ongoing developments in AI. By highlighting the ease with which malicious actions can be implemented and the potential consequences of such vulnerabilities, the paper serves as a call for improved security measures and oversight in AI applications that utilize similar architectures.

🎯 Why It's Interesting for AI Security Researchers

The paper would be of interest to AI security researchers as it uncovers a novel attack vector related to prompt injections, particularly in the context of continually learning models. The findings prompt further investigation into the security implications of dynamic knowledge integration in LLMs and underline the necessity for improved safeguarding mechanisms against even simple injections, which can have far-reaching impacts in practice.

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper