SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

📄 Abstract

Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills operate (and often do well) across tasks. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of seven design patterns capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal representation × scope taxonomy describing what skills are (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign in which nearly 1,200 malicious skills infiltrated a major agent marketplace, exfiltrating API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.

🔍 Key Points

Introduces a unified definition of agentic skills, formalized as a four-tuple, distinguishing them from tools, plans, and episodic memories.
Proposes a comprehensive skill lifecycle model, covering stages from skill discovery to evaluation and update, highlighting the evolving nature of skills in agent systems.
Presents a taxonomy of seven design patterns for skill management, showcasing how skills are packaged, executed, and integrated in practical applications.
Analyzes the security and governance implications of agentic skills, including supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded in a case study of the ClawHavoc campaign.
Surveys deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them.
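
To make the four-tuple definition in the first key point concrete, here is a minimal Python sketch. It assumes the four components correspond to the four elements the abstract lists (applicability condition, execution policy, termination criterion, reusable interface); the summary does not spell out the paper's own notation, so the names `Skill`, `run_skill`, `applies`, `step`, `done`, and `interface` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass(frozen=True)
class Skill:
    """Illustrative skill record; field names are assumptions, not the paper's."""
    name: str
    applies: Callable[[State], bool]   # applicability condition
    step: Callable[[State], State]     # execution policy (one step)
    done: Callable[[State], bool]      # termination criterion
    interface: Dict[str, str]          # reusable interface: parameter -> description


def run_skill(skill: Skill, state: State, max_steps: int = 100) -> State:
    """Run a skill until its termination criterion holds or the step budget runs out."""
    if not skill.applies(state):
        raise ValueError(f"skill {skill.name!r} is not applicable to this state")
    steps = 0
    while not skill.done(state):
        if steps >= max_steps:
            raise RuntimeError(f"skill {skill.name!r} exceeded {max_steps} steps")
        state = skill.step(state)
        steps += 1
    return state


# Toy skill: increment a counter until it reaches a target.
count_up = Skill(
    name="count_up",
    applies=lambda s: "count" in s and "target" in s,
    step=lambda s: {**s, "count": s["count"] + 1},
    done=lambda s: s["count"] >= s["target"],
    interface={"count": "current value", "target": "value to reach"},
)

result = run_skill(count_up, {"count": 0, "target": 3})
print(result)
```

The sketch separates the applicability check from the execution loop so that a caller can reject an ill-suited skill before any step runs, which is one way to read the contrast the abstract draws between skills and atomic tool calls.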

💡 Why This Paper Matters

This paper systematizes how agentic skills are understood and managed in LLM agents, covering both the conceptual framework and its practical implications. By organizing the skill lifecycle and the seven design patterns into one structure, it gives developers a common vocabulary for building more reliable autonomous systems. Its treatment of the security and governance risks of skill-based agents is directly relevant to deploying them safely in real-world applications.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, the paper is relevant because it identifies concrete threats associated with agentic skills, such as supply-chain risks and prompt injection via skill payloads, and discusses governance measures such as trust-tiered execution. The ClawHavoc case study, in which nearly 1,200 malicious skills infiltrated a major agent marketplace and exfiltrated API keys, cryptocurrency wallets, and browser credentials, shows what insecure skill distribution and execution look like in practice.
