← Back to Library

SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

Authors: Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, Philip Torr

Published: 2026-02-15

arXiv ID: 2602.14211v1

Added to Library: 2026-02-17 04:02 UTC

Red Teaming

πŸ“„ Abstract

Agent skills are becoming a core abstraction in coding agents, packaging long-form instructions and auxiliary scripts to extend tool-augmented behaviors. This abstraction introduces an under-measured attack surface: skill-based prompt injection, where poisoned skills can steer agents away from user intent and safety policies. In practice, naive injections often fail because the malicious intent is too explicit or drifts too far from the original skill, leading agents to ignore or refuse them; existing attacks are also largely hand-crafted. We propose the first automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills in a realistic tool environment, and an Evaluate Agent that logs action traces (e.g., tool calls and file operations) and verifies whether targeted malicious behaviors occurred. We also propose a malicious payload hiding strategy that conceals adversarial operations in auxiliary scripts while injecting optimized inducement prompts to trigger tool execution. Extensive experiments across diverse coding-agent settings and real-world software engineering tasks show that our method consistently achieves high attack success rates under realistic settings.

πŸ” Key Points

  • SkillJect introduces an automated framework for skill-based prompt injection in coding agents, marking a first in identifying and exploiting vulnerabilities within modular skill systems.
  • The framework operates through a closed-loop mechanism involving three agents: an Attack Agent, a Code Agent, and an Evaluate Agent, enhancing the stealthiness and effectiveness of the injection process.
  • A novel malicious payload hiding strategy is proposed, where harmful instructions are concealed within benign-looking auxiliary scripts, significantly increasing the likelihood of successful execution without detection.
  • The systematic evaluation includes extensive experiments demonstrating high attack success rates (upwards of 95%) across various real-world coding tasks and multiple language model backends, highlighting the method's robustness and generalizability.
  • Findings suggest a critical gap in current coding agent security measures, emphasizing the need for more comprehensive defenses against this newly characterized class of prompt injection attacks.

πŸ’‘ Why This Paper Matters

The paper is crucial in the context of cybersecurity for AI systems as it not only identifies a new attack vector that exploits the modular nature of coding agent skills but also provides a methodological framework to systematically analyze these vulnerabilities. The high success rates observed across various scenarios indicate a pressing need for enhanced security measures to protect coding agents from stealthy manipulations. Additionally, the contributions of SkillJect offer foundational insights for developing more resilient AI systems.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers because it exposes a significant and underexplored attack surface related to coding agentsβ€”an increasingly common feature in AI applications. The innovative techniques for automating prompt injection and the evidential effectiveness of these attacks reveal vulnerabilities that could have wide implications for the deployment of AI systems in real-world applications. Understanding these risks is vital for developing effective countermeasures and maintaining trust in AI technology.

πŸ“š Read the Full Paper