← Back to Library

ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation

Authors: Kai Ye, Liangcai Su, Chenxiong Qian

Published: 2025-09-09

arXiv ID: 2509.07941v1

Added to Library: 2025-09-10 04:00 UTC

Red Teaming

📄 Abstract

Code generation has emerged as a pivotal capability of Large Language Models(LLMs), revolutionizing development efficiency for programmers of all skill levels. However, the complexity of data structures and algorithmic logic often results in functional deficiencies and security vulnerabilities in generated code, reducing it to a prototype requiring extensive manual debugging. While Retrieval-Augmented Generation (RAG) can enhance correctness and security by leveraging external code manuals, it simultaneously introduces new attack surfaces. In this paper, we pioneer the exploration of attack surfaces in Retrieval-Augmented Code Generation (RACG), focusing on malicious dependency hijacking. We demonstrate how poisoned documentation containing hidden malicious dependencies (e.g., matplotlib_safe) can subvert RACG, exploiting dual trust chains: LLM reliance on RAG and developers' blind trust in LLM suggestions. To construct poisoned documents, we propose ImportSnare, a novel attack framework employing two synergistic strategies: 1)Position-aware beam search optimizes hidden ranking sequences to elevate poisoned documents in retrieval results, and 2)Multilingual inductive suggestions generate jailbreaking sequences to manipulate LLMs into recommending malicious dependencies. Through extensive experiments across Python, Rust, and JavaScript, ImportSnare achieves significant attack success rates (over 50% for popular libraries such as matplotlib and seaborn) in general, and is also able to succeed even when the poisoning ratio is as low as 0.01%, targeting both custom and real-world malicious packages. Our findings reveal critical supply chain risks in LLM-powered development, highlighting inadequate security alignment for code generation tasks. To support future research, we will release the multilingual benchmark suite and datasets. The project homepage is https://importsnare.github.io.

🔍 Key Points

  • Introduces ImportSnare, a novel attack framework that utilizes poisoned documentation via position-aware beam search and multilingual inductive suggestions to hijack dependencies in Retrieval-Augmented Code Generation (RACG).
  • Demonstrates the ability of ImportSnare to achieve attack success rates exceeding 50% against popular libraries even with a poisoning ratio as low as 0.01%, indicating significant vulnerabilities in LLM-based code generation systems.
  • Identifies dual trust chain vulnerabilities, revealing how LLMs' reliance on external RAG documents, and users' trust in LLM recommendations create substantial security risks in the software supply chain.
  • Quantifies the impact of attack techniques across multiple programming languages (Python, Rust, and JavaScript), underscoring the widespread implications for developers using LLMs for code generation.
  • Provides comprehensive experiments and datasets that support the reproducibility of the research and encourage future investigations into mitigation strategies.

💡 Why This Paper Matters

This paper is crucial as it uncovers significant security vulnerabilities in the rapidly evolving area of code generation using LLMs, emphasizing the risks posed by malicious code dependencies. Its contributions highlight the urgent need for improved security protocols in AI-driven software development tools to prevent exploitation of these vulnerabilities.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper highly relevant as it exposes critical attack vectors in LLM-integrated applications, particularly through the lens of dependency hijacking. The insights into the dual trust chain vulnerabilities inform the development of more robust security measures and frameworks, fostering safer and more trustworthy AI-driven systems in software engineering.

📚 Read the Full Paper