NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge

Authors: Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao

Published: 2025-10-24

arXiv ID: 2510.21144v1

Added to Library: 2025-11-14 23:07 UTC

Red Teaming

📄 Abstract

Retrieval-Augmented Generation (RAG) empowers Large Language Models (LLMs) to dynamically integrate external knowledge during inference, improving their factual accuracy and adaptability. However, adversaries can inject poisoned external knowledge to override the model's internal memory. While existing attacks iteratively manipulate the retrieval content or prompt structure of RAG, they largely ignore the model's internal representation dynamics and neuron-level sensitivities. The underlying mechanism of RAG poisoning has not been fully studied, and the effect of knowledge conflict with strong parametric knowledge is not considered. In this work, we propose NeuroGenPoisoning, a novel attack framework that generates adversarial external knowledge for RAG, guided by LLM internal neuron attribution and genetic optimization. Our method first identifies a set of Poison-Responsive Neurons whose activation strongly correlates with contextual poisoning knowledge. We then employ a genetic algorithm to evolve adversarial passages that maximally activate these neurons. Crucially, our framework enables massive-scale generation of effective poisoned RAG knowledge by identifying and reusing promising but initially unsuccessful external knowledge variants via observed attribution signals. At the same time, Poison-Responsive-Neuron-guided poisoning effectively resolves knowledge conflict. Experimental results across models and datasets show that the attack consistently achieves a high Population Overwrite Success Rate (POSR) of over 90% while preserving fluency, and empirical evidence confirms that it effectively resolves knowledge conflict.
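
To make the attribution step concrete, the snippet below is a minimal, hypothetical reconstruction of how poison-responsive neurons might be scored with Integrated Gradients: it hooks one MLP block, interpolates its activation from a zero baseline, and ranks neurons by activation × averaged gradient of the target answer's log-probability. The model (`gpt2`), the hooked layer, the step count, and the attribution target are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch: scoring poison-responsive neurons with Integrated Gradients.
# Model name, hooked layer, step count, and attribution target are illustrative
# assumptions, not the configuration used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper targets larger instruction-tuned LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer = model.transformer.h[6].mlp  # one MLP block chosen for illustration
steps = 16                          # Riemann steps approximating the IG path integral


def neuron_attribution(prompt: str, target: str) -> torch.Tensor:
    """Approximate IG of the target token's log-probability w.r.t. MLP activations."""
    ids = tok(prompt, return_tensors="pt").input_ids
    target_id = tok(target, add_special_tokens=False).input_ids[0]
    captured, grads = {}, []

    def scale_hook(module, inputs, output):
        # First pass: record the clean activation. Later passes: replace the
        # activation with the interpolation alpha * activation (zero baseline).
        if "act" not in captured:
            captured["act"] = output.detach()
            return output
        scaled = (captured["alpha"] * captured["act"]).requires_grad_(True)
        captured["scaled"] = scaled
        return scaled

    handle = layer.register_forward_hook(scale_hook)
    model(ids)  # clean pass, only to record the activation
    for k in range(1, steps + 1):
        captured["alpha"] = k / steps
        logits = model(ids).logits
        logp = torch.log_softmax(logits[0, -1], dim=-1)[target_id]
        grads.append(torch.autograd.grad(logp, captured["scaled"])[0])
    handle.remove()

    avg_grad = torch.stack(grads).mean(dim=0)
    # Per-neuron IG attribution, summed over all sequence positions.
    return (captured["act"] * avg_grad).sum(dim=(0, 1))


scores = neuron_attribution(
    "Context: The Eiffel Tower is located in Rome.\nQuestion: Where is the Eiffel Tower?\nAnswer:",
    " Rome",
)
neuron_ids = torch.topk(scores, k=32).indices  # candidate poison-responsive neurons
```

In the paper's pipeline, the resulting neuron set then serves as the optimization target for the genetic search over candidate poisoned passages.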

🔍 Key Points

  • Introduction of NeuroGenPoisoning, a novel framework that utilizes neuron-guided attacks to inject poisoned external knowledge into Retrieval-Augmented Generation (RAG) systems.
  • Identification of Poison-Responsive Neurons using Integrated Gradients, allowing targeted optimization of adversarial contexts to maximize their influence on model outputs.
  • Implementation of a genetic algorithm to evolve adversarial passages, significantly increasing the effectiveness of the poisoning attacks, with a Population Overwrite Success Rate (POSR) exceeding 90% across multiple datasets and models (a sketch of such a loop follows this list).
  • Demonstration of the method's ability to resolve knowledge conflicts effectively, thereby overriding strong internal memory in LLMs that would typically resist manipulation.
  • Validation of the framework across various models and open-domain datasets, highlighting robustness and generalizability in different knowledge domains.
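
The sketch below illustrates what the genetic-optimization step could look like. It reuses `torch`, `tok`, `model`, `layer`, and `neuron_ids` from the attribution sketch above, scores each candidate passage by the mean activation of the selected poison-responsive neurons, and applies simple word-level crossover and mutation. The population size, generation count, operators, and fitness definition are illustrative assumptions, not the paper's hyperparameters.

```python
# Hypothetical sketch of the genetic-optimization loop (not the paper's exact
# operators or hyperparameters). Reuses `tok`, `model`, `layer`, and `neuron_ids`
# from the attribution sketch above.
import random

import torch


def activation_fitness(passage: str, question: str, neuron_ids: torch.Tensor) -> float:
    """Mean activation of the selected neurons with the candidate passage in context."""
    prompt = f"Context: {passage}\nQuestion: {question}\nAnswer:"
    ids = tok(prompt, return_tensors="pt").input_ids
    acts = {}

    def capture(module, inputs, output):
        acts["a"] = output.detach()

    handle = layer.register_forward_hook(capture)
    with torch.no_grad():
        model(ids)
    handle.remove()
    return acts["a"][0, -1, neuron_ids].mean().item()


def mutate(passage: str) -> str:
    """Replace one word with a random vocabulary token (placeholder operator)."""
    words = passage.split()
    i = random.randrange(len(words))
    words[i] = tok.decode([random.randrange(tok.vocab_size)]).strip() or words[i]
    return " ".join(words)


def crossover(a: str, b: str) -> str:
    """Single-point crossover at a word boundary."""
    wa, wb = a.split(), b.split()
    cut = random.randrange(1, max(2, min(len(wa), len(wb))))
    return " ".join(wa[:cut] + wb[cut:])


def evolve(seed_passages, question, neuron_ids, generations=20, pop_size=16):
    """Evolve poisoned passages toward maximal poison-responsive-neuron activation."""
    population = list(seed_passages)
    while len(population) < pop_size:
        population.append(mutate(random.choice(seed_passages)))
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda p: activation_fitness(p, question, neuron_ids),
                        reverse=True)
        parents = ranked[: pop_size // 2]            # elitist selection
        children = [crossover(*random.sample(parents, 2))
                    for _ in range(pop_size - len(parents))]
        population = parents + [mutate(c) for c in children]
    return max(population, key=lambda p: activation_fitness(p, question, neuron_ids))
```

Under this reading, POSR would then be measured as the fraction of targeted queries for which the evolved passage overrides the model's parametric answer with the adversary's target, and the attribution signals of near-miss candidates could be fed back as new seeds, in line with the paper's reuse of promising but initially unsuccessful knowledge variants.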

💡 Why This Paper Matters

The paper presents a significant advancement in the understanding of vulnerabilities in Retrieval-Augmented Generation (RAG) systems, particularly how adversaries can leverage internal neuron dynamics to compromise model outputs. By addressing knowledge conflict explicitly, NeuroGenPoisoning not only increases the success rates of poisoning attacks but also raises important questions about the security and robustness of large language models in real-world applications. This work is a critical step toward developing defenses against such manipulations, underlining the necessity for ongoing research in AI safety.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers because it delves into novel attack vectors that exploit inherent vulnerabilities in LLMs, specifically through the lens of neuron-level dynamics. As LLMs become increasingly embedded in applications requiring factual accuracy and reliability, understanding how adversaries can undermine these models is crucial. The methodical approach taken in NeuroGenPoisoning not only provides insight into potential exploitation techniques but also emphasizes the need for robust defenses, making this research essential for ongoing developments in AI security protocols.

📚 Read the Full Paper