RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

Authors: Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta

Published: 2025-11-09

arXiv ID: 2511.06212v1

Added to Library: 2025-11-14 23:03 UTC

Red Teaming

📄 Abstract

The rapid expansion of the Internet of Things (IoT) is reshaping communication and operational practices across industries, but it also broadens the attack surface and increases susceptibility to security breaches. Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestion in Network Intrusion Detection Systems (NIDS). Despite advancements, the use of LLMs in such systems further expands the attack surface, putting entire networks at risk by introducing vulnerabilities such as prompt injection and data poisoning. In this work, we attack an LLM-based IoT attack analysis and mitigation framework to test its adversarial robustness. We construct an attack description dataset and use it in a targeted data poisoning attack that applies word-level, meaning-preserving perturbations to corrupt the Retrieval-Augmented Generation (RAG) knowledge base of the framework. We then compare pre-attack and post-attack mitigation responses from the target model, ChatGPT-5 Thinking, to measure the impact of the attack on model performance, using an established evaluation rubric designed for human experts and judge LLMs. Our results show that small perturbations degrade LLM performance by weakening the linkage between observed network traffic features and attack behavior, and by reducing the specificity and practicality of recommended mitigations for resource-constrained devices.
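
As a rough, non-authoritative sketch of the poisoning step the abstract describes (the document text, synonym table, and perturbation budget below are illustrative assumptions, not artifacts from the paper), the following Python snippet shows how word-level, meaning-preserving substitutions could be applied to an attack-description document before it is indexed into the RAG knowledge base, with pre- and post-attack responses then compared under the same rubric:

```python
# Minimal sketch (not the authors' code) of RAG knowledge-base poisoning via
# word-level, meaning-preserving perturbations. Document text, synonym table,
# and budget are illustrative assumptions only.

import random

# Hypothetical excerpt from an attack-description document in the knowledge base.
KB_DOCUMENT = (
    "A SYN flood attack sends a rapid burst of TCP SYN packets to exhaust "
    "the connection table of a resource-constrained IoT device."
)

# Toy synonym map standing in for an embedding- or surrogate-guided
# candidate-selection step.
SYNONYMS = {
    "rapid": ["swift", "quick"],
    "burst": ["surge", "stream"],
    "exhaust": ["deplete", "saturate"],
    "attack": ["assault", "campaign"],
}


def perturb(text: str, budget: int = 3, seed: int = 0) -> str:
    """Replace up to `budget` words with near-synonyms, keeping the surface
    meaning while nudging the document away from its original phrasing."""
    rng = random.Random(seed)
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    for i in rng.sample(candidates, min(budget, len(candidates))):
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


if __name__ == "__main__":
    poisoned = perturb(KB_DOCUMENT)
    print("clean:   ", KB_DOCUMENT)
    print("poisoned:", poisoned)
    # In the framework under attack, `poisoned` would replace the clean
    # document in the RAG index; pre- and post-attack mitigation responses
    # from the target LLM are then scored against the same evaluation rubric.
```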

🔍 Key Points

  • Developed an IoT attack description dataset comprising 18 different attack descriptions, generated using prompt engineering techniques.
  • Conducted a targeted data poisoning attack on an LLM-based threat detection framework using word-level perturbations to assess its adversarial robustness.
  • Utilized a transfer-learning approach, fine-tuning a BERT model as a surrogate target, to generate adversarial examples for corrupting the Retrieval-Augmented Generation (RAG) knowledge base (see the sketch after this list).
  • Demonstrated that the adversarial attack degrades the performance of the target LLM (ChatGPT-5 Thinking) on both attack behavior analysis and mitigation suggestion.
  • Provided a comprehensive evaluation through human expert assessment and judge LLMs to quantify the impact of adversarial attacks on model outputs.
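
The surrogate-guided step in the third key point can be pictured with the minimal sketch below. It is not the authors' implementation: the checkpoint path, label index, and synonym table are placeholders, and the greedy search stands in for whichever word-substitution strategy the paper actually uses. The idea is that a BERT classifier fine-tuned on the attack-description dataset scores candidate substitutions, and the swaps that most reduce the surrogate's confidence in the true attack class are the ones transferred into the RAG knowledge base.

```python
# Hedged sketch (not the authors' code) of surrogate-guided, word-level
# perturbation: a fine-tuned BERT classifier acts as the surrogate target,
# and we greedily pick the synonym swap that most reduces its confidence.
# The checkpoint path, label id, and synonym table are assumptions.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

SURROGATE_PATH = "./bert-attack-description-surrogate"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(SURROGATE_PATH)
model = AutoModelForSequenceClassification.from_pretrained(SURROGATE_PATH).eval()

SYNONYMS = {"flood": ["deluge", "barrage"], "exhaust": ["deplete", "saturate"]}  # toy table


def true_class_prob(text: str, label: int) -> float:
    """Surrogate confidence that `text` still describes attack class `label`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, label].item()


def greedy_perturb(text: str, label: int, budget: int = 2) -> str:
    """Swap up to `budget` words, each time choosing the synonym substitution
    that lowers the surrogate's confidence the most (transfer-style attack)."""
    words = text.split()
    for _ in range(budget):
        best = (true_class_prob(" ".join(words), label), None, None)
        for i, w in enumerate(words):
            for s in SYNONYMS.get(w.lower(), []):
                trial = words[:i] + [s] + words[i + 1:]
                p = true_class_prob(" ".join(trial), label)
                if p < best[0]:
                    best = (p, i, s)
        if best[1] is None:  # no remaining substitution lowers confidence
            break
        words[best[1]] = best[2]
    return " ".join(words)
```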

💡 Why This Paper Matters

This paper is significant as it highlights the vulnerabilities of LLM-based systems in the domain of IoT security. By showcasing how targeted adversarial attacks can degrade the performance of LLMs in important cybersecurity tasks, the research stresses the necessity for robust architectures in security applications involving AI, especially in the context of rapidly expanding IoT environments. This work lays a foundation for developing more resilient frameworks to safeguard against similar security threats.

🎯 Why It's Interesting for AI Security Researchers

Research on the security of AI systems, particularly against adversarial attacks, is crucial for ensuring the reliability of AI applications in sensitive domains such as cybersecurity. The paper's detailed analysis of adversarial attacks on LLM-based threat detection offers concrete insight into where these systems are vulnerable, making it highly relevant for AI security researchers and practitioners working to harden AI-driven cybersecurity solutions.

📚 Read the Full Paper