← Back to Library

TopicAttack: An Indirect Prompt Injection Attack via Topic Transition

Authors: Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi

Published: 2025-07-18

arXiv ID: 2507.13686v2

Added to Library: 2025-11-11 14:08 UTC

Red Teaming

📄 Abstract

Large language models (LLMs) have shown remarkable performance across a range of NLP tasks. However, their strong instruction-following capabilities and inability to distinguish instructions from data content make them vulnerable to indirect prompt injection attacks. In such attacks, instructions with malicious purposes are injected into external data sources, such as web documents. When LLMs retrieve this injected data through tools, such as a search engine and execute the injected instructions, they provide misled responses. Recent attack methods have demonstrated potential, but their abrupt instruction injection often undermines their effectiveness. Motivated by the limitations of existing attack methods, we propose TopicAttack, which prompts the LLM to generate a fabricated conversational transition prompt that gradually shifts the topic toward the injected instruction, making the injection smoother and enhancing the plausibility and success of the attack. Through comprehensive experiments, TopicAttack achieves state-of-the-art performance, with an attack success rate (ASR) over 90\% in most cases, even when various defense methods are applied. We further analyze its effectiveness by examining attention scores. We find that a higher injected-to-original attention ratio leads to a greater success probability, and our method achieves a much higher ratio than the baseline methods.

🔍 Key Points

  • Introduction of TopicAttack, a novel indirect prompt injection attack that mitigates abrupt instruction injection by generating smooth conversational transitions.
  • Demonstration of high attack success rates (ASR) over 90% even against various defense methodologies, showcasing the robustness of the proposed method.
  • Utilization of attention scores to analyze the effectiveness of the topic transition, revealing that a higher injected-to-original attention ratio correlates with the increased success of the attack.
  • Comprehensive evaluation of TopicAttack across different models (both open-source and closed-source) and scenarios (chatbots and agents), confirming its versatility and efficiency in real-world applications.
  • Identification of the contributions of specific components, such as the reminding prompt, in maintaining focus on injected instructions against defense mechanisms.

💡 Why This Paper Matters

The study presents TopicAttack, an innovative solution to a significant vulnerability in large language models. By addressing the inherent flaws in existing indirect prompt injection attacks and achieving exceptional success rates even in the presence of defenses, the paper highlights critical weaknesses in LLM security measures. Its findings not only contribute to the academic understanding of LLM vulnerabilities but also serve as a wake-up call for the development of more resilient AI systems, enhancing their security against increasingly sophisticated manipulation tactics.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant for AI security researchers as it elucidates current vulnerabilities in LLMs linked to indirect prompt injection attacks. The proposed TopicAttack not only expands the understanding of such attacks but also sets new benchmarks for their effectiveness, emphasizing the need for improved defense strategies. The insights gained from the study could guide future research aimed at fortifying AI systems against similar threats, making it a key addition to the field of AI security.

📚 Read the Full Paper