
Exposing Citation Vulnerabilities in Generative Engines

Authors: Riku Mochizuki, Shusuke Komatsu, Souta Noguchi, Kazuto Ataka

Published: 2025-10-08

arXiv ID: 2510.06823v1

Added to Library: 2025-11-14 23:14 UTC

📄 Abstract

We analyze answers generated by generative engines (GEs) from the perspectives of citation publishers and the content-injection barrier, defined as the difficulty for attackers to manipulate answers to user prompts by placing malicious content on the web. GEs integrate two functions: web search and answer generation that cites web pages using large language models. Because anyone can publish information on the web, GEs are vulnerable to poisoning attacks. Existing studies of citation evaluation focus on how faithfully answer content reflects cited sources, leaving unexamined which web sources should be selected as citations to defend against poisoning attacks. To fill this gap, we introduce evaluation criteria that assess poisoning threats using the citation information contained in answers. Our criteria classify the publisher attributes of citations to estimate the content-injection barrier, thereby revealing the threat of poisoning attacks in current GEs. We conduct experiments in political domains in Japan and the United States (U.S.) using our criteria and show that citations from official party websites (primary sources) are approximately 25%–45% in the U.S. and 60%–65% in Japan, indicating that U.S. political answers are at higher risk of poisoning attacks. We also find that sources with low content-injection barriers are frequently cited yet are poorly reflected in answer content. To mitigate this threat, we discuss how publishers of primary sources can increase exposure of their web content in answers and show that well-known techniques are limited by language differences.
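
The paper's core criterion can be illustrated with a minimal sketch: classify each cited URL by its publisher attribute and report the share of primary-source citations in a generated answer. The domain allowlist, function names, and example URLs below are illustrative assumptions standing in for the authors' richer publisher taxonomy, not their implementation.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of official party domains (primary sources).
# The paper's publisher taxonomy is richer; this set is a stand-in.
PRIMARY_SOURCE_DOMAINS = {
    "gop.com",          # Republican Party (U.S.)
    "democrats.org",    # Democratic Party (U.S.)
    "jimin.jp",         # Liberal Democratic Party (Japan)
    "cdp-japan.jp",     # Constitutional Democratic Party (Japan)
}

def is_primary_source(url: str) -> bool:
    """Return True if the cited URL's host belongs to an official party domain."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in PRIMARY_SOURCE_DOMAINS)

def primary_source_share(cited_urls: list[str]) -> float:
    """Fraction of one answer's citations that point to primary sources."""
    if not cited_urls:
        return 0.0
    return sum(is_primary_source(u) for u in cited_urls) / len(cited_urls)

if __name__ == "__main__":
    # An answer citing one party site and two third-party pages scores ~0.33:
    # two-thirds of its citations sit behind a low content-injection barrier.
    answer_citations = [
        "https://www.jimin.jp/policy/",
        "https://example-blog.com/analysis",
        "https://news.example.org/article",
    ]
    print(f"primary-source share: {primary_source_share(answer_citations):.2f}")
```

Aggregating this share over many answers per country is what yields figures like the 25%–45% (U.S.) and 60%–65% (Japan) reported in the abstract; a lower share means more of an answer's citations are drawn from sources an attacker could more easily publish to.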

🔍 Key Points

  • Introduces evaluation criteria that assess poisoning threats against generative engines (GEs) using the citation information contained in generated answers, rather than only measuring how faithfully answers reflect cited sources.
  • Classifies the publisher attributes of citations to estimate the content-injection barrier, i.e. how difficult it is for an attacker to place manipulative content on the cited sites.
  • Reports experiments in the Japanese and U.S. political domains: citations from official party websites (primary sources) account for roughly 25%–45% in the U.S. and 60%–65% in Japan, indicating that U.S. political answers face a higher poisoning risk.
  • Finds that sources with low content-injection barriers are frequently cited yet are poorly reflected in answer content.
  • Discusses how publishers of primary sources can increase the exposure of their web content in GE answers, and shows that well-known exposure techniques are limited by language differences.

💡 Why This Paper Matters

This paper matters because it shifts citation evaluation from faithfulness alone to the question of which web sources a generative engine should cite to resist poisoning attacks. By classifying the publisher attributes of citations and estimating the content-injection barrier, it makes the poisoning risk of current GE answers measurable, and its U.S.–Japan comparison shows that this risk varies substantially across political information ecosystems.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, the paper offers a citation-level lens on poisoning in search-augmented generation: because anyone can publish on the web, the barrier to injecting content into a GE's retrieval pool is a key security parameter. The proposed criteria quantify that barrier from the citations in generated answers, the political-domain experiments show how often low-barrier sources are cited yet poorly reflected in answer content, and the discussion of exposure techniques for primary-source publishers, together with their language-dependent limits, points to practical defenses and open problems.
