← Back to Library

Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications

Authors: Janis Keuper

Published: 2025-09-12

arXiv ID: 2509.10248v3

Added to Library: 2025-11-11 14:29 UTC

Red Teaming

📄 Abstract

The ongoing intense discussion on rising LLM usage in the scientific peer-review process has recently been mingled by reports of authors using hidden prompt injections to manipulate review scores. Since the existence of such "attacks" - although seen by some commentators as "self-defense" - would have a great impact on the further debate, this paper investigates the practicability and technical success of the described manipulations. Our systematic evaluation uses 1k reviews of 2024 ICLR papers generated by a wide range of LLMs shows two distinct results: I) very simple prompt injections are indeed highly effective, reaching up to 100% acceptance scores. II) LLM reviews are generally biased toward acceptance (>95% in many models). Both results have great impact on the ongoing discussions on LLM usage in peer-review.

🔍 Key Points

  • Describes the vulnerability of LLM reviews to manipulative prompt injections, showing high effectiveness with acceptance rates up to 100%.
  • Finds that reviews generated by LLMs exhibit a general bias towards acceptance, with many models indicating acceptance rates above 95%.
  • Provides a systematic evaluation using 1,000 reviews from ICLR 2024, including various LLMs and different prompt injection strategies.
  • Concludes that even without manipulative injections, LLMs tend to produce overly positive reviews, raising concerns about their reliability in peer-review contexts.
  • Offers insights into potential defense mechanisms, such as parsing PDFs as images to prevent the reading of hidden prompts.

💡 Why This Paper Matters

This paper is crucial as it highlights significant vulnerabilities and biases in the use of large language models for scientific peer review processes. It brings to light the ease with which authors can manipulate review outcomes using prompt injections, which raises ethical concerns and calls for immediate discussions on regulating LLM usage in academic settings.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers as it identifies weaknesses in LLMs that can be exploited for manipulative purposes, specifically in high-stakes settings like scientific peer review. Understanding and mitigating these vulnerabilities is essential for the development of secure and trustworthy AI systems that are used in academic and research environments.

📚 Read the Full Paper