← Back to Library

When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review

Authors: Changjia Zhu, Junjie Xiong, Renkai Ma, Zhicong Lu, Yao Liu, Lingyao Li

Published: 2025-09-12

arXiv ID: 2509.09912v1

Added to Library: 2025-11-11 14:33 UTC

Red Teaming

📄 Abstract

Peer review is the cornerstone of academic publishing, yet the process is increasingly strained by rising submission volumes, reviewer overload, and expertise mismatches. Large language models (LLMs) are now being used as "reviewer aids," raising concerns about their fairness, consistency, and robustness against indirect prompt injection attacks. This paper presents a systematic evaluation of LLMs as academic reviewers. Using a curated dataset of 1,441 papers from ICLR 2023 and NeurIPS 2022, we evaluate GPT-5-mini against human reviewers across ratings, strengths, and weaknesses. The evaluation employs structured prompting with reference paper calibration, topic modeling, and similarity analysis to compare review content. We further embed covert instructions into PDF submissions to assess LLMs' susceptibility to prompt injection. Our findings show that LLMs consistently inflate ratings for weaker papers while aligning more closely with human judgments on stronger contributions. Moreover, while overarching malicious prompts induce only minor shifts in topical focus, explicitly field-specific instructions successfully manipulate specific aspects of LLM-generated reviews. This study underscores both the promises and perils of integrating LLMs into peer review and points to the importance of designing safeguards that ensure integrity and trust in future review processes.

🔍 Key Points

  • Systematic evaluation of GPT-5-mini as an academic peer reviewer against human reviewers;
  • LLMs tend to inflate ratings for weaker papers, indicating a bias in LLM-generated reviews;
  • Embedded prompts significantly affect LLM reviews, showing risks of prompt injection attacks;
  • Moderate divergence exists between LLMs and humans in identifying strengths/weaknesses of research papers;
  • Policy implications are suggested for regulating LLM-assisted peer review to mitigate biases and safeguard integrity.

💡 Why This Paper Matters

This paper is essential for understanding the implications of integrating large language models in academic peer reviews, especially regarding their biases, performance discrepancies compared to human reviewers, and vulnerability to manipulation through prompt injection. As LLM usage becomes more widespread in academic settings, addressing these critical issues is vital to maintain the integrity of the peer review process.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant due to its exploration of prompt injection vulnerabilities in LLMs, which poses significant risks in automated systems. The detailed analysis of how bias can be introduced and the potential for adversarial attacks highlights important considerations for securing AI applications in sensitive domains like academic publishing.

📚 Read the Full Paper