← Back to Library

ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected

Authors: Kanchon Gharami, Sanjiv Kumar Sarkar, Yongxin Liu, Shafika Showkat Moni

Published: 2025-12-23

arXiv ID: 2512.20405v2

Added to Library: 2026-01-07 10:08 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) like ChatGPT are now widely used in writing and reviewing scientific papers. While this trend accelerates publication growth and reduces human workload, it also introduces serious risks. Papers written or reviewed by LLMs may lack real novelty, contain fabricated or biased results, or mislead downstream research that others depend on. Such issues can damage reputations, waste resources, and even endanger lives when flawed studies influence medical or safety-critical systems. This research explores both the offensive and defensive sides of this growing threat. On the attack side, we demonstrate how an author can inject hidden prompts inside a PDF that secretly guide or "jailbreak" LLM reviewers into giving overly positive feedback and biased acceptance. On the defense side, we propose an "inject-and-detect" strategy for editors, where invisible trigger prompts are embedded into papers; if a review repeats or reacts to these triggers, it reveals that the review was generated by an LLM, not a human. This method turns prompt injections from vulnerability into a verification tool. We outline our design, expected model behaviors, and ethical safeguards for deployment. The goal is to expose how fragile today's peer-review process becomes under LLM influence and how editorial awareness can help restore trust in scientific evaluation.

🔍 Key Points

  • The paper presents a novel hybrid attack method utilizing hidden instructional text in PDF documents to manipulate Large Language Model (LLM) reviews in academic peer review processes, highlighting vulnerabilities introduced by LLM usage in scientific publication.
  • It proposes a comprehensive two-layer defense strategy that includes structural checks for hidden content and behavioral analysis to detect anomalies in LLM behavior, enhancing the robustness of peer review systems.
  • The introduction of editor-injected traps serves as a practical tool for detecting LLM-generated reviews, providing a method for editors to differentiate between human and AI-generated feedback effectively.
  • Experimental results show that the proposed attacks can significantly shift LLM-generated review scores and decisions, while the defense mechanism is capable of achieving perfect classification in distinguishing between clean and compromised manuscripts.
  • The findings underscore the fragility of the current peer review process when leveraged by AI, stressing the necessity for improved security measures to maintain trust in scientific evaluations.

💡 Why This Paper Matters

This paper is highly relevant as it addresses the growing concerns over the reliability and integrity of academic peer review systems in the context of increasing LLM usage. By uncovering the potential vulnerabilities to manipulation and providing actionable defenses, it significantly contributes to the discourse on ensuring academic rigor and accountability in the age of AI.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper valuable as it delves into the intersection of AI and cybersecurity, specifically in peer review processes. It highlights critical vulnerabilities posed by LLMs and introduces innovative methods for detection and mitigation, providing essential insights for developing secure AI systems in sensitive applications like academic publishing.

📚 Read the Full Paper