The Vulnerability of LLM Rankers to Prompt Injection Attacks

📄 Abstract

Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has however showed that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, measuring intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, quantifying the operational impact on the ranking's quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities, revealing critical insights such as that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.

🔍 Key Points

Conducted an empirical study of jailbreak attacks against LLM rankers across various model families and architectures, highlighting intrinsic preference vulnerabilities and operational ranking impacts.
Demonstrated that larger LLM architectures generally exhibit greater susceptibility to prompt injection attacks, confirming and extending prior findings with a detailed scaling analysis.
Explored the significance of prompt placement, identifying that backend injections typically yield more severe rank disruptions compared to frontend placements, which informs strategies to improve LLM robustness.
Showcased architectural differences, particularly the resilience of encoder-decoder models (like Flan-T5) against attacks, suggesting avenues for developing more secure LLM based ranking systems.
Investigated cross-domain robustness, indicating that vulnerabilities persist across varying domains and that susceptibility is a fundamental property of LLMs, influenced by dataset characteristics.

💡 Why This Paper Matters

The paper provides critical insights into the vulnerabilities of LLM rankers, emphasizing the security risks posed by prompt injection attacks. By systematically evaluating the factors that influence these vulnerabilities, the authors offer valuable guidance on constructing more resilient LLM systems. The significance of understanding how varying architectures and model sizes affect susceptibility can lead to more secure AI applications, enhancing trust and reliability in systems that employ LLMs for ranking or retrieval tasks.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers as it addresses the emerging security threats associated with LLMs in information retrieval and ranking systems. The investigation into prompt injection vulnerabilities highlights the potential for adversarial manipulation, which is a significant concern for systems relying on LLMs. The findings provide a foundation for developing protective measures and strategies, making it essential reading for researchers focused on the robustness and security of AI technologies.

The Vulnerability of LLM Rankers to Prompt Injection Attacks

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper