
Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Authors: Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen

Published: 2025-11-12

arXiv ID: 2511.09780v1

Added to Library: 2025-11-14 23:01 UTC

Red Teaming

📄 Abstract

Group Relative Policy Optimization (GRPO) has demonstrated great utility in the post-training of Large Language Models (LLMs). In GRPO, the model answers prompts and, through reinforcement learning, learns to prefer the highest-scoring completions in each group. Owing to its small communication volume, GRPO is inherently suited to decentralised training: prompts can be answered concurrently by multiple nodes and the completions exchanged as plain strings. In this work, we present the first adversarial attack on decentralised GRPO. We demonstrate that malicious parties can poison such systems by injecting arbitrary malicious tokens into benign models, via both out-of-context and in-context attacks. Using empirical examples from math and coding tasks, we show that these attacks easily poison benign nodes, polluting their local LLM post-training and achieving attack success rates of up to 100% in as few as 50 iterations. We propose two ways to defend against these attacks, depending on whether all users train the same model or different models. We show that these defenses can achieve stop rates of up to 100%, making the attacks impossible.
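
To ground the setting, here is a minimal Python sketch of one decentralised GRPO round with a malicious participant. Everything in it (the `Node` class, the `PAYLOAD` string, the stub reward) is an illustrative assumption for exposition, not the paper's actual protocol or attack construction.

```python
import random
from dataclasses import dataclass

PAYLOAD = " visit http://evil.example"  # hypothetical injected token sequence


@dataclass
class Node:
    name: str
    malicious: bool = False

    def generate(self, prompt: str, k: int = 4) -> list[str]:
        """Stand-in for local LLM sampling: k candidate completions."""
        completions = [f"answer_{self.name}_{i}" for i in range(k)]
        if self.malicious:
            # Out-of-context attack: append arbitrary tokens to every
            # completion, hoping benign nodes reinforce them during updates.
            completions = [c + PAYLOAD for c in completions]
        return completions


def reward(completion: str) -> float:
    """Stand-in for a verifiable task reward (e.g. unit tests, math checks)."""
    return random.random()


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: (r - mean) / std over the completion group."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std if std > 0 else 1.0) for r in rewards]


# One decentralised round: every node answers the prompt locally, completions
# are exchanged as plain strings, and each node then updates its policy on
# the pooled group -- including any poisoned members.
nodes = [Node("A"), Node("B"), Node("C", malicious=True)]
prompt = "Solve: 2 + 2"
pool = [c for node in nodes for c in node.generate(prompt)]
advantages = grpo_advantages([reward(c) for c in pool])
for completion, adv in zip(pool, advantages):
    print(f"adv={adv:+.2f} poisoned={PAYLOAD in completion} {completion!r}")
```

In a real deployment, the policy-gradient update on this pooled group is exactly where poisoned completions that happen to score well get reinforced into the benign nodes' models.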

🔍 Key Points

  • Introduction of the first adversarial attacks against decentralised Group Relative Policy Optimization (GRPO) for Large Language Models (LLMs).
  • Demonstration of two attack types, in-context and out-of-context, both achieving high success rates in poisoning benign models.
  • Development of two defense strategies, targeting homogeneous and heterogeneous model setups, that stop attacks with up to 100% effectiveness (a hedged filtering sketch follows this list).
  • Empirical evaluations using real-world math and coding tasks illustrate the practical impact of the proposed attacks and defenses.
  • Discussion of the limitations of existing defenses and suggestions for future research directions in enhancing model robustness.
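
On the defense side, one plausible instantiation for the homogeneous case (all users training the same model) is a likelihood filter: each node scores received completions under its own identical copy of the model and discards any containing tokens the model would essentially never emit. The sketch below illustrates that idea only; `filter_pool`, `toy_logprob`, and the threshold are hypothetical, not the paper's implementation.

```python
def token_logprobs(logprob_fn, prompt: str, completion: str) -> list[float]:
    """Per-token log-probabilities of `completion` given `prompt` under the
    local model (whitespace split stands in for real tokenisation)."""
    tokens = completion.split()
    return [logprob_fn(prompt, tokens[:i], tok) for i, tok in enumerate(tokens)]


def filter_pool(completions: list[str], prompt: str, logprob_fn,
                threshold: float = -10.0) -> list[str]:
    """Keep only completions whose least likely token stays above `threshold`.
    A single near-impossible token (e.g. an injected payload) flags the whole
    completion as foreign to the shared model."""
    kept = []
    for c in completions:
        lps = token_logprobs(logprob_fn, prompt, c)
        if lps and min(lps) >= threshold:
            kept.append(c)
    return kept


# Toy scoring function: tokens from the injected payload receive a log-prob
# the shared model would never assign to its own samples.
def toy_logprob(prompt, prefix, token):
    return -20.0 if "evil" in token else -2.0


pool = ["the answer is 4",
        "the answer is 4 visit http://evil.example"]
print(filter_pool(pool, "Solve: 2 + 2", toy_logprob))
# -> ['the answer is 4']
```

For the heterogeneous case, where nodes train different models, a shared likelihood check is not directly available, which is presumably why the paper treats the two setups with separate defenses.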

💡 Why This Paper Matters

This paper is relevant and important as it highlights significant vulnerabilities within decentralised GRPO systems, which are increasingly used for post-training LLMs. By presenting novel adversarial attacks and effective countermeasures, it sets a foundation for further exploration of robustness in AI systems, ultimately contributing to more secure and reliable deployments of language models in real-world applications.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of great interest to AI security researchers because it uncovers critical weaknesses in current decentralized RL approaches, specifically in the context of LLM training. The introduction of novel attack vectors and the corresponding defenses provides a fresh perspective on adversarial machine learning, raising awareness and prompting deeper investigation into AI security challenges associated with collaborative training environments.

📚 Read the Full Paper