XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation

Authors: Udbhav Bamba, Minghao Fang, Yifan Yu, Haizhong Zheng, Fan Lai

Published: 2025-10-08

arXiv ID: 2510.06672v2

Added to Library: 2025-11-14 23:14 UTC

📄 Abstract

Reinforcement learning algorithms such as GRPO have driven recent advances in large language model (LLM) reasoning. While scaling the number of rollouts stabilizes training, existing approaches suffer from limited exploration on challenging prompts and leave informative feedback signals underexploited, due to context-independent rollout allocation across prompts (e.g., generating 16 rollouts per prompt) and heavy reliance on sparse rewards. This paper presents XRPO (eXplore-eXploit GRPO), a unified framework that recasts policy optimization through the principled lens of rollout exploration-exploitation. To enhance exploration, XRPO introduces a mathematically grounded rollout allocator that adaptively prioritizes prompts with higher potential for uncertainty reduction. It further addresses stagnation on zero-reward prompts through an in-context seeding strategy that injects curated exemplars, steering the model into more difficult reasoning trajectories. To strengthen exploitation, XRPO develops a group-relative, novelty-aware advantage sharpening mechanism that leverages sequence likelihoods to amplify low-probability yet correct responses, thereby extending the policy's reach beyond sparse rewards. Experiments across diverse math and coding benchmarks, on both reasoning and non-reasoning models, show that XRPO outperforms existing methods (e.g., GRPO and GSPO) by up to 4% pass@1 and 6% cons@32, while accelerating training convergence by up to 2.7x.
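
To make the exploitation side concrete, here is a minimal sketch of group-relative advantage estimation with a novelty-aware boost for correct but low-likelihood rollouts. The z-score advantage is standard GRPO; the boost form, the `alpha` coefficient, and the min-max novelty normalization are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO group-relative advantages: z-score rewards within one prompt's rollout group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def novelty_sharpened_advantages(rewards, seq_logprobs, alpha=0.5, eps=1e-6):
    """Hypothetical novelty-aware sharpening: scale up the advantage of correct (reward > 0)
    rollouts whose sequence log-likelihood is low relative to the group, so rare-but-correct
    reasoning paths receive a stronger learning signal. The boost form and `alpha` are
    illustrative assumptions."""
    adv = grpo_advantages(rewards, eps)
    logp = np.asarray(seq_logprobs, dtype=float)
    # Novelty in [0, 1]: 1 for the least likely rollout in the group, 0 for the most likely.
    novelty = (logp.max() - logp) / (logp.max() - logp.min() + eps)
    correct = np.asarray(rewards, dtype=float) > 0
    boost = np.where(correct, 1.0 + alpha * novelty, 1.0)
    return adv * boost

# Example: 4 rollouts, one rare-but-correct answer (low log-likelihood, reward 1) gets amplified.
print(novelty_sharpened_advantages(rewards=[0.0, 1.0, 0.0, 1.0],
                                   seq_logprobs=[-20.0, -80.0, -25.0, -22.0]))
```

In the example, the correct rollout with the lowest sequence likelihood ends up with the largest advantage, which is the stated goal of extending the learning signal beyond what sparse rewards alone provide.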

🔍 Key Points

  • Recasts GRPO-style policy optimization through an explicit rollout exploration-exploitation lens, instead of allocating a fixed, context-independent number of rollouts (e.g., 16) to every prompt.
  • Introduces a mathematically grounded rollout allocator that adaptively assigns more rollouts to prompts with higher potential for uncertainty reduction (see the sketch after this list).
  • Adds an in-context seeding strategy that injects curated exemplars into zero-reward prompts, steering the model out of stagnation and into more difficult reasoning trajectories.
  • Develops a group-relative, novelty-aware advantage sharpening mechanism that uses sequence likelihoods to amplify low-probability yet correct responses, extending the policy's reach beyond sparse rewards.
  • Reports gains of up to 4% pass@1 and 6% cons@32 over GRPO and GSPO across math and coding benchmarks, on both reasoning and non-reasoning models, with up to 2.7x faster training convergence.
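
The adaptive rollout allocator can be pictured as a budget split driven by per-prompt uncertainty. The sketch below models each prompt's pass rate with a Beta posterior and allocates rollouts in proportion to posterior variance; the Beta model, `min_per_prompt`, and the proportional split are assumptions for illustration, not the paper's actual criterion.

```python
import numpy as np

def allocate_rollouts(successes, attempts, total_budget, min_per_prompt=2):
    """Illustrative rollout allocator: give more rollouts to prompts whose pass rate is still
    uncertain. Uncertainty is modeled as the variance of a Beta(1 + s, 1 + f) posterior over
    each prompt's success probability; the paper's actual allocation rule may differ."""
    s = np.asarray(successes, dtype=float)
    n = np.asarray(attempts, dtype=float)
    a, b = 1.0 + s, 1.0 + (n - s)                      # Beta posterior parameters per prompt
    var = (a * b) / ((a + b) ** 2 * (a + b + 1.0))     # posterior variance = remaining uncertainty
    weights = var / var.sum()
    spare = total_budget - min_per_prompt * len(s)     # budget left after the per-prompt minimum
    return (min_per_prompt + np.floor(weights * spare)).astype(int)

# Example: 3 prompts with 8 prior attempts each; the ambiguous ~50% prompt gets the largest share.
print(allocate_rollouts(successes=[8, 4, 0], attempts=[8, 8, 8], total_budget=48))
```

In this toy example, the prompt with an ambiguous empirical pass rate receives the largest slice of the rollout budget, matching the intuition that it has the most uncertainty left to reduce; prompts that never succeed would be handled separately by the in-context seeding strategy.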

💡 Why This Paper Matters

XRPO targets two persistent weaknesses of rollout-based RL for LLM reasoning: uniform, context-independent rollout budgets that waste compute on easy prompts while under-exploring hard ones, and sparse rewards that leave informative feedback signals unused. By reframing policy optimization as a rollout exploration-exploitation problem and backing it with consistent gains over GRPO and GSPO, plus up to 2.7x faster convergence, the paper offers a practical recipe for extracting more reasoning capability from a fixed rollout budget.

🎯 Why It's Interesting for AI Security Researchers

For researchers who work with RL-trained reasoning models, including those probing their robustness and reliability, XRPO clarifies how rollout allocation, reward sparsity, and likelihood-based advantage shaping determine which behaviors a policy actually learns. The reported efficiency gains (up to 2.7x faster convergence) also lower the cost of reproducing and stress-testing GRPO-style training pipelines, making it easier to study how post-training choices influence downstream model behavior.

📚 Read the Full Paper