Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models

Authors: Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan

Published: 2026-02-03

arXiv ID: 2602.03265v1

Added to Library: 2026-02-04 03:03 UTC

Red Teaming

📄 Abstract

Large Language Models (LLMs) have seen widespread adoption across multiple domains, creating an urgent need for robust safety alignment mechanisms. However, robustness remains challenging due to jailbreak attacks that bypass alignment via adversarial prompts. In this work, we focus on the prevalent Greedy Coordinate Gradient (GCG) attack and identify a previously underexplored attack axis in jailbreak attacks typically framed as suffix-based: the placement of adversarial tokens within the prompt. Using GCG as a case study, we show that both optimizing attacks to generate prefixes instead of suffixes and varying adversarial token position during evaluation substantially influence attack success rates. Our findings highlight a critical blind spot in current safety evaluations and underline the need to account for the position of adversarial tokens in the adversarial robustness evaluation of LLMs.
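
The abstract's central axis, where the adversarial tokens sit relative to the harmful request, can be pictured with a minimal Python sketch. Everything below is illustrative: the function and variable names are not taken from the paper, and the "infix" case is an extrapolation of the paper's framing of "placement within the prompt" beyond the prefix and suffix placements the abstract names explicitly.

```python
# Minimal sketch of the attack axis the paper studies: the same adversarial
# string can be placed after, before, or inside the harmful request. All
# function and variable names here are illustrative, not taken from the paper.

def build_attacked_prompt(request: str, adv_tokens: str, position: str = "suffix") -> str:
    """Assemble a prompt with adversarial tokens at the chosen position."""
    if position == "suffix":   # the standard GCG setup
        return f"{request} {adv_tokens}"
    if position == "prefix":   # the alternative placement the paper optimizes
        return f"{adv_tokens} {request}"
    if position == "infix":    # generic in-prompt insertion (an extrapolation)
        words = request.split()
        mid = len(words) // 2
        return " ".join(words[:mid] + [adv_tokens] + words[mid:])
    raise ValueError(f"unknown position: {position!r}")


if __name__ == "__main__":
    request = "Explain how to pick a lock"   # stand-in for a harmful request
    adv = "! ! ! ! ! ! ! ! ! !"              # GCG's usual initialization string
    for pos in ("suffix", "prefix", "infix"):
        print(f"{pos:>6}: {build_attacked_prompt(request, adv, pos)}")
```

GCG then optimizes the adversarial string token by token against the target model; the paper's contribution is to vary where that string is placed, both during optimization and during evaluation, rather than fixing it as a suffix.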

🔍 Key Points

  • Identified adversarial token position as a significant, underexplored axis in jailbreak attacks on large language models (LLMs).
  • Showed that optimizing adversarial tokens in different positions (prefix vs. suffix) significantly affects the attack success rate (ASR) in both white-box and black-box scenarios.
  • Provided empirical evidence that fixed-position evaluations can underestimate jailbreak effectiveness, leading to potential misassessments of model safety (see the sketch after this list).
  • Demonstrated that attention dynamics associated with adversarial token positions vary significantly, suggesting limitations in existing attention-based analysis frameworks.
  • Highlighted the urgent need for more comprehensive safety evaluations that include adversarial token placement considerations.
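
The third key point argues that scoring only the suffix placement can understate a model's vulnerability. Below is a hedged sketch of what a position-aware evaluation loop could look like: the `generate` callable, the refusal-phrase heuristic, and all names are assumptions made for illustration, not the paper's actual evaluation protocol.

```python
# Illustrative position-sweep evaluation: measure attack success rate (ASR)
# separately for each adversarial-token placement instead of suffix-only.
# `generate` stands in for any (black-box) model interface; the refusal-phrase
# check is a common but crude success heuristic, not the paper's exact judge.
from typing import Callable, Iterable

REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't", "As an AI")


def is_jailbroken(response: str) -> bool:
    """Crude success heuristic: count any non-refusal response as a jailbreak."""
    return not any(marker in response for marker in REFUSAL_MARKERS)


def positional_asr(
    generate: Callable[[str], str],
    requests: Iterable[str],
    adv_tokens: str,
) -> dict[str, float]:
    """ASR per adversarial-token placement (prefix vs. suffix) over a request set."""
    reqs = list(requests)
    placements = {
        "suffix": lambda r: f"{r} {adv_tokens}",
        "prefix": lambda r: f"{adv_tokens} {r}",
    }
    return {
        name: sum(is_jailbroken(generate(place(r))) for r in reqs) / max(len(reqs), 1)
        for name, place in placements.items()
    }
```

Reporting ASR per placement, rather than a single suffix-only number, is the kind of evaluation change the paper's findings motivate.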

💡 Why This Paper Matters

This paper sheds light on the critical role that adversarial token position plays in the effectiveness of jailbreak attacks on LLMs. By demonstrating that traditional suffix-based optimization is not universally optimal, the authors challenge existing methodologies and propose a necessary shift in how robustness evaluations are conducted. This work is essential for improving the safety and security of LLMs, providing insights that could inform future designs of safety mechanisms.

🎯 Why It's Interesting for AI Security Researchers

This paper is of significant interest to AI security researchers due to its novel contributions to understanding jailbreak vulnerabilities in LLMs. It highlights potential weaknesses in current safety evaluation techniques and offers actionable insights that can enhance adversarial robustness assessments. By focusing on adversarial positioning, it opens up new avenues for research in model security and robustness, making it a critical addition to the field.
