
Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?

Authors: Junjie Mu, Zonghao Ying, Zhekui Fan, Zonglei Jing, Yaoyuan Zhang, Zhengmin Yu, Wenxin Zhang, Quanchen Zou, Xiangzheng Zhang

Published: 2025-09-08

arXiv ID: 2509.06350v1

Added to Library: 2025-09-09 04:00 UTC

Red Teaming

📄 Abstract

Jailbreak attacks on Large Language Models (LLMs) have demonstrated various successful methods whereby attackers manipulate models into generating harmful responses that they are designed to avoid. Among these, Greedy Coordinate Gradient (GCG) has emerged as a general and effective approach that optimizes the tokens in a suffix to generate jailbreakable prompts. While several improved variants of GCG have been proposed, they all rely on fixed-length suffixes. However, the potential redundancy within these suffixes remains unexplored. In this work, we propose Mask-GCG, a plug-and-play method that employs learnable token masking to identify impactful tokens within the suffix. Our approach increases the update probability for tokens at high-impact positions while pruning those at low-impact positions. This pruning not only reduces redundancy but also decreases the size of the gradient space, thereby lowering computational overhead and shortening the time required to achieve successful attacks compared to GCG. We evaluate Mask-GCG by applying it to the original GCG and several improved variants. Experimental results show that most tokens in the suffix contribute significantly to attack success, and pruning a minority of low-impact tokens does not affect the loss values or compromise the attack success rate (ASR), thereby revealing token redundancy in LLM prompts. Our findings provide insights for developing efficient and interpretable LLMs from the perspective of jailbreak attacks.
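The masking mechanism described above is concrete enough to sketch. Below is a minimal, self-contained illustration (not the authors' implementation) of learnable token masking over a GCG-style suffix: each suffix position gets a scalar score, a sigmoid gate scales that position's embedding, and the scores receive gradients alongside the attack objective. The sizes, the placeholder loss, and names such as `masked_suffix_embeddings` are assumptions for illustration only.

```python
import torch

suffix_len, embed_dim = 20, 64  # illustrative sizes; GCG commonly uses a ~20-token suffix
mask_logits = torch.zeros(suffix_len, requires_grad=True)  # one learnable score per position

def masked_suffix_embeddings(suffix_embeds: torch.Tensor) -> torch.Tensor:
    """Scale each suffix token's embedding by a soft gate in (0, 1)."""
    gates = torch.sigmoid(mask_logits)           # (suffix_len,)
    return suffix_embeds * gates.unsqueeze(-1)   # broadcast over the embedding dim

# Toy forward/backward pass: a placeholder stands in for the model's target
# loss, plus an L1-style sparsity penalty that drives low-impact gates to zero.
suffix_embeds = torch.randn(suffix_len, embed_dim)
target_loss = masked_suffix_embeddings(suffix_embeds).pow(2).mean()  # placeholder objective
sparsity_penalty = 0.01 * torch.sigmoid(mask_logits).sum()
(target_loss + sparsity_penalty).backward()
print(mask_logits.grad.shape)  # per-position gradients expose each token's impact
```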

🔍 Key Points

  • Introduction of Mask-GCG: A method employing learnable token masking that identifies and prunes low-impact tokens in adversarial suffixes, improving the efficiency of jailbreak attacks on LLMs (see the pruning sketch after this list).
  • Demonstrated significant token redundancy in fixed-length suffixes generated by existing methods like Greedy Coordinate Gradient (GCG), revealing that not all tokens contribute equally to attack success.
  • Empirical results show that Mask-GCG maintains or improves Attack Success Rates (ASR) while achieving substantial reductions in computational resources and suffix lengths, confirming the validity of the proposed pruning approach.
  • The approach highlights a hierarchy of token importance within adversarial prompts: over 83% of suffix tokens are high-impact, while the remaining low-impact minority can be pruned without jeopardizing the attack's effectiveness.
  • The findings contribute to the development of more interpretable and efficient large language models by illuminating how adversarial prompts can be optimized.
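To make the pruning and update-biasing ideas in the points above concrete, here is a second hedged sketch (again, not the authors' code): trained gate values below a threshold mark low-impact positions to prune, and the surviving gates bias which position the GCG search mutates next. The scores, token ids, and the 0.5 threshold are hypothetical.

```python
import torch

# Hypothetical trained mask scores and suffix token ids after optimization.
mask_logits = torch.tensor([2.1, -1.5, 0.9, 3.0, -2.2, 1.4])
suffix_ids = torch.tensor([9906, 374, 264, 1296, 13, 0])

gates = torch.sigmoid(mask_logits)
keep = gates > 0.5                    # prune low-impact positions, shrinking the search space
pruned_suffix = suffix_ids[keep]
print(f"kept {int(keep.sum())}/{len(suffix_ids)} tokens:", pruned_suffix.tolist())

# Sample the next position to update in proportion to its surviving gate,
# so high-impact positions are mutated more often.
probs = gates[keep] / gates[keep].sum()
position = torch.multinomial(probs, num_samples=1).item()
print("next position to mutate in the pruned suffix:", position)
```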

💡 Why This Paper Matters

The paper presents a significant advancement in the security analysis of large language models by revealing token redundancy within the adversarial suffixes used in jailbreak attacks and proposing a novel optimization technique, Mask-GCG. This deepens our understanding of LLM vulnerabilities and offers a practical way to improve attack efficiency. Moreover, because Mask-GCG reduces computational cost while retaining or improving attack effectiveness, it is a useful tool for researchers in AI security.

🎯 Why It's Interesting for AI Security Researchers

This paper is highly relevant for AI security researchers interested in understanding the vulnerabilities of large language models, particularly in the context of adversarial attacks. By demonstrating the potential for optimizing adversarial prompts through the use of learnable token masking, researchers can explore more effective and efficient methods for enhancing model safety and security. Additionally, the insights gained from the study could inform the development of better defenses against jailbreak attacks, ultimately contributing to the creation of more robust AI systems.

📚 Read the Full Paper: https://arxiv.org/abs/2509.06350v1