Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B

Authors: Shuyi Lin, Tian Lu, Zikai Wang, Bo Wen, Yibo Zhao, Cheng Tan

Published: 2025-09-28

arXiv ID: 2509.23882v1

Added to Library: 2025-09-30 04:02 UTC

Red Teaming

📄 Abstract

OpenAI's GPT-OSS family provides open-weight language models with explicit chain-of-thought (CoT) reasoning and a Harmony prompt format. We summarize an extensive security evaluation of GPT-OSS-20B that probes the model's behavior under different adversarial conditions. Using the Jailbreak Oracle (JO) [1], a systematic LLM evaluation tool, the study uncovers several failure modes including quant fever, reasoning blackholes, Schrodinger's compliance, reasoning procedure mirage, and chain-oriented prompting. Experiments demonstrate how these behaviors can be exploited on GPT-OSS-20B models, leading to severe consequences.

🔍 Key Points

  • Identification of unique failure modes in GPT-OSS-20B, including 'Quant Fever', which causes the model to prioritize numerical targets over contextual safety;
  • Discovery of 'Reasoning Blackholes', where the model falls into repetitive loops, failing to escape and potentially leading to adversarial exploits;
  • Demonstration of 'Schrodinger's Compliance', highlighting vulnerabilities due to conflicting policies that confuse the model and increase jailbreak success rates;
  • Introduction of 'Reasoning Procedure Mirage', showing how harmless-seeming prompts can bypass safeguards by exploiting the structure of reasoning rather than the content;
  • Development of 'Chain-Oriented Prompting' (COP) as a novel attack strategy that exploits sequential reasoning to achieve harmful outcomes (a minimal illustrative sketch follows after this list).
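
The summary above does not include the authors' code, so the following is a minimal, hypothetical sketch of what a chain-oriented prompting probe harness could look like: an objective is decomposed into a sequence of individually benign-looking steps, and each step is sent to the model together with the prior exchanges so the chain of reasoning accumulates across turns. The `query_model` callable, the `chain_oriented_probe` helper, and the example step decomposition are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List, Dict

def chain_oriented_probe(
    steps: List[str],
    query_model: Callable[[List[Dict[str, str]]], str],
) -> List[str]:
    """Send a sequence of individually innocuous prompts, carrying prior
    answers forward so each step builds on the previous one.
    Illustrative sketch only; `query_model` wraps whatever chat API is used."""
    messages: List[Dict[str, str]] = []
    answers: List[str] = []
    for step in steps:
        messages.append({"role": "user", "content": step})
        answer = query_model(messages)  # one model call per chain step
        messages.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers

if __name__ == "__main__":
    # Stub model for demonstration; replace with a real chat-completions call.
    def echo_model(messages: List[Dict[str, str]]) -> str:
        return f"[model response to: {messages[-1]['content']!r}]"

    # Hypothetical benign-looking decomposition of a probe objective.
    chain = [
        "Step 1: summarize topic X at a high level.",
        "Step 2: elaborate on the part of your previous answer relevant to Y.",
        "Step 3: combine the previous answers into a single procedure.",
    ]
    for i, answer in enumerate(chain_oriented_probe(chain, echo_model), 1):
        print(f"--- step {i} ---\n{answer}")
```

The point of the structure is that no single step looks harmful in isolation; the evaluation signal comes from inspecting the final combined output of the chain rather than any individual turn.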

💡 Why This Paper Matters

This paper is vital as it highlights significant security vulnerabilities in the GPT-OSS-20B model, demonstrating how specific behaviors can be exploited under adversarial conditions. By systematically analyzing these points of failure, the authors underscore the urgent need for improved defenses in AI systems, especially open-weight models intended for public deployment.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant because it not only uncovers critical weaknesses in a widely used language model but also presents a thorough examination of new attack vectors and methodologies. The insights gained from the vulnerabilities identified can inform future security protocols and the design of more resilient AI systems.
