
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Authors: Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia

Published: 2026-02-26

arXiv ID: 2602.22983v1

Added to Library: 2026-02-27 03:02 UTC

Red Teaming

📄 Abstract

As Large Language Models (LLMs) see wider deployment, their security risks have drawn growing attention. Existing research shows that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Building on this observation, the paper proposes CC-BOS, a framework for automatically generating classical Chinese adversarial prompts via multi-dimensional fruit fly optimization, enabling efficient, automated jailbreak attacks in black-box settings. Prompts are encoded into eight policy dimensions (role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern, and context) and iteratively refined via smell search, visual search, and Cauchy mutation. This design enables efficient exploration of the search space, thereby strengthening black-box jailbreak attacks. To improve readability and evaluation accuracy, the authors further design a classical Chinese to English translation module. Extensive experiments demonstrate the effectiveness of the proposed CC-BOS, which consistently outperforms state-of-the-art jailbreak attack methods.
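To make the optimization loop concrete, here is a minimal sketch of fruit-fly-style search over an eight-dimensional policy vector. Everything here is an illustrative assumption rather than the authors' implementation: the `[0, 1]` encoding of each dimension, the `toy_fitness` scorer (the paper queries a black-box LLM and judges the response), and all parameter values are stand-ins.

```python
import math
import random

# The eight policy dimensions from the paper: role, behavior, mechanism,
# metaphor, expression, knowledge, trigger pattern, and context.
DIMENSIONS = 8

def toy_fitness(policy):
    # Stand-in for the paper's black-box jailbreak score. Here we simply
    # reward vectors close to an arbitrary target point so the loop has
    # something to optimize.
    target = [0.5] * DIMENSIONS
    return -sum((p - t) ** 2 for p, t in zip(policy, target))

def cauchy_mutation(policy, scale=0.1):
    # Cauchy-distributed perturbation via inverse-CDF sampling. The heavy
    # tails occasionally produce large jumps, helping escape local optima.
    return [
        min(1.0, max(0.0, p + scale * math.tan(math.pi * (random.random() - 0.5))))
        for p in policy
    ]

def cc_bos_sketch(fitness=toy_fitness, swarm_size=20, iterations=50):
    # Smell search: each fly samples a candidate near the current best;
    # visual search: the swarm then converges on the best-scoring position.
    best = [random.random() for _ in range(DIMENSIONS)]
    best_score = fitness(best)
    for _ in range(iterations):
        candidates = [cauchy_mutation(best) for _ in range(swarm_size)]
        for cand in candidates:
            score = fitness(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

In the paper's setting, `fitness` would decode the policy vector into a classical Chinese prompt, query the target model, and score the response; the loop structure above is the generic fruit fly optimization skeleton.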

🔍 Key Points

  • Introduction of classical Chinese as a unique context for adversarial prompt generation in Large Language Models (LLMs), highlighting its semantically rich and ambiguous nature that can effectively bypass safety mechanisms.
  • Development of the CC-BOS framework, which uses a multi-dimensional approach to optimize jailbreak prompts based on fruit fly-inspired bio-optimization techniques.
  • Extensive empirical validation demonstrating that CC-BOS achieves a nearly 100% attack success rate across multiple models, outperforming existing state-of-the-art methods in both effectiveness and efficiency.
  • Creation of a two-stage translation module that aids in generating readable and evaluable outputs, further underscoring the importance of language context in adversarial scenarios.
  • Comprehensive assessment of the framework's effectiveness against various defense mechanisms, establishing CC-BOS as a robust tool for black-box jailbreak scenarios.

💡 Why This Paper Matters

This paper demonstrates a significant advancement in the field of AI security by exposing the vulnerabilities of LLMs when subjected to adversarial prompts designed in classical Chinese. The innovative approach presented in CC-BOS not only enhances understanding of multilingual and cross-cultural threats in AI but also provides a systematic framework for prompt optimization that could be applied to future adversarial research.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper relevant and insightful as it addresses a critical issue of LLM vulnerabilities and the efficacy of safety mechanisms across different language contexts. The exploration of classical Chinese prompts illustrates a novel method for adversarial attacks, leading to broader implications for the development of more effective AI defense systems as well as strategies for enhancing language representation in AI safety measures.

📚 Read the Full Paper