Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Authors: Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia

Published: 2026-02-26

arXiv ID: 2602.22983v2

Added to Library: 2026-03-02 03:00 UTC

Red Teaming

📄 Abstract

As Large Language Models (LLMs) see increasingly wide use, their security risks have drawn growing attention. Existing research shows that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Based on this observation, the paper proposes CC-BOS, a framework for the automatic generation of classical Chinese adversarial prompts based on multi-dimensional fruit fly optimization, enabling efficient, automated jailbreak attacks in black-box settings. Prompts are encoded into eight policy dimensions (role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern, and context) and iteratively refined via smell search, visual search, and Cauchy mutation. This design enables efficient exploration of the search space, thereby enhancing the effectiveness of black-box jailbreak attacks. To improve readability and evaluation accuracy, the authors further design a classical-Chinese-to-English translation module. Extensive experiments demonstrate the effectiveness of the proposed CC-BOS, which consistently outperforms state-of-the-art jailbreak attack methods.
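The optimization loop the abstract describes can be sketched in miniature. The following is an illustrative, assumption-laden Python sketch of a fruit-fly-style search over an 8-dimensional strategy vector: it is not the paper's implementation, and the fitness function is a toy stand-in (CC-BOS would score candidate prompts against a target LLM in a black-box setting).

```python
import math
import random

# The abstract's eight policy dimensions: role, behavior, mechanism,
# metaphor, expression, knowledge, trigger pattern, and context.
DIMENSIONS = 8

def toy_fitness(strategy):
    # Placeholder objective: the real framework would query the target
    # model and score the jailbreak attempt. Here we simply reward
    # proximity to a fixed point in strategy space.
    target = [0.7] * DIMENSIONS
    return -sum((s - t) ** 2 for s, t in zip(strategy, target))

def cauchy_step(scale=0.1):
    # Sample from a Cauchy distribution via inverse-CDF; its heavy tails
    # occasionally produce large exploratory jumps.
    return scale * math.tan(math.pi * (random.random() - 0.5))

def optimize(pop_size=20, iterations=50, seed=0):
    random.seed(seed)
    best = [random.random() for _ in range(DIMENSIONS)]
    best_score = toy_fitness(best)
    for _ in range(iterations):
        # Smell search: flies wander randomly around the current best.
        swarm = [
            [min(1.0, max(0.0, b + random.uniform(-0.1, 0.1))) for b in best]
            for _ in range(pop_size)
        ]
        # Cauchy mutation: heavy-tailed perturbation to escape local optima.
        mutated = [
            [min(1.0, max(0.0, s + cauchy_step())) for s in fly]
            for fly in swarm
        ]
        # Visual search: the swarm converges on the best-scoring location.
        candidate = max(swarm + mutated, key=toy_fitness)
        if toy_fitness(candidate) > best_score:
            best, best_score = candidate, toy_fitness(candidate)
    return best, best_score
```

Swapping `toy_fitness` for a function that renders a strategy vector into a classical Chinese prompt and scores the target model's response would recover the general shape of the black-box loop described above; the encoding and scoring details are, of course, the paper's contribution.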

🔍 Key Points

  • Introduction of classical Chinese as an effective medium for jailbreak attacks, expanding the study of adversarial prompt generation beyond modern languages.
  • Development of the CC-BOS framework utilizing a multi-dimensional strategy space for optimizing adversarial prompts in black-box settings.
  • Implementation of a bio-inspired optimization algorithm based on fruit fly behavior to efficiently explore the strategy space and enhance the effectiveness of jailbreak attacks.
  • Extensive evaluations demonstrate CC-BOS's superior attack success rates compared to existing state-of-the-art methods across various large language models (LLMs).
  • Introduction of a two-stage classical-Chinese-to-English translation module that improves evaluation accuracy and supports the prompt generation process.

💡 Why This Paper Matters

This paper is relevant and important as it uncovers critical vulnerabilities in large language models (LLMs) through jailbreak attacks, specifically utilizing classical Chinese. By introducing new adversarial techniques and demonstrating their effectiveness, this research contributes to a deeper understanding of LLM security and safety alignment challenges. It also provides a foundation for future work aimed at improving defenses against such sophisticated vulnerabilities.

🎯 Why It's Interesting for AI Security Researchers

The findings and methods outlined in this paper would be of great interest to AI security researchers due to the novel approaches to adversarial prompting it presents. The effective use of a less-explored language like classical Chinese in jailbreak contexts highlights untapped avenues for attack strategies, underscoring the need for stronger security measures and safety-alignment adjustments in LLMs. Researchers can leverage these insights to develop more robust defense mechanisms against evolving adversarial threats.

📚 Read the Full Paper