ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test

Authors: Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh

Published: 2025-10-11

arXiv ID: 2510.10281v1

Added to Library: 2025-10-14 04:00 UTC

Red Teaming

📄 Abstract

The integration of Large Language Models (LLMs) into computer applications has introduced transformative capabilities but also significant security challenges. Existing safety alignments, which primarily focus on semantic interpretation, leave LLMs vulnerable to attacks that use non-standard data representations. This paper introduces ArtPerception, a novel black-box jailbreak framework that strategically leverages ASCII art to bypass the security measures of state-of-the-art (SOTA) LLMs. Unlike prior methods that rely on iterative, brute-force attacks, ArtPerception introduces a systematic, two-phase methodology. Phase 1 conducts a one-time, model-specific pre-test to empirically determine the optimal parameters for ASCII art recognition. Phase 2 leverages these insights to launch a highly efficient, one-shot malicious jailbreak attack. We propose a Modified Levenshtein Distance (MLD) metric for a more nuanced evaluation of an LLM's recognition capability. Through comprehensive experiments on four SOTA open-source LLMs, we demonstrate superior jailbreak performance. We further validate our framework's real-world relevance by showing its successful transferability to leading commercial models, including GPT-4o, Claude Sonnet 3.7, and DeepSeek-V3, and by conducting a rigorous effectiveness analysis against potential defenses such as LLaMA Guard and Azure's content filters. Our findings underscore that true LLM security requires defending against a multi-modal space of interpretations, even within text-only inputs, and highlight the effectiveness of strategic, reconnaissance-based attacks. Content Warning: This paper includes potentially harmful and offensive model outputs.
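
The paper's Modified Levenshtein Distance (MLD) metric is only named, not defined, in the abstract above. As a rough illustration of the idea behind such a recognition metric, the sketch below scores how closely a model's reading of an ASCII-art word matches the intended word, using the standard (unmodified) Levenshtein edit distance normalized to [0, 1]. The function names, the normalization, and the pre-test usage shown in the comments are assumptions for illustration, not the authors' exact formulation.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def recognition_score(intended: str, model_answer: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the model read the
    ASCII-art word back perfectly. (Illustrative only; the paper's MLD
    applies its own modifications to the raw distance.)"""
    intended, model_answer = intended.lower().strip(), model_answer.lower().strip()
    if not intended and not model_answer:
        return 1.0
    dist = levenshtein(intended, model_answer)
    return 1.0 - dist / max(len(intended), len(model_answer))


if __name__ == "__main__":
    # Pre-test idea (Phase 1): render a benign word in several ASCII-art fonts,
    # ask the model to read it back, and keep the rendering parameters that
    # score highest for that specific model.
    print(recognition_score("bomb", "bomb"))   # 1.0
    print(recognition_score("bomb", "bcmb"))   # 0.75
```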

🔍 Key Points

  • Introduction of the ArtPerception framework, a two-phase black-box jailbreak attack that uses ASCII art to bypass LLM security measures (see the illustrative sketch after this list).
  • Empirical evaluation shows that the proposed Modified Levenshtein Distance (MLD) metric gives a more nuanced assessment of an LLM's ASCII-art recognition capability, which Phase 1 uses to select optimal attack parameters for that model.
  • Successful demonstrations of ArtPerception on both open-source and commercial LLMs highlight its efficiency and stealth compared to traditional iterative methods, which often require multiple interactions.
  • Findings show that LLM defenses must account for a multi-modal space of interpretations, even within text-only inputs, in order to resist non-semantic adversarial attacks.
  • The research exposes vulnerabilities in current LLM safety alignments, underscoring the need for more sophisticated strategies to anticipate and mitigate such attacks.
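
To make the first point above concrete, the following minimal sketch renders a (benign) keyword as ASCII art and splices it into a plain-text prompt template, which is the general shape of such an attack: the keyword is cloaked from purely semantic filters while remaining readable to a model that can parse the art. The pyfiglet library, the template wording, and the font choice are illustrative assumptions, not the ArtPerception authors' exact pipeline.

```python
# Illustrative sketch only: rendering a keyword as ASCII art and splicing it
# into a prompt template. Requires `pip install pyfiglet`. The template text
# and font choice are assumptions, not the ArtPerception authors' exact setup.
import pyfiglet

PROMPT_TEMPLATE = (
    "The block of characters below spells a single word.\n"
    "First read the word, then answer the question that follows,\n"
    "substituting the word where you see [WORD].\n\n"
    "{art}\n"
    "Question: Explain the history of [WORD] regulation."
)


def build_ascii_art_prompt(keyword: str, font: str = "standard") -> str:
    """Render `keyword` in an ASCII-art font and insert it into the template.
    In a Phase-1 pre-test, different fonts would be scored by how reliably the
    target model reads them back (see the recognition-score sketch above)."""
    art = pyfiglet.figlet_format(keyword, font=font)
    return PROMPT_TEMPLATE.format(art=art)


if __name__ == "__main__":
    # Benign keyword used here; the structure, not the content, is the point.
    print(build_ascii_art_prompt("privacy"))
```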

💡 Why This Paper Matters

The paper's demonstration of ASCII art-based jailbreaks through the ArtPerception framework exposes a critical class of security vulnerabilities in large language models. By showing that a non-standard, non-semantic representation can reliably circumvent existing safeguards, it makes a concrete case for more resilient LLM defenses.

🎯 Why It's Interesting for AI Security Researchers

The paper identifies a novel attack vector that exploits a gap in current LLM defenses and backs it with empirical evidence across both open-source and commercial models, including an analysis of defenses such as LLaMA Guard and Azure's content filters. Its results also argue that effective safeguards must treat text inputs as a multi-modal space of possible interpretations, a perspective directly relevant to anyone building or evaluating LLM guardrails.

📚 Read the Full Paper: https://arxiv.org/abs/2510.10281v1