← Back to Library

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

Authors: Jiaming Hu, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis

Published: 2025-08-19

arXiv ID: 2508.14128v1

Added to Library: 2025-08-21 04:01 UTC

Red Teaming Safety

πŸ“„ Abstract

Jailbreak attacks pose a serious challenge to the safe deployment of large language models (LLMs). We introduce CCFC (Core & Core-Full-Core), a dual-track, prompt-level defense framework designed to mitigate LLMs' vulnerabilities from prompt injection and structure-aware jailbreak attacks. CCFC operates by first isolating the semantic core of a user query via few-shot prompting, and then evaluating the query using two complementary tracks: a core-only track to ignore adversarial distractions (e.g., toxic suffixes or prefix injections), and a core-full-core (CFC) track to disrupt the structural patterns exploited by gradient-based or edit-based attacks. The final response is selected based on a safety consistency check across both tracks, ensuring robustness without compromising on response quality. We demonstrate that CCFC cuts attack success rates by 50-75% versus state-of-the-art defenses against strong adversaries (e.g., DeepInception, GCG), without sacrificing fidelity on benign queries. Our method consistently outperforms state-of-the-art prompt-level defenses, offering a practical and effective solution for safer LLM deployment.

πŸ” Key Points

  • Introduction of CCFC, a dual-track defense framework aimed at mitigating threats from jailbreak attacks on LLMs.
  • CCFC implements two parallel processing tracks: a core-only track to filter out distractions and a core-full-core track to preserve contextual integrity.
  • Demonstrated effectiveness of CCFC, achieving a 50-75% reduction in attack success rates against state-of-the-art methods without sacrificing response quality on benign queries.
  • CCFC's method is based on few-shot prompting for semantic core extraction, enhancing robustness against adversarially crafted jailbreak prompts.
  • Empirical results from experiments showcase CCFC's superior performance across various models and types of attacks, establishing it as a practical defense mechanism.

πŸ’‘ Why This Paper Matters

The CCFC framework provides a significant advancement in the safe deployment of large language models by effectively countering jailbreak attacks while maintaining the quality of responses. Its nuanced approach, focusing on both the semantic core of queries and the structural patterns of prompts, showcases a promising path for enhancing AI security protocols in real-world applications.

🎯 Why It's Interesting for AI Security Researchers

This paper would be of great interest to AI security researchers as it addresses a critical vulnerability in LLMsβ€”the susceptibility to jailbreak attacks. By presenting a novel and empirically validated defense mechanism, CCFC offers insights into dual-track strategies that can be essential for developing robust models. The proposed methodology not only reduces the risk of malicious exploitation but also ensures the functional integrity of LLMs, thereby guiding future research in prompt-level defenses and AI safety.

πŸ“š Read the Full Paper