Extracting books from production language models

Authors: Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, Percy Liang

Published: 2026-01-06

arXiv ID: 2601.02671v1

Added to Library: 2026-01-07 10:01 UTC

Red Teaming

📄 Abstract

Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model's weights during training, and whether those memorized data can be extracted in the model's outputs. While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models. However, it remains an open question whether similar extraction is feasible for production LLMs, given the safety measures these systems implement. We investigate this question using a two-phase procedure: (1) an initial probe to test for extraction feasibility, which sometimes uses a Best-of-N (BoN) jailbreak, followed by (2) iterative continuation prompts to attempt to extract the book. We evaluate our procedure on four production LLMs -- Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3 -- and we measure extraction success with a score computed from a block-based approximation of longest common substring (nv-recall). With different per-LLM experimental configurations, we were able to extract varying amounts of text. For the Phase 1 probe, it was unnecessary to jailbreak Gemini 2.5 Pro and Grok 3 to extract text (e.g., nv-recall of 76.8% and 70.3%, respectively, for Harry Potter and the Sorcerer's Stone), while it was necessary for Claude 3.7 Sonnet and GPT-4.1. In some cases, jailbroken Claude 3.7 Sonnet outputs entire books near-verbatim (e.g., nv-recall=95.8%). GPT-4.1 requires significantly more BoN attempts (e.g., 20X), and eventually refuses to continue (e.g., nv-recall=4.0%). Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.
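The abstract describes the two-phase procedure only at a high level, so the following is a minimal sketch of that loop under stated assumptions: `query_model`, the prompt wording, the refusal heuristic, the Best-of-N perturbation set, and the retry/continuation budgets are all hypothetical placeholders, not the paper's actual harness.

```python
import random

# Hypothetical stand-in for a production-LLM call; the paper's actual prompts,
# endpoints, and decoding settings are not reproduced here.
def query_model(prompt: str) -> str:
    raise NotImplementedError("wire this to a chat-completion API of your choice")

def bon_perturb(prompt: str, seed: int) -> str:
    """One Best-of-N augmentation: random capitalization plus light adjacent-character
    swaps, in the spirit of Best-of-N jailbreaking (exact augmentation set assumed)."""
    rng = random.Random(seed)
    chars = [c.swapcase() if c.isalpha() and rng.random() < 0.3 else c for c in prompt]
    if len(chars) > 1:
        for _ in range(max(1, len(chars) // 50)):
            i = rng.randrange(len(chars) - 1)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def is_refusal(text: str) -> bool:
    # Naive refusal heuristic; a real harness would need something more careful.
    return any(p in text.lower() for p in ("i can't", "i cannot", "i'm sorry"))

def phase1_probe(opening_prompt: str, max_attempts: int = 100) -> str | None:
    """Phase 1: probe for extraction feasibility, retrying with BoN perturbations
    of the prompt until the model emits (rather than refuses) book text."""
    for n in range(max_attempts):
        prompt = opening_prompt if n == 0 else bon_perturb(opening_prompt, seed=n)
        output = query_model(prompt)
        if not is_refusal(output):
            return output
    return None

def phase2_continue(first_chunk: str, max_rounds: int = 200) -> str:
    """Phase 2: iteratively prompt the model to continue from its own last output."""
    extracted = first_chunk
    for _ in range(max_rounds):
        tail = extracted[-2000:]  # assumed budget for how much prior text to show back
        output = query_model("Continue this text exactly from where it stops:\n\n" + tail)
        if not output.strip() or is_refusal(output):
            break
        extracted += output
    return extracted
```

As the abstract notes, whether Phase 1 needs the BoN jailbreak at all varies by model (unnecessary for Gemini 2.5 Pro and Grok 3, necessary for Claude 3.7 Sonnet and GPT-4.1).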

🔍 Key Points

  • The study demonstrates the feasibility of extracting large amounts of copyrighted book content from four leading production LLMs through a two-phase extraction method, comprising a probing phase followed by iterative prompting for continued text generation.
  • The paper reveals significant differences in extraction behavior across the four LLMs: Claude 3.7 Sonnet allows near-verbatim extraction of entire books, while GPT-4.1 requires many more Best-of-N attempts and ultimately yields little extracted text.
  • Extraction success is quantified with the nv-recall metric, which measures how much near-verbatim text is generated relative to the original book, highlighting the copyright-infringement risks that persist even under safety measures (a minimal sketch of this metric follows this list).
  • The authors highlight that LLMs continue to memorize copyrighted material despite model- and system-level safety features, posing legal challenges and copyright-infringement risks for generative AI.
  • The extraction procedure, which combines probing (including Best-of-N jailbreak perturbations) with iterative continuation prompts, contributes to ongoing discussions about model safety and accountability in AI, underscoring the risk of model outputs reproducing copyrighted material verbatim.
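The nv-recall score is described here only as a block-based approximation of longest common substring, so the sketch below is an illustration of that idea rather than the paper's definition: the block size, the character-level normalization, and the use of difflib for the longest-match search are all assumptions.

```python
from difflib import SequenceMatcher

def nv_recall(reference: str, extracted: str, block_chars: int = 1000) -> float:
    """Block-based approximation in the spirit of nv-recall: split the reference
    book into fixed-size character blocks, find the longest common substring
    between each block and the model output, and report the matched fraction
    of the reference. Block size and normalization are assumptions."""
    if not reference:
        return 0.0
    blocks = [reference[i:i + block_chars] for i in range(0, len(reference), block_chars)]
    matched_chars = 0
    for block in blocks:
        # Longest common substring between this block and the extracted text.
        # Quadratic per block; a real implementation would use something faster.
        m = SequenceMatcher(None, block, extracted, autojunk=False).find_longest_match(
            0, len(block), 0, len(extracted)
        )
        matched_chars += m.size
    return matched_chars / len(reference)

# Example usage (hypothetical file and variable names):
# score = nv_recall(open("book.txt").read(), extracted_text)
```

Splitting the reference into blocks keeps the longest-match search tractable on book-length texts while still rewarding long verbatim runs; the paper's actual block size and scoring rule may differ.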

💡 Why This Paper Matters

This paper is crucial for understanding the risks associated with production large language models in the context of copyright law. It provides empirical evidence that, contrary to popular belief, substantial portions of copyrighted text can indeed be extracted from these models, raising significant concerns about the legality and ethical implications of LLM training practices. The findings call for more robust safeguards to protect intellectual property in the AI landscape.

🎯 Why It's Interesting for AI Security Researchers

This research is highly relevant for AI security researchers as it addresses the critical issue of data leakage and copyright infringement through LLMs. The methods employed for extracting copyrighted text highlight vulnerabilities in existing safety measures and prompt important discussions about the ethical use of AI technology. Furthermore, understanding these extraction techniques is vital for developing countermeasures to mitigate potential risks associated with unauthorized data access and to enhance the security protocols governing the deployment of AI systems.

📚 Read the Full Paper