Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs

Authors: Fei Meng

Published: 2026-01-26

arXiv ID: 2601.18255v1

Added to Library: 2026-01-27 04:01 UTC

Safety

📄 Abstract

Continual learning in Large Language Models (LLMs) faces the critical challenge of balancing stability (retaining old knowledge) and plasticity (learning new tasks). While Experience Replay (ER) is a standard countermeasure against catastrophic forgetting, its impact across diverse capabilities remains underexplored. In this work, we uncover a critical dichotomy in ER's behavior: while it induces positive backward transfer on robust, unstructured tasks (e.g., boosting performance on previous NLP classification tasks through repeated rehearsal), it causes severe negative transfer on fragile, structured domains like code generation (e.g., a significant relative drop in coding accuracy). This reveals that ER trades structural integrity for broad consolidation. To address this dilemma, we propose **Orthogonal Subspace Wake-up (OSW)**. OSW identifies essential parameter subspaces of previous tasks via a brief "wake-up" phase and enforces orthogonal updates for new tasks, providing a mathematically grounded "safety guarantee" for established knowledge structures. Empirical results across a diverse four-task sequence demonstrate that OSW uniquely succeeds in preserving fragile coding abilities where Replay fails, while simultaneously maintaining high plasticity for novel tasks. Our findings emphasize the necessity of evaluating structural safety alongside average retention in LLM continual learning.
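The summary does not spell out how OSW computes its protected subspaces, but the general family of orthogonal-gradient-projection methods it describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the "wake-up" phase collects gradients on old-task data, estimates a dominant subspace via SVD, and then projects new-task gradients onto the orthogonal complement of that subspace. All function names and the energy-threshold parameter are hypothetical.

```python
import numpy as np

def wake_up_subspace(grads, energy=0.95):
    """Estimate an essential parameter subspace for a previous task.

    grads: (n_samples, n_params) matrix of gradients collected during a
    brief "wake-up" pass over old-task data (hypothetical interface).
    Returns an orthonormal basis (n_params, k) spanning the top right-
    singular directions that capture `energy` of the gradient variance.
    """
    _, s, vt = np.linalg.svd(grads, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, energy)) + 1  # smallest k reaching the threshold
    return vt[:k].T

def project_orthogonal(grad, basis):
    """Remove the component of a new-task gradient that lies inside the
    protected subspace, so the update cannot disturb old-task directions."""
    return grad - basis @ (basis.T @ grad)
```

In this toy form, any update applied after `project_orthogonal` is exactly orthogonal to the protected basis, which is the "safety guarantee" the abstract alludes to; a real implementation would operate per layer and manage subspaces for multiple past tasks.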

🔍 Key Points

  • Uncovering a critical dichotomy in Experience Replay (ER) effects on LLMs: positive backward transfer for robust unstructured tasks vs. negative transfer for fragile structured tasks like code generation.
  • Introduction of Orthogonal Subspace Wake-up (OSW), a novel method designed to maintain 'structural safety' while allowing for plasticity in continual learning.
  • Empirical results demonstrate that OSW successfully preserves fragile coding capabilities in contrast to ER, maintaining stability when learning new tasks without sacrificing performance on previous ones.
  • Analysis of the trade-offs between consolidation of robust tasks through ER and the potential decline of performance in fragile tasks, emphasizing the need for tailored continual learning strategies.
  • Recommendation that future CL systems evaluate structural safety alongside average retention, positioning OSW as a robust solution to that requirement.

💡 Why This Paper Matters

This paper advances continual learning for Large Language Models (LLMs) by addressing the core tension between knowledge retention and learning adaptability. Orthogonal Subspace Wake-up (OSW) provides a framework that protects capabilities sensitive to structural integrity, such as code generation, while still permitting learning on new tasks. The findings argue that continual learning strategies must safeguard the structure of learned knowledge rather than optimize average performance alone, making this work relevant to the practical deployment of LLMs across diverse domains.

🎯 Why It's Interesting for AI Security Researchers

For AI security researchers, this paper matters because it identifies mechanisms to mitigate catastrophic forgetting and model degradation in LLMs. OSW safeguards critical knowledge structures so that models remain reliable and functionally sound across continual updates and shifting data. Understanding these mechanisms is key to building AI systems that can adapt to new information without losing essential capabilities, which is fundamental to maintaining trust and safety in deployed models.
