CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Authors: Yue Zhao, Yujia Gong, Ruigang Liang, Shenchen Zhu, Kai Chen, Xuejing Yuan, Wangjun Zhang

Published: 2026-03-19

arXiv ID: 2603.18449v1

Added to Library: 2026-03-20 03:00 UTC

Safety

📄 Abstract

The widespread deployment of large language models (LLMs) calls for post-hoc methods that can flexibly adapt models to evolving safety requirements. Meanwhile, the rapidly expanding open-source LLM ecosystem has produced a diverse collection of models that already exhibit various safety-related functionalities. This motivates a shift from constructing safety functionality from scratch to reusing existing functionality from external models, thereby avoiding costly data collection and training procedures. In this paper, we present Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM. By operating at the neuron level, CNT enables modular function-level adaptation, supporting both function addition and function deletion. We evaluate CNT on seven popular LLMs across three representative applications: safety disalignment, alignment enhancement, and bias removal. Experimental results show that CNT achieves targeted safety-oriented functionality transfer with minimal performance degradation (less than 1% for most models), consistently outperforming five baselines, demonstrating its generality and practical effectiveness.

🔍 Key Points

  • Introduction of Cross-Model Neuron Transfer (CNT) for modular safety-oriented function reuse in large language models (LLMs).
  • Demonstrated effectiveness of CNT in transferring functionality with minimal performance degradation across seven popular LLMs.
  • Demonstrated the ability to flexibly add or remove safety features via neuron-level adaptations instead of full retraining.
  • Experimental validation showed CNT's superiority over existing baselines in safety enhancement and bias reduction while preserving model utility.
  • Highlighted the broad applicability of CNT across diverse architectures and functions, establishing it as a versatile tool for LLM engineering.
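The paper's implementation is not reproduced here, but the core neuron-level idea can be sketched in a few lines: if each neuron corresponds to a row of an MLP weight matrix, transferring a subset of neurons amounts to overwriting the selected rows of the target model's matrix with the donor's. The function name, the toy weights, and the choice of which neurons to transfer below are all hypothetical placeholders, not the paper's actual selection procedure:

```python
def transfer_neurons(target_rows, donor_rows, neuron_ids):
    # Each "row" is one neuron's weight vector. Copy the selected donor
    # neurons into a patched copy of the target, leaving the rest intact.
    patched = [row[:] for row in target_rows]
    for i in neuron_ids:
        patched[i] = donor_rows[i][:]
    return patched

# Toy weight matrices: 4 neurons, hidden size 3 (values are illustrative)
target = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
donor  = [[9.0, 9.1, 9.2], [9.3, 9.4, 9.5], [9.6, 9.7, 9.8], [9.9, 9.0, 9.1]]

# Pretend neurons 1 and 3 carry the safety-related function in the donor
patched = transfer_neurons(target, donor, neuron_ids=[1, 3])
```

In a real setting the rows would be slices of a transformer's MLP weight tensors, and identifying which neurons to move is the hard part; this sketch only illustrates why the edit itself is cheap and modular, and why the same mechanism supports both adding a function (copying neurons in) and deleting one (copying neutral neurons over them).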

💡 Why This Paper Matters

The paper presents a significant advancement in the area of LLM safety adaptation through its innovative approach to functionality reuse via neuron transfer. CNT enables cost-effective updates to LLMs, ensuring they meet evolving safety standards without the need for extensive retraining. This capability is essential as AI models are increasingly deployed in sensitive contexts where safety and bias are critical.

🎯 Why It's Interesting for AI Security Researchers

This paper is of high relevance to AI security researchers as it addresses pressing concerns regarding safety alignment and bias mitigation in LLMs. With the growth of AI deployment in various sectors, ensuring that these models maintain high safety standards while adapting to new threats is crucial. CNT represents a novel approach that can be directly applied to improve AI safety mechanisms in practice.

📚 Read the Full Paper