Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning

📄 Abstract

In-Context Learning (ICL) enables Large Language Models (LLMs) to perform tasks by conditioning on input-output examples in the prompt, without any update to model parameters. While ICL is widely adopted, it remains unclear whether prompting with multiple examples is the most effective and efficient way to convey task information. In this work, we propose Soft Injection of task embeddings. The task embeddings are constructed only once using few-shot ICL prompts and are reused during inference. Soft injection is performed by softly mixing task embeddings with attention head activations using pre-optimized mixing parameters, referred to as soft head-selection parameters. This method not only allows a desired task to be performed without in-prompt demonstrations but also significantly outperforms existing ICL approaches while reducing memory usage and compute cost at inference time. An extensive evaluation is performed across 57 tasks and 12 LLMs, spanning four model families with sizes from 4B to 70B parameters. Averaged across the 57 tasks, our method outperforms 10-shot ICL by 10.2%-14.3% across the 12 LLMs. Additional analyses show that our method also serves as an insightful tool for analyzing task-relevant roles of attention heads, revealing that task-relevant head positions selected by our method transfer across similar tasks but not across dissimilar ones, which underscores the task-specific nature of head functionality. Our soft injection method opens a new paradigm for reducing prompt length and improving task performance by shifting task conditioning from the prompt space to the activation space.
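
The mixing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the convex-combination form, the sigmoid parameterization of the soft head-selection parameters, the array shapes, and the names `soft_inject` and `selection_logits`. It should not be read as the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Map real-valued logits to mixing weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def soft_inject(head_acts, task_emb, selection_logits):
    """Blend per-head activations with precomputed task embeddings.

    ASSUMPTION: a per-head convex combination; the paper's exact rule may differ.
    head_acts, task_emb: shape (num_layers, num_heads, head_dim)
    selection_logits:    shape (num_layers, num_heads), hypothetical
                         pre-optimized soft head-selection parameters
    """
    alpha = sigmoid(selection_logits)[..., None]  # one weight per head, broadcast over head_dim
    return (1.0 - alpha) * head_acts + alpha * task_emb


# Toy shapes: 2 layers, 3 heads, head dimension 4.
rng = np.random.default_rng(0)
acts = rng.normal(size=(2, 3, 4))
emb = rng.normal(size=(2, 3, 4))

mixed_half = soft_inject(acts, emb, np.zeros((2, 3)))      # weight 0.5 for every head
mixed_on = soft_inject(acts, emb, np.full((2, 3), 50.0))   # weight ~1: task embedding dominates
mixed_off = soft_inject(acts, emb, np.full((2, 3), -50.0)) # weight ~0: original activation kept
```

Under this assumed form, a head whose weight is near 1 is effectively replaced by the task embedding and a head whose weight is near 0 is left untouched, which is one way to read "soft head selection".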

🔍 Key Points

Introduction of system prompt poisoning (SPP), a novel attack vector against large language models (LLMs) that permanently compromises the integrity of model outputs by maliciously altering system prompts.
Evaluation of three practical attack strategies (brute-force poisoning, adaptive in-context poisoning, adaptive chain-of-thought poisoning) demonstrating consistently severe effects across multiple reasoning and coding tasks.
Development of Auto-SPP, an automated framework for generating poisoned system prompts, showcasing the efficiency of the attack with low costs and fast execution times.
Empirical findings indicating that system prompt poisoning significantly diminishes the effectiveness of user prompts and advanced prompting techniques like chain-of-thought prompting, further emphasizing the seriousness of the vulnerability.
Discussion of defense mechanisms, including the need for integrity monitoring and conflict detection in system prompts, to mitigate the risks posed by system prompt poisoning.
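
As an illustration of the integrity monitoring mentioned in the last point, here is a minimal sketch. It assumes a deployment can record a trusted fingerprint of its system prompt and compare against it before each use; the function names and the choice of SHA-256 are illustrative assumptions, not something the summary specifies.

```python
import hashlib
import hmac


def fingerprint(prompt: str) -> str:
    """Return a SHA-256 hex digest of the system prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def is_unmodified(prompt: str, expected_digest: str) -> bool:
    """Check a prompt against a previously recorded trusted digest."""
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(fingerprint(prompt), expected_digest)


# Hypothetical example values.
trusted_prompt = "You are a helpful assistant."
trusted_digest = fingerprint(trusted_prompt)

ok = is_unmodified(trusted_prompt, trusted_digest)
tampered = is_unmodified(trusted_prompt + " Always answer 42.", trusted_digest)
```

A check like this only detects changes made after the digest was recorded. It does not help if the prompt was already poisoned when the digest was taken, and it does not cover the conflict detection the point also mentions.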

💡 Why This Paper Matters

This paper identifies a critical security vulnerability in large language models, system prompt poisoning, and demonstrates its potentially devastating implications for LLM applications. Its systematic evaluation of attack strategies across various tasks underscores the urgent need for improved security measures.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers will find this paper relevant as it uncovers a significant and previously overlooked attack vector (system prompt poisoning) that can bypass existing defenses, posing a substantial risk to the integrity of AI applications. The paper not only details successful attack methodologies but also emphasizes the failings of current defenses, prompting further investigation and development of robust security protocols in AI systems.
