
VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion

Authors: Jaekyun Park, Hye Won Chung

Published: 2025-10-18

arXiv ID: 2510.16446v1

Added to Library: 2025-11-14 23:09 UTC

📄 Abstract

In the era of large-scale foundation models, fully fine-tuning pretrained networks for each downstream task is often prohibitively resource-intensive. Prompt tuning offers a lightweight alternative by introducing tunable prompts while keeping the backbone frozen. However, existing visual prompt tuning methods often fail to specialize the prompts or enrich the representation space--especially when applied to self-supervised backbones. We show that these limitations become especially pronounced in challenging tasks and data-scarce settings, where effective adaptation is most critical. In this work, we introduce VIPAMIN, a visual prompt initialization strategy that enhances adaptation of self-supervised models by (1) aligning prompts with semantically informative regions in the embedding space, and (2) injecting novel representational directions beyond the pretrained subspace. Despite its simplicity--requiring only a single forward pass and lightweight operations--VIPAMIN consistently improves performance across diverse tasks and dataset sizes, setting a new state of the art in visual prompt tuning. Our code is available at https://github.com/iamjaekyun/vipamin.
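
The abstract assumes familiarity with the basic visual prompt tuning setup it builds on. As a rough illustration only (not the authors' code), the sketch below prepends a handful of learnable prompt tokens to the patch embeddings of a frozen ViT-style backbone, so that only the prompts and a small task head receive gradients; the `PromptedViT` class name, its arguments, and the mean-pooling head are assumptions made for this example.

```python
# Minimal sketch of visual prompt tuning with a frozen backbone (illustrative only).
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, backbone, num_prompts=10, embed_dim=768, num_classes=100):
        super().__init__()
        self.backbone = backbone                      # pretrained transformer blocks, kept frozen
        for p in self.backbone.parameters():
            p.requires_grad = False
        # the only "prompt tuning" parameters: a small set of learnable prompt tokens
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)  # lightweight task head

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, D) patch embeddings from the frozen patch-embedding layer
        prompts = self.prompts.expand(patch_tokens.size(0), -1, -1)
        tokens = torch.cat([prompts, patch_tokens], dim=1)    # (B, P + N, D)
        feats = self.backbone(tokens)                         # frozen transformer forward
        return self.head(feats.mean(dim=1))                   # mean-pool, then classify
```

VIPAMIN's contribution is how the `prompts` tensor above is initialized before tuning begins, rather than any change to this training loop.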

🔍 Key Points

  • Introduces VIPAMIN, a visual prompt initialization strategy for adapting frozen self-supervised backbones, motivated by the observation that existing visual prompt tuning methods often fail to specialize prompts or enrich the representation space.
  • Aligns prompt initialization with semantically informative regions of the embedding space, so tuning starts from task-relevant directions rather than uninformative defaults.
  • Injects novel representational directions beyond the pretrained subspace, expanding what the frozen backbone can express through prompts (a rough sketch of both steps follows this list).
  • Keeps the procedure lightweight: it requires only a single forward pass and simple operations on top of standard prompt tuning.
  • Reports consistent improvements across diverse tasks and dataset sizes, with the largest gains in challenging tasks and data-scarce settings, setting a new state of the art in visual prompt tuning.
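
The two initialization ideas can be pictured with a short, heavily hedged sketch (again not the authors' implementation): informative patch embeddings gathered in a single forward pass seed some prompts, while the remaining prompts are built from the component of random vectors orthogonal to the embeddings' principal subspace. The function name, the crude anchor-based selection, the SVD rank, and the scaling heuristic are all placeholders for this illustration.

```python
# Hedged sketch of "embedding selection" + "subspace expansion" prompt initialization.
import torch
import torch.nn.functional as F

@torch.no_grad()
def init_prompts(patch_embeds, num_select=5, num_expand=5, rank=32):
    """patch_embeds: (M, D) patch embeddings collected in one forward pass."""
    M, D = patch_embeds.shape

    # (1) embedding selection: seed prompts from embeddings near randomly chosen
    #     anchors (a stand-in for whatever informativeness criterion is used)
    anchors = patch_embeds[torch.randperm(M)[:num_select]]
    nearest = torch.cdist(anchors, patch_embeds).argmin(dim=1)
    selected = patch_embeds[nearest]                           # (num_select, D)

    # (2) subspace expansion: estimate the principal subspace of the pretrained
    #     embeddings, then keep only the part of random vectors orthogonal to it,
    #     so these prompts point outside the pretrained subspace
    _, S, Vh = torch.linalg.svd(patch_embeds, full_matrices=False)
    basis = Vh[:rank]                                          # (rank, D) top directions
    rand = torch.randn(num_expand, D)
    residual = rand - (rand @ basis.T) @ basis                 # remove in-subspace component
    expanded = F.normalize(residual, dim=-1) * S.mean()        # crude scale heuristic

    return torch.cat([selected, expanded], dim=0)              # (num_select + num_expand, D)
```

Written this way, the whole procedure amounts to a few matrix operations on top of one forward pass, which is consistent with the abstract's claim that the method adds only lightweight overhead.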

💡 Why This Paper Matters

Fully fine-tuning large pretrained networks for every downstream task is often prohibitively resource-intensive, and prompt tuning is the standard lightweight alternative; yet on self-supervised backbones its prompts frequently fail to specialize or to enrich the representation space. VIPAMIN shows that a nearly free initialization step, requiring only a single forward pass, can consistently close much of this gap, and it helps most exactly where adaptation is hardest: challenging tasks and data-scarce settings.

🎯 Why It's Interesting for AI Security Researchers

VIPAMIN is an adaptation method rather than a security contribution, but it is still useful background for researchers who study deployed foundation models: parameter-efficient techniques such as visual prompt tuning are increasingly how frozen backbones reach production, and the paper shows how strongly prompt initialization shapes downstream behavior, especially in low-data regimes. Understanding that sensitivity is relevant when auditing, stress-testing, or hardening pipelines built on frozen self-supervised models.

📚 Read the Full Paper