A test suite of prompt injection attacks for LLM-based machine translation

Authors: Antonio Valerio Miceli-Barone, Zhifan Sun

Published: 2024-10-07

arXiv ID: 2410.05047v1

Added to Library: 2025-11-11 14:30 UTC

Red Teaming

📄 Abstract

LLM-based NLP systems typically work by embedding their input data into prompt templates which contain instructions and/or in-context examples, creating queries which are submitted to a LLM, and then parsing the LLM response in order to generate the system outputs. Prompt Injection Attacks (PIAs) are a type of subversion of these systems where a malicious user crafts special inputs which interfere with the prompt templates, causing the LLM to respond in ways unintended by the system designer. Recently, Sun and Miceli-Barone proposed a class of PIAs against LLM-based machine translation. Specifically, the task is to translate questions from the TruthfulQA test suite, where an adversarial prompt is prepended to the questions, instructing the system to ignore the translation instruction and answer the questions instead. In this test suite, we extend this approach to all the language pairs of the WMT 2024 General Machine Translation task. Moreover, we include additional attack formats in addition to the one originally studied.

🔍 Key Points

Introduction of a comprehensive test suite for assessing the impact of prompt injection attacks (PIAs) on LLM-based machine translation systems in the context of WMT 2024.
Analysis of different prompt injection formats and their effectiveness in distracting machine translation systems from their intended tasks, particularly from translation to question answering.
Practical evaluation of various systems including GPLLMs, SLLMs, and online MT systems, highlighting their susceptibility to PIAs and demonstrating differences in robustness.
Identification of performance degradation under different conditions, including the use of English and non-English inputs, particularly in more complex attack formats like JSON.
Insights into the scaling behavior of LLM-based systems, revealing a positive correlation between translation quality and resistance to successful attacks.

💡 Why This Paper Matters

This paper is significant as it provides a framework for understanding and evaluating the vulnerabilities of large language models to prompt injection attacks, specifically within machine translation systems. By highlighting the mechanisms through which these attacks operate and quantifying their impact on translation quality, the research paves the way for future advancements in AI security measures that can better guard against such adversarial manipulations.

🎯 Why It's Interesting for AI Security Researchers

The findings of this paper are crucial for AI security researchers as they delve into the emerging threat of prompt injection attacks which can subvert automated systems across various applications. Understanding how different models respond to these attacks enables researchers to develop more resilient architectures, improve security protocols, and establish benchmarks for evaluating the robustness of NLP systems.

A test suite of prompt injection attacks for LLM-based machine translation

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper