The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses
Against LLM Jailbreaks and Prompt Injections
Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V. Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, Florian Tramèr
red teaming
safety
2510.09023v1