Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping
red teaming
safety
2509.18058v2