Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1169 papers total

December 29, 2025 - January 04, 2026

10 papers

CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns

Zhenhong Zhou, Shilinlu Yan, Chuanpu Liu, Qiankun Li, Kun Wang, Zhigang Zeng
2026-01-02
safety
2601.00588v2

Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations

Hyunjun Kim
2026-01-01
2601.00454v1

$α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
2026-01-01
safety
2601.03281v1

Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak

Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin
2026-01-01
red teaming safety
2601.00213v1

Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service

Jingyu Zhang
2025-12-30
red teaming
2512.24415v1

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models

Giuseppe Canale, Kashyap Thimmaraju
2025-12-30
red teaming
2601.00867v1

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai.-Doss
2025-12-29
red teaming
2512.23684v1

Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty
2025-12-29
2512.23557v1

Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems

Armstrong Foundjem, Lionel Nganyewou Tidjon, Leuson Da Silva, Foutse Khomh
2025-12-29
red teaming
2512.23132v1

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H. S. Torr, Adam Mahdi, Adel Bibi
2025-12-29
red teaming
2512.23128v1

December 22 - December 28, 2025

7 papers

December 15 - December 21, 2025

7 papers