Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1331 papers total

October 20 - October 26, 2025

21 papers

RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models

Yang Yang, Hua Xu, Zhangyi Hu, Yutao Yue
2025-10-22
2510.19698v1

SORA-ATMAS: Adaptive Trust Management and Multi-LLM Aligned Governance for Future Smart Cities

Usama Antuley, Shahbaz Siddiqui, Sufian Hameed, Waqas Arif, Subhan Shah, Syed Attique Shah
2025-10-22
governance
2510.19327v1

Defending Against Prompt Injection with DataFilter

Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner
2025-10-22
red teaming
2510.19207v1

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

Thomas Wang, Haowen Li
2025-10-22
red teaming
2510.19169v2

OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform

Thomas Wang, Haowen Li
2025-10-22
red teaming
2510.19169v1

Evidence of Energy Injection in the Short and Distant GRB 250221A

Camila Angulo-Valdez, Rosa L. Becerra, Ramandeep Gill, Noémie Globus, William H. Lee, Diego López-Cámara, Cassidy Mihalenko, Enrique Moreno-Méndez, Roberto Ricci, Karelle Siellez, Alan M. Watson, Muskan Yadav, Yu-han Yang, Dalya Akl, Sarah Antier, Jean-Luc Atteia, Stéphane Basa, Nathaniel R. Butler, Simone Dichiara, Damien Dornic, Jean-Grégoire Ducoin, Francis Fortin, Leonardo García-García, Kin Ocelotl López, Francesco Magnani, Brendan O'Connor, Margarita Pereyra, Ny Avo Rakotondrainibe, Fredd Sánchez-Álvarez, Benjamin Schneider, Eleonora Troja, Antonio de Ugarte Postigo
2025-10-21
2510.19132v4

Steering Autoregressive Music Generation with Recursive Feature Machines

Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack
2025-10-21
2510.19127v1

HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models

Sidhant Narula, Javad Rafiei Asl, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi
2025-10-21
red teaming
2510.18728v1

Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation

Giovanni De Muri, Mark Vero, Robin Staab, Martin Vechev
2025-10-21
red teaming
2510.18541v1

SegTune: Structured and Fine-Grained Control for Song Generation

Pengfei Cai, Joanna Wang, Haorui Zheng, Xu Li, Zihao Ji, Teng Ma, Zhongliang Liu, Chen Zhang, Pengfei Wan
2025-10-21
2510.18416v1

Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth

Jiawei Zhang, Andrew Estornell, David D. Baek, Bo Li, Xiaojun Xu
2025-10-20
safety
2510.18081v1

CourtGuard: A Local, Multiagent Prompt Injection Classifier

Isaac Wu, Michael Maslowski
2025-10-20
red teaming
2510.19844v1

PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits

Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
2025-10-20
red teaming
2510.17947v2

PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits

Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
2025-10-20
red teaming
2510.17947v1

VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models

Qilin Liao, Anamika Lochab, Ruqi Zhang
2025-10-20
red teaming
2510.17759v1

CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks

Xu Zhang, Hao Li, Zhichao Lu
2025-10-20
red teaming
2510.17687v1

From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors

Zhengshen Zhang, Hao Li, Yalun Dai, Zhengbang Zhu, Lei Zhou, Chenchen Liu, Dong Wang, Francis E. H. Tay, Sijin Chen, Ziwei Liu, Yuxiao Liu, Xinghang Li, Pan Zhou
2025-10-20
2510.17439v1

Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks

Xinkai Wang, Beibei Li, Zerui Shao, Ao Liu, Shouling Ji
2025-10-20
red teaming
2510.17277v1

JT-Safe: Intrinsically Enhancing the Safety and Trustworthiness of LLMs

Junlan Feng, Fanyu Meng, Chong Long, Pengyu Cong, Duqing Wang, Yan Zheng, Yuyao Zhang, Xuanchang Gao, Ye Yuan, Yunfei Ma, Zhijie Ren, Fan Yang, Na Wu, Di Jin, Chao Deng
2025-10-20
safety
2510.17918v1

Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

Elias Hossain, Swayamjit Saha, Somshubhra Roy, Ravi Prasad
2025-10-20
red teaming
2510.17098v1

Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation

Guoqing Luo, Iffat Maab, Lili Mou, Junichi Yamagishi
2025-10-20
2510.17062v1

October 13 - October 19, 2025

3 papers