Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 1169 papers total

January 26 - February 01, 2026

15 papers

FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks

Naen Xu, Jinghuai Zhang, Ping He, Chunyi Zhou, Jun Wang, Zhihui Fu, Tianyu Du, Zhaoxiang Wang, Shouling Ji
2026-01-30
safety
2601.22485v1

Jailbreaks on Vision Language Model via Multimodal Reasoning

Aarush Noheria, Yuguang Yao
2026-01-29
red teaming
2601.22398v1

Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment

Yavuz Bakman, Duygu Nur Yaldiz, Salman Avestimehr, Sai Praneeth Karimireddy
2026-01-29
red teaming
2601.22313v1

A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy

Pedro H. Barcha Correia, Ryan W. Achjian, Diego E. G. Caetano de Oliveira, Ygor Acacio Maria, Victor Takashi Hayashi, Marcos Lopes, Charles Christian Miers, Marcos A. Simplicio
2026-01-29
safety
2601.22240v1

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Jarrod Barnes
2026-01-28
red teaming
2601.21083v3

ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack

Xingwei Lin, Wenhao Lin, Sicong Cao, Jiahao Yu, Renke Huang, Lei Xue, Chunming Wu
2026-01-28
red teaming
2601.20903v1

RvB: Automating AI System Hardening via Iterative Red-Blue Games

Lige Huang, Zicheng Liu, Jie Zhang, Lewen Yan, Dongrui Liu, Jing Shao
2026-01-27
red teaming
2601.19726v1

LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment

Haonan Zhang, Dongxia Wang, Yi Liu, Kexin Chen, Wenhai Wang
2026-01-27
2601.19487v1

SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks

Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya, Manu, Yi Guo, Jo Plested, Tim Lynar, Jack Yang, Wangli Yang
2026-01-27
safety
2601.19174v1

Proactive Hardening of LLM Defenses with HASTE

Henry Chen, Victor Aranda, Samarth Keshari, Ryan Heartfield, Nicole Nichols
2026-01-27
red teaming, safety
2601.19051v1

Malicious Repurposing of Open Science Artefacts by Using Large Language Models

Zahra Hashemi, Zhiqiang Zhong, Jun Pang, Wei Zhao
2026-01-26
red teaming
2601.18998v1

TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment

Zhewen Tan, Wenhan Yu, Jianfeng Si, Tongxin Liu, Kaiqi Guan, Huiyan Jin, Jiawen Tao, Xiaokun Yuan, Duohe Ma, Xiangzheng Zhang, Tong Yang, Lin Sun
2026-01-26
safety
2601.18292v1

Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs

Fei Meng
2026-01-26
safety
2601.18255v1

From Transcripts to AI Agents: Knowledge Extraction, RAG Integration, and Robust Evaluation of Conversational AI Assistants

Krittin Pachtrachai, Petmongkon Pornpichitsuwan, Wachiravit Modecrua, Touchapon Kraisingkorn
2026-01-26
red teaming
2602.15859v1

Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming

Alexandra Chouldechova, A. Feder Cooper, Solon Barocas, Abhinav Palia, Dan Vann, Hanna Wallach
2026-01-26
red teaming
2601.18076v1

January 19 - January 25, 2026

7 papers