Paper Library
Collection of AI Security research papers
Showing 770 papers total
December 01 - December 07, 2025
24 papers
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
M Zeeshan, Saud Satti
2025-12-04
red teaming
2512.04895v1
STELLA: Guiding Large Language Models for Time Series Forecasting with Semantic Abstractions
Junjie Fan, Hongye Zhao, Linduo Wei, Jiayu Rao, Guijia Li, Jiaxin Yuan, Wenqi Xu, Yong Qi
2025-12-04
2512.04871v1
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Wei Zhao, Zhe Li, Jun Sun
2025-12-04
red teaming
2512.04841v1
ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications
Eranga Bandara, Amin Hass, Ross Gore, Sachin Shetty, Ravi Mukkamala, Safdar H. Bouk, Xueping Liang, Ng Wee Keong, Kasun De Zoysa, Aruna Withanage, Nilaan Loganathan
2025-12-04
red teaming
2512.04785v1
Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation
Chenlin Xu, Lei Zhang, Lituan Wang, Xinyu Pu, Pengfei Ma, Guangwu Qian, Zizhou Wang, Yan Wang
2025-12-04
2512.04520v1
GovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows
Zhou Liu, Zhaoyang Han, Guochen Yan, Hao Liang, Bohan Zeng, Xing Chen, Yuanfeng Song, Wentao Zhang
2025-12-04
governance
2512.04416v1
Executable Governance for AI: Translating Policies into Rules Using LLMs
Gautam Varma Datla, Anudeep Vurity, Tejaswani Dash, Tazeem Ahmad, Mohd Adnan, Saima Rafi
2025-12-04
governance
2512.04408v1
Multi-Scale Visual Prompting for Lightweight Small-Image Classification
Salim Khazem
2025-12-03
2512.03663v1
Immunity memory-based jailbreak detection: multi-agent adaptive guard for large language models
Jun Leng, Litian Zhang, Xi Zhang
2025-12-03
red teaming
2512.03356v1
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Qinghe Wang, Xiaoyu Shi, Baolu Li, Weikang Bian, Quande Liu, Huchuan Lu, Xintao Wang, Pengfei Wan, Kun Gai, Xu Jia
2025-12-02
2512.03041v1
Invasive Context Engineering to Control Large Language Models
Thomas Rivasseau
2025-12-02
2512.03001v1
Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities
Yuan Xiong, Ziqi Miao, Lijun Li, Chen Qian, Jie Li, Jing Shao
2025-12-02
red teaming
2512.02973v1
When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
Afshin Khadangi, Hanna Marxen, Amir Sartipi, Igor Tchappi, Gilbert Fridgen
2025-12-02
2512.04124v1
Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions
Piercosma Bisconti, Marcello Galisai, Federico Pierucci, Marcantonio Bracale, Matteo Prandi
2025-12-02
safety
2512.02682v1
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz
2025-12-02
safety
2512.02445v1
COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
Junyu Wang, Changjia Zhu, Yuanbo Zhou, Lingyao Li, Xu He, Junjie Xiong
2025-12-02
safety
2512.02318v2
COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
Junyu Wang, Changjia Zhu, Yuanbo Zhou, Lingyao Li, Xu He, Junjie Xiong
2025-12-02
safety
2512.02318v1
DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses
Han Luo, Guy Laban
2025-12-01
safety
2512.02282v1
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr
2025-12-01
safety
2512.03100v1
GRASP: Guided Residual Adapters with Sample-wise Partitioning
Felix Nützel, Mischa Dombrowski, Bernhard Kainz
2025-12-01
2512.01675v1
The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search
Rongzhe Wei, Peizhi Niu, Xinjie Shen, Tony Tu, Yifan Li, Ruihan Wu, Eli Chien, Pin-Yu Chen, Olgica Milenkovic, Pan Li
2025-12-01
red teaming
2512.01353v2
A Wolf in Sheep's Clothing: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search
Rongzhe Wei, Peizhi Niu, Xinjie Shen, Tony Tu, Yifan Li, Ruihan Wu, Eli Chien, Olgica Milenkovic, Pan Li
2025-12-01
red teaming
2512.01353v1
Securing Large Language Models (LLMs) from Prompt Injection Attacks
Omar Farooq Khan Suri, John McCrae
2025-12-01
red teaming
2512.01326v1
DefenSee: Dissecting Threat from Sight and Text - A Multi-View Defensive Pipeline for Multi-modal Jailbreaks
Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing
2025-12-01
2512.01185v1