Paper Library
A collection of AI security research papers
Showing 1331 papers total
March 30 - April 05, 2026
12 papers
KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents
Cormac Guerin, Frank Guerin
2026-03-31
2604.02375v1
A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation
Ha Na Cho
2026-03-31
safety
2604.00249v1
EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method
Yanting Wang, Jinyuan Jia
2026-03-31
red teaming
2603.30034v1
Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh
2026-03-31
red teaming
2603.30016v1
Performative Scenario Optimization
Quanyan Zhu, Zhengye Han
2026-03-31
red teaming
2603.29982v1
Adversarial Prompt Injection Attack on Multimodal Large Language Models
Meiwen Ding, Song Xia, Chenqi Kong, Xudong Jiang
2026-03-31
red teaming
2603.29418v1
Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning
Bilgehan Sel, Xuanli He, Alwin Peng, Ming Jin, Jerry Wei
2026-03-30
red teaming
2603.29038v1
Crossing the NL/PL Divide: Information Flow Analysis Across the NL/PL Boundary in LLM-Integrated Code
Zihao Xu, Xiao Cheng, Ruijie Meng, Yuekang Li
2026-03-30
2603.28345v1
Evaluating Privilege Usage of Agents on Real-World Tools
Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go, Yujue Wang, Chijin Zhou, Yu Jiang, Geguang Pu
2026-03-30
red teaming
2603.28166v1
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
Haochuan Kevin Wang
2026-03-30
red teaming
2603.28013v2
Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey
Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur
2026-03-30
red teaming
2603.27918v1
March 23 - March 29, 2026
12 papers
Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models
Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong, Songze Li
2026-03-29
red teaming
2603.27522v1
A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework
Surada Suwansathit, Yuxuan Zhang, Guofei Gu
2026-03-29
red teaming
2603.27517v1
GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models
Md Jueal Mia, Joaquin Molto, Yanzhao Wu, M. Hadi Amini
2026-03-28
red teaming
2603.28817v1
From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification
Huamin Chen, Xunzhuo Liu, Bowei He, Xue Liu
2026-03-28
2603.27299v1
SafeClaw-R: Towards Safe and Secure Multi-Agent Personal Assistants
Haoyu Wang, Zibo Xiao, Yedi Zhang, Christopher M. Poskitt, Jun Sun
2026-03-28
2603.28807v1
Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models
Hieu Xuan Le, Benjamin Goh, Quy Anh Tang
2026-03-26
red teaming
2603.25176v1
PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems
Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang, Yongxin Guo, Xiaoying Tang
2026-03-26
red teaming
2603.25164v1
AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective
Zhenyi Wang, Siyu Luan
2026-03-25
red teaming
2603.24857v1
Analysing the Safety Pitfalls of Steering Vectors
Yuxiao Li, Alina Fastowski, Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci
2026-03-25
red teaming
2603.24543v1
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym Andriushchenko
2026-03-25
red teaming
2603.24511v1
Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search
Yulin Shen, Xudong Pan, Geng Hong, Min Yang
2026-03-25
red teaming
2603.24203v1
The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense
Qianlong Lan, Anuj Kaul
2026-03-24
2603.23791v1