Paper Library
Collection of AI Security research papers
Showing 1331 papers total
November 17 - November 23, 2025
4 papers
VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language
Zonghao Ying, Moyang Chen, Nizhang Li, Zhiqiang Wang, Wenxin Zhang, Quanchen Zou, Zonglei Jing, Aishan Liu, Xianglong Liu
2025-11-17
red teaming
2511.13127v1
Infinite-Story: A Training-Free Consistent Text-to-Image Generation
Jihun Park, Kyoungmin Lee, Jongmin Gim, Hyeonseo Jo, Minseok Oh, Wonhyeok Choi, Kyumin Hwang, Jaeyeul Kim, Minwoo Choi, Sunghoon Im
2025-11-17
2511.13002v1
MedRule-KG: A Knowledge-Graph-Steered Scaffold for Reliable Mathematical and Biomedical Reasoning
Crystal Su
2025-11-17
2511.12963v1
BrainNormalizer: Anatomy-Informed Pseudo-Healthy Brain Reconstruction from Tumor MRI via Edge-Guided ControlNet
Min Gu Kwak, Yeonju Lee, Hairong Wang, Jing Li
2025-11-17
2511.12853v1
November 10 - November 16, 2025
20 papers
LLM Reinforcement in Context
Thomas Rivasseau
2025-11-16
2511.12782v1
Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning
Ankita Raj, Chetan Arora
2025-11-16
2511.12735v1
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma
2025-11-16
red teaming
2511.12710v1
Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments
Samuel Nathanson, Rebecca Williams, Cynthia Matuszek
2025-11-16
red teaming
2511.13788v1
GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs
Jiaji Ma, Puja Trivedi, Danai Koutra
2025-11-16
red teaming
2511.12423v1
Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification
Hasini Jayathilaka
2025-11-15
red teaming
2511.12295v1
Prompt-Conditioned FiLM and Multi-Scale Fusion on MedSigLIP for Low-Dose CT Quality Assessment
Tolga Demiroglu, Mehmet Ozan Unal, Metin Ertas, Isa Yildirim
2025-11-15
2511.12256v1
AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren, Shahar Katz, Lior Wolf
2025-11-15
safety
2511.12217v1
NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks
Lama Sleem, Jerome Francois, Lujun Li, Nathan Foucher, Niccolo Gentile, Radu State
2025-11-14
red teaming
2511.11784v1
EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment
Ruoxi Cheng, Haoxuan Ma, Teng Ma, Hongyi Zhang
2025-11-14
2511.11301v1
Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio
Guangke Chen, Yuhui Wang, Shouling Ji, Xiapu Luo, Ting Wang
2025-11-14
red teaming
2511.10913v1
ICX360: In-Context eXplainability 360 Toolkit
Dennis Wei, Ronny Luss, Xiaomeng Hu, Lucas Monteiro Paes, Pin-Yu Chen, Karthikeyan Natesan Ramamurthy, Erik Miehling, Inge Vejsbjerg, Hendrik Strobelt
2025-11-14
red teaming
2511.10879v1
Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation
Fred Heiding, Simon Lermen
2025-11-13
red teaming
2511.11759v1
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
Runpeng Geng, Yanting Wang, Chenlong Yin, Minhao Cheng, Ying Chen, Jinyuan Jia
2025-11-13
2511.10720v1
Say It Differently: Linguistic Styles as Jailbreak Vectors
Srikant Panda, Avinash Rai
2025-11-13
red teaming
2511.10519v1
EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models
Jialin Wu, Kecen Li, Zhicong Huang, Xinfeng Li, Xiaofeng Wang, Cheng Hong
2025-11-13
2511.09880v1
A precessing magnetic jet as the engine of GRB 250702B
Tao An
2025-11-13
2511.09850v1
Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO
Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen
2025-11-12
red teaming
2511.09780v1
Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models
Tiansheng Huang, Virat Shejwalkar, Oscar Chang, Milad Nasr, Ling Liu
2025-11-12
red teaming
2511.09682v1
Toward Honest Language Models for Deductive Reasoning
Jiarui Liu, Kaustubh Dhole, Yingheng Wang, Haoyang Wen, Sarah Zhang, Haitao Mao, Gaotang Li, Neeraj Varshney, Jingguo Liu, Xiaoman Pan
2025-11-12
2511.09222v4