Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 770 papers total

May 26 - June 01, 2025

12 papers

TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis

Xiaorui Wu, Xiaofeng Mao, Fei Li, Xin Zhang, Xuanhong Li, Chong Teng, Donghong Ji, Zhuang Li
2025-05-30
red teaming
2505.24672v1

Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization

Utsav Maskey, Chencheng Zhu, Usman Naseem
2025-05-30
red teaming
2505.24621v1

AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders

Yuqi Zhang, Yuchun Miao, Zuchao Li, Liang Ding
2025-05-30
2505.24519v1

Model Unlearning via Sparse Autoencoder Subspace Guided Projections

Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou
2025-05-30
2505.24428v1

From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models

Haibo Jin, Peiyan Zhang, Peiran Wang, Man Luo, Haohan Wang
2025-05-30
red teaming
2505.24232v1

Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman
2025-05-30
safety
2505.24208v1

The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It

Zheng-Xin Yong, Beyza Ermis, Marzieh Fadaee, Stephen H. Bach, Julia Kreutzer
2025-05-30
safety
2505.24119v1

Understanding Refusal in Language Models with Sparse Autoencoders

Wei Jie Yeo, Nirmalendu Prakash, Clement Neo, Roy Ka-Wei Lee, Erik Cambria, Ranjan Satapathy
2025-05-29
red teaming
2505.23556v1

Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models

Mingyu Yu, Wei Wang, Yanjie Wei, Sujuan Qin
2025-05-29
red teaming
2505.23404v1

Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment

Krti Tallam, Emma Miller
2025-05-28
safety
2505.22852v1

GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance

Zaixi Zhang, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang
2025-05-28
2505.23839v1

JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models

Jiaxin Song, Yixu Wang, Jie Li, Rui Yu, Yan Teng, Xingjun Ma, Yingchun Wang
2025-05-26
red teaming
2505.19610v2

May 19 - May 25, 2025

4 papers

May 12 - May 18, 2025

2 papers

May 05 - May 11, 2025

1 paper

April 28 - May 04, 2025

3 papers

April 21 - April 27, 2025

2 papers