Philipp Zimmermann

Paper Library

Collection of AI Security research papers

Showing 947 papers total

January 19 - January 25, 2026

11 papers

Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models

Fengheng Chu, Jiahao Chen, Yuhong Wang, Jun Wang, Zhihui Fu, Shouling Ji, Songze Li
2026-01-22
red teaming
2601.15801v1

Improving Methodologies for LLM Evaluations Across Global Languages

Akriti Vij, Benjamin Chua, Darshini Ramiah, En Qi Ng, Mahran Morsidi, Naga Nikshith Gangarapu, Sharmini Johnson, Vanessa Wilfred, Vikneswaran Kumaran, Wan Sie Lee, Wenzhuo Yang, Yongsen Zheng, Bill Black, Boming Xia, Frank Sun, Hao Zhang, Qinghua Lu, Suyu Ma, Yue Liu, Chi-kiu Lo, Fatemeh Azadi, Isar Nejadgholi, Sowmya Vajjala, Agnes Delaborde, Nicolas Rolin, Tom Seimandi, Akiko Murakami, Haruto Ishi, Satoshi Sekine, Takayuki Semitsu, Tasuku Sasaki, Angela Kinuthia, Jean Wangari, Michael Michie, Stephanie Kasaon, Hankyul Baek, Jaewon Noh, Kihyuk Nam, Sang Seo, Sungpil Shin, Taewhi Lee, Yongsu Kim, Daisy Newbold-Harrop, Jessica Wang, Mahmoud Ghanem, Vy Hong
2026-01-22
2601.15706v1

Beyond Visual Safety: Jailbreaking Multimodal Large Language Models for Harmful Image Generation via Semantic-Agnostic Inputs

Mingyu Yu, Lana Liu, Zhehao Zhao, Wei Wang, Sujuan Qin
2026-01-22
red teaming
2601.15698v1

Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform

Jiazhu Xie, Bowen Li, Heyu Fu, Chong Gao, Ziqi Xu, Fengling Han
2026-01-21
2601.15528v1

LLM Security and Safety: Insights from Homotopy-Inspired Prompt Obfuscation

Luis Lazo, Hamed Jelodar, Roozbeh Razavi-Far
2026-01-20
safety
2601.14528v1

RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models

Rishit Chugh
2026-01-20
red teaming
2601.15331v1

PINA: Prompt Injection Attack against Navigation Agents

Jiani Liu, Yixin He, Lanlan Fan, Qidi Zhong, Yushi Cheng, Meng Zhang, Yanjiao Chen, Wenyuan Xu
2026-01-20
red teaming
2601.13612v1

Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection

Asen Dotsinski, Panagiotis Eustratiadis
2026-01-19
red teaming
2601.13359v1

Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Diego Gosmar, Deborah A. Dahl
2026-01-19
2601.13186v1

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Wenqi Zhang, Yulin Shen, Changyue Jiang, Jiarun Dai, Geng Hong, Xudong Pan
2026-01-19
2601.12822v1

Ethical Risks in Deploying Large Language Models: An Evaluation of Medical Ethics Jailbreaking

Chutian Huang, Dake Cao, Jiacheng Ji, Yunlou Fan, Chengze Yan, Hanhui Xu
2026-01-19
red teaming
2601.12652v1

January 12 - January 18, 2026

12 papers

Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents

Arunkumar V, Gangadharan G. R., Rajkumar Buyya
2026-01-18
2601.12560v1

TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning

Zhixin Xie, Xurui Song, Jun Luo
2026-01-18
red teaming
2601.12460v1

AgenTRIM: Tool Risk Mitigation for Agentic AI

Roy Betser, Shamik Bose, Amit Giloni, Chiara Picardi, Sindhu Padakandla, Roman Vainshtein
2026-01-18
2601.12449v1

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu
2026-01-18
red teaming, safety
2601.12359v1

DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants

Abhishek Kumar, Riya Tapwal, Carsten Maple
2026-01-17
safety
2601.12138v1

Preserving Fairness and Safety in Quantized LLMs Through Critical Weight Protection

Muhammad Alif Al Hakim, Alfan Farizki Wicaksono, Fajri Koto
2026-01-17
safety
2601.12033v1

Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence

Kaijie Mo, Siddhartha Venkatayogi, Chantal Shaib, Ramez Kouzy, Wei Xu, Byron C. Wallace, Junyi Jessy Li
2026-01-17
safety
2601.11886v1

Guardrails for trust, safety, and ethical development and deployment of Large Language Models (LLM)

Anjanava Biswas, Wrick Talukdar
2026-01-16
safety
2601.14298v1

Building Production-Ready Probes For Gemini

János Kramár, Joshua Engels, Zheng Wang, Bilal Chughtai, Rohin Shah, Neel Nanda, Arthur Conmy
2026-01-16
red teaming
2601.11516v1

Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs

Marcantonio Bracale Syrnikov, Federico Pierucci, Marcello Galisai, Matteo Prandi, Piercosma Bisconti, Francesco Giarrusso, Olga Sorokoletova, Vincenzo Suriani, Daniele Nardi
2026-01-16
governance
2601.11369v2

SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

Aiman Al Masoud, Marco Arazzi, Antonino Nocera
2026-01-16
2601.11199v1

AJAR: Adaptive Jailbreak Architecture for Red-teaming

Yipu Dou, Wang Yang
2026-01-16
red teaming
2601.10971v1