Paper Library
A collection of AI security research papers
Showing 947 papers total
October 28 - November 03, 2024
3 papers
Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
Victoria Benjamin, Emily Braca, Israel Carter, Hafsa Kanchwala, Nava Khojasteh, Charly Landow, Yi Luo, Caroline Ma, Anna Magarelli, Rachel Mirin, Avery Moyer, Kayla Simpson, Amelia Skawinski, Thomas Heverin
2024-10-28
red teaming
2410.23308v1
Palisade -- Prompt Injection Detection Framework
Sahasra Kokkula, Somanathan R, Nandavardhan R, Aashishkumar, G Divya
2024-10-28
2410.21146v1
Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection
Md Abdur Rahman, Fan Wu, Alfredo Cuzzocrea, Sheikh Iqbal Ahamed
2024-10-28
red teaming
2410.21337v2
October 14 - October 20, 2024
2 papers
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
Zedian Shao, Hongbin Liu, Jaden Mu, Neil Zhenqiang Gong
2024-10-18
red teaming
2410.14827v3
SPIN: Self-Supervised Prompt INjection
Leon Zhou, Junfeng Yang, Chengzhi Mao
2024-10-17
red teaming
2410.13236v1
October 07 - October 13, 2024
3 papers
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems
Donghyun Lee, Mo Tiwari
2024-10-09
red teaming
2410.07283v1
SecAlign: Defending Against Prompt Injection with Preference Optimization
Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo
2024-10-07
red teaming
2410.05451v3
A test suite of prompt injection attacks for LLM-based machine translation
Antonio Valerio Miceli-Barone, Zhifan Sun
2024-10-07
red teaming
2410.05047v1
September 23 - September 29, 2024
2 papers
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks
Rongchang Li, Minjie Chen, Chang Hu, Han Chen, Wenpeng Xing, Meng Han
2024-09-29
red teaming
2409.19521v1
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu, Yangguang Shao, Hanwen Miao, Junzheng Shi
2024-09-23
red teaming
2409.14729v2
September 16 - September 22, 2024
2 papers
Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection
Md Abdur Rahman, Hossain Shahriar, Fan Wu, Alfredo Cuzzocrea
2024-09-20
red teaming
2409.13331v1
A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration
Zhang Zheng
2024-09-16
2409.10403v1
August 05 - August 11, 2024
1 paper
Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection
Subaru Kimura, Ryota Tanaka, Shumpei Miyawaki, Jun Suzuki, Keisuke Sakaguchi
2024-08-07
red teaming
2408.03554v1
July 29 - August 04, 2024
1 paper
PIP: Prototypes-Injected Prompt for Federated Class Incremental Learning
Muhammad Anwar Ma'sum, Mahardhika Pratama, Savitha Ramasamy, Lin Liu, Habibullah Habibullah, Ryszard Kowalczyk
2024-07-30
2407.20705v1
July 01 - July 07, 2024
1 paper
Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
Simon Ostermann, Kevin Baum, Christoph Endres, Julia Masloh, Patrick Schramowski
2024-07-03
red teaming
2407.03391v1
May 27 - June 02, 2024
1 paper
Exfiltration of personal information from ChatGPT via prompt injection
Gregory Schwartzman
2024-05-31
red teaming
2406.00199v2
April 01 - April 07, 2024
1 paper
Goal-guided Generative Prompt Injection Attack on Large Language Models
Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin
2024-04-06
red teaming
2404.07234v4
March 25 - March 31, 2024
1 paper
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
2024-03-26
red teaming
2403.17710v5
March 18 - March 24, 2024
1 paper
Defending Against Indirect Prompt Injection Attacks With Spotlighting
Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman
2024-03-20
red teaming
2403.14720v1
March 04 - March 10, 2024
1 paper
Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
2024-03-07
red teaming
2403.04957v1
February 05 - February 11, 2024
1 paper
StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner
2024-02-09
2402.06363v2
January 29 - February 04, 2024
1 paper
An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, Raghava Rao Mukkamala, Jason Bennett Thatcher
2024-01-31
red teaming
2402.00898v1
January 15 - January 21, 2024
1 paper
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
2024-01-15
2401.07612v1
December 25 - December 31, 2023
1 paper
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner
2023-12-29
2312.17673v2