Detecting Prompt Injection Attacks Against Application Using Classifiers

Authors: Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur

Published: 2025-12-14

arXiv ID: 2512.12583v1

Added to Library: 2026-01-07 10:12 UTC

Red Teaming

📄 Abstract

Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.

🔍 Key Points

Development of a comprehensive dataset for prompt injection attacks, using and augmenting the HackAPrompt Playground Submissions corpus.
Training of multiple classifiers, including advanced models like LSTM and traditional ones like Random Forest and Naive Bayes, to detect prompt injection attacks.
Comparison of model performance, establishing Random Forest as the most effective for detecting malicious prompts with high precision and recall.
Proposing practical mitigation strategies that can be implemented alongside existing LLM systems to filter potentially malicious user inputs.
Identifying areas for future research, including hybrid models and adversarial robustness, to further enhance detection capabilities.

💡 Why This Paper Matters

This paper is significant as it not only addresses a growing vulnerability in AI systems—prompt injection attacks—but also provides a structured methodology for detecting these attacks through machine learning. The creation and curation of a relevant dataset, coupled with experimentation across various classifier models, represent a robust contribution to the field of AI security. The proposed practical strategies for mitigation further highlight its importance for real-world applications.

🎯 Why It's Interesting for AI Security Researchers

This paper is particularly relevant to AI security researchers as it tackles a crucial aspect of safeguarding LLM-integrated applications. With the increasing deployment of AI systems across various sectors, understanding and mitigating security threats like prompt injection is vital. The findings can inform the development of more secure AI frameworks and contribute to the discourse on responsible AI deployment.

Detecting Prompt Injection Attacks Against Application Using Classifiers

📄 Abstract

🔍 Key Points

💡 Why This Paper Matters

🎯 Why It's Interesting for AI Security Researchers

📚 Read the Full Paper