
Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Authors: Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty

Published: 2025-12-29

arXiv ID: 2512.23557v1

Added to Library: 2026-01-07 10:06 UTC

📄 Abstract

Large Language Models (LLMs), Vision-Language Models (VLMs), and emerging agentic AI frameworks such as LangChain and GraphChain enable powerful autonomous systems that reason, plan, and converse across numerous tools and agents. This agentic environment, however, increases the risk of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, images, metadata, or agent-to-agent messages can propagate through the graph and lead to unintended behavior, policy violations, or state corruption. To mitigate these risks, this paper proposes a Cross-Agent Multimodal Provenance-Aware Defense Framework in which every prompt, whether user-generated or produced by an upstream agent, is sanitized, and every LLM-generated output is independently verified before being sent to downstream nodes. The framework comprises a text sanitizer agent, a visual sanitizer agent, and an output validator agent, all coordinated by a provenance ledger that records modality, source, and trust-level metadata across the entire agent network. This architecture ensures that agent-to-agent communication adheres to explicit trust boundaries, so that injected instructions do not propagate through LangChain- or GraphChain-style workflows. Experimental assessments show that multimodal injection detection accuracy is significantly improved, cross-agent trust leakage is reduced, and agentic execution pathways become more stable. By extending provenance tracking and validation to multi-agent orchestration, the framework advances the development of secure, understandable, and reliable agentic AI systems.
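
The abstract describes the framework only at a high level. As a rough illustration of how the described components could fit together, the Python sketch below wires a text-sanitizer stage and an output-validator stage to a provenance ledger that tracks modality, source, and trust level. All class names, fields, and the pattern-based checks are assumptions made for illustration; they are not the authors' implementation.

```python
# Minimal sketch of a provenance-aware sanitization pipeline (illustrative only).
# Class names, fields, and the pattern checks below are assumptions, not the
# paper's actual implementation.
from dataclasses import dataclass, field
import re


@dataclass
class ProvenanceRecord:
    """Metadata tracked for every message passed between agents."""
    message_id: str
    modality: str          # e.g. "text", "image", "metadata"
    source: str            # originating user or upstream agent
    trust_level: str       # e.g. "untrusted", "sanitized", "verified"
    notes: list = field(default_factory=list)


class ProvenanceLedger:
    """Records modality, source, and trust level across the agent graph."""

    def __init__(self):
        self._records = {}

    def register(self, record: ProvenanceRecord):
        self._records[record.message_id] = record

    def update_trust(self, message_id: str, trust_level: str, note: str = ""):
        record = self._records[message_id]
        record.trust_level = trust_level
        if note:
            record.notes.append(note)

    def trust_of(self, message_id: str) -> str:
        return self._records[message_id].trust_level


# Toy patterns standing in for a real injection detector.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"system prompt",
    r"exfiltrate",
]


def sanitize_text(message_id: str, text: str, ledger: ProvenanceLedger) -> str:
    """Text-sanitizer stage: block injected instructions, then update trust."""
    flagged = [p for p in SUSPICIOUS_PATTERNS if re.search(p, text, re.IGNORECASE)]
    if flagged:
        ledger.update_trust(message_id, "untrusted", f"patterns: {flagged}")
        raise ValueError(f"Potential prompt injection blocked: {flagged}")
    ledger.update_trust(message_id, "sanitized")
    return text


def validate_output(message_id: str, output: str, ledger: ProvenanceLedger) -> str:
    """Output-validator stage: only forward outputs with sanitized provenance."""
    if ledger.trust_of(message_id) != "sanitized":
        raise ValueError("Refusing to forward output with untrusted provenance")
    ledger.update_trust(message_id, "verified")
    return output


if __name__ == "__main__":
    ledger = ProvenanceLedger()
    ledger.register(ProvenanceRecord("msg-1", "text", "user", "untrusted"))
    clean = sanitize_text("msg-1", "Summarize this report for the next agent.", ledger)
    print(validate_output("msg-1", clean, ledger))  # ledger now marks msg-1 as verified
```

The design point mirrored here is that trust is attached to messages rather than to agents: an output is only forwarded once the ledger shows its inputs were sanitized, which is how injected instructions are kept from crossing agent boundaries.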

🔍 Key Points

  • The study demonstrates the vulnerability of Large Language Models (LLMs) to document-level hidden prompt injection attacks, highlighting significant impacts on academic review scores and acceptance decisions.
  • A comprehensive dataset of 500 real academic papers was constructed for evaluation, demonstrating the effects of hidden prompts injected in multiple languages and revealing notable differences in susceptibility across those languages.
  • Results indicated substantial adverse effects on reviews for English, Japanese, and Chinese injections, while Arabic injections showed minimal impact, revealing critical implications for multilingual robustness in LLM-based applications.
  • The analysis features precise experimental metrics, including score drift, Injection Success Rate (ISR), and transitions in acceptance outcomes, contributing to a robust assessment of the risks posed by these attacks (a metric sketch follows this list).
  • The paper advocates for further research on multilingual vulnerability and mitigation strategies, underscoring the immediate relevance of understanding these risks in high-stakes academic peer review scenarios.
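
The summary cites score drift and Injection Success Rate (ISR) without giving their formulas. The short sketch below shows one plausible way such metrics could be computed, assuming score drift is the mean change in review score after injection and ISR is the fraction of papers whose score shifts by at least a chosen threshold; these definitions and the threshold value are assumptions, not the paper's exact formulations.

```python
# Illustrative computation of score drift and Injection Success Rate (ISR).
# The definitions (mean score change, fraction of shifted reviews) and the
# 0.5-point threshold are assumptions; the paper may define them differently.
from statistics import mean


def score_drift(baseline_scores, injected_scores):
    """Mean change in review score caused by the hidden prompt."""
    return mean(inj - base for base, inj in zip(baseline_scores, injected_scores))


def injection_success_rate(baseline_scores, injected_scores, threshold=0.5):
    """Fraction of papers whose score shifted by at least `threshold` points."""
    shifted = sum(
        1 for base, inj in zip(baseline_scores, injected_scores)
        if abs(inj - base) >= threshold
    )
    return shifted / len(baseline_scores)


if __name__ == "__main__":
    baseline = [5.0, 6.0, 4.5, 7.0]   # hypothetical scores for clean papers
    injected = [7.5, 6.5, 6.0, 7.0]   # hypothetical scores after injection
    print(f"Score drift: {score_drift(baseline, injected):+.2f}")
    print(f"ISR: {injection_success_rate(baseline, injected):.0%}")
```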

💡 Why This Paper Matters

This paper is crucial as it illuminates a significant security threat to LLMs utilized in critical processes such as academic peer review. The findings underscore the necessity of addressing the vulnerabilities inherent in LLMs, particularly in multilingual contexts, where the impact of adversarial inputs varies greatly. By revealing how hidden prompt injections can skew review outcomes, it calls for urgent attention to develop safeguards to ensure the reliability and integrity of automated decision-support systems in academia and beyond.

🎯 Why It's Interesting for AI Security Researchers

AI security researchers would find this paper of great interest as it addresses the emergent threat of prompt injection attacks on LLMs, a significant concern given the increasing reliance on these models in critical workflows. The research not only examines this vulnerability in a novel multilingual context but also provides empirical evidence of how these injections can alter decisions in high-stakes scenarios. The insights gained from this study can inform the development of more robust frameworks and countermeasures to protect against adversarial manipulations in AI systems.
