Abstract
Prompt injection attacks, ranked as the number-one vulnerability in AI systems by OWASP's 2025 Top 10 for Large Language Model Applications, remain largely unsolved, and this survey examines why. As large language models (LLMs) are deployed across enterprise workflows, agentic systems, and consumer tools, their fundamental inability to distinguish trusted instructions from untrusted user data has created a persistent and expanding attack surface. This paper presents a structured taxonomy of prompt injection attack vectors, including direct injection, indirect injection, multimodal attacks, tool and agent exploitation, hybrid chained techniques, and autonomous propagating threats. These vectors are mapped across five impact categories (data exfiltration, unauthorized actions, content manipulation, system compromise, and supply chain propagation) and evaluated against detection approaches spanning heuristic methods, machine learning classification, perplexity analysis, canary tokens, and LLM-as-judge architectures. Prevention strategies are analyzed at the model, application, and system levels. Drawing on 18 primary sources from 2023 to 2025, the survey's central finding is architectural: transformer-based LLMs treat all context-window content as equally authoritative, a structural flaw that drives the entire threat landscape and resists easy fixes. The survey closes with recommendations for layered, adaptive defense strategies and identifies the most critical open problems for future research in LLM security.
Faculty Advisor/Mentor
Yan Lu
Document Type
Paper
Disciplines
Artificial Intelligence and Robotics | Cybersecurity | Information Security
DOI
10.25776/mvhf-w867
Publication Date
4-17-2026
Hijacking the Prompt: A Survey of Prompt Injection Attacks, Detection, and Defense in Large Language Models