A Systematic Review of Poisoning Attacks Against Large Language Models (LLM)

Abstract

This paper provides a comprehensive review of poisoning attacks against large language models (LLMs), drawing primarily on Fendley et al. (2025) and complementary studies from 2022–2025. It organizes poisoning research along two key dimensions, Metrics and Specifications, which capture how attack success is measured and how attacks are implemented, respectively. The paper synthesizes quantitative results, experimental findings, and defense strategies across data, model, and multi-modal poisoning contexts. Finally, it highlights emerging challenges posed by self-adaptive and synthetic-data-driven LLMs and proposes future research directions to strengthen model security and reliability.

Faculty Advisor/Mentor

Yan Lu

Document Type

Paper

Disciplines

Cybersecurity

DOI

10.25776/7avg-dc64

Publication Date

11-10-2025
