Abstract
Large Language Models (LLMs) are increasingly applied across business, education, and cybersecurity domains. However, different LLMs can yield different outputs for the same query because of differences in architecture, training data, and response generation mechanisms. This paper examines model variability and uncertainty by comparing the responses of three LLMs (ChatGPT-4o, Gemini 2.0 Flash, and DeepSeek-V3) to a query asking them to rank practical intrusion detection systems (IDS). The analysis highlights key similarities and differences in the models’ outputs, offering insight into their respective reasoning and consistency.
Faculty Advisor/Mentor
Claude Turner
Document Type
Paper
Disciplines
Digital Communications and Networking
DOI
10.25776/9b5z-r173
Publication Date
4-10-2025
Included in
Why do Different LLMs Give Different Answers to the Same Question? Model Uncertainty and Variability in LLM-based Intrusion Detection Systems Ranking