Abstract

Large Language Models (LLMs) are increasingly applied across business, education, and cybersecurity domains. However, LLMs can yield varied outputs for the same query due to differences in architecture, training data, and response generation mechanisms. This paper examines model variability and uncertainty by comparing the responses of three LLMs (ChatGPT-4o, Gemini 2.0 Flash, and DeepSeek-V3) to a query on ranking practical intrusion detection systems (IDS). The analysis highlights key similarities and differences in the models' outputs, offering insight into their respective reasoning and consistency.

Faculty Advisor/Mentor

Claude Turner

Document Type

Paper

Disciplines

Digital Communications and Networking

DOI

10.25776/9b5z-r173

Publication Date

4-10-2025

Why do Different LLMs Give Different Answers to the Same Question? Model Uncertainty and Variability in LLM-based Intrusion Detection Systems Ranking