Abstract
This study examines how decoding temperature affects output uncertainty in a fixed-context retrieval-augmented generation (RAG) system. We define uncertainty as the semantic dispersion among repeated answers under the same fixed retrieved context, with greater dispersion interpreted as higher uncertainty. To isolate this answer-generation variability from retrieval drift, each question was paired with a fixed retrieved context, and repeated generations differed only in temperature. The experiment used nine questions drawn from a machine-learning textbook corpus, with three questions each at easy, moderate, and hard difficulty. Each question was evaluated at five temperatures (0.0, 0.25, 0.5, 0.75, and 1.0) over 30 iterations, yielding 1,350 total runs. Uncertainty was measured by first embedding the answers, reducing them to two dimensions with PCA, clustering them with DBSCAN, and finally quantifying the spread with two statistics: convex-hull area and mean pairwise cosine distance. The results show that temperature increases answer uncertainty in a difficulty-sensitive manner: easy questions remained relatively bounded, moderate questions exhibited more structured dispersion, and hard questions produced the earliest and largest spread of answers. All 1,350 responses were also evaluated for correctness and groundedness by an LLM judge on a 0-1 rubric. Despite increases in model uncertainty, the mean correctness and groundedness scores remained extremely high, with scores of 0.9959 and 0.9981, respectively. These findings suggest that, under fixed retrieval conditions, higher temperature increases uncertainty in RAG outputs and that this effect depends strongly on question difficulty, without widespread correctness or grounding failures.
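The measurement pipeline described above (embed, PCA to 2-D, DBSCAN, then spread statistics) can be sketched as follows. This is a minimal illustration under stated assumptions: the embeddings are simulated random vectors standing in for answer embeddings, and the DBSCAN parameters (`eps`, `min_samples`) and embedding dimension are placeholders, not the study's actual configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances
from scipy.spatial import ConvexHull

# Stand-in for 30 repeated answers embedded into a 384-dim space
# (the study's embedding model is not specified here).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 384))

# Step 1: reduce to two dimensions with PCA.
points_2d = PCA(n_components=2).fit_transform(embeddings)

# Step 2: density-based clustering of the reduced answers
# (eps and min_samples are illustrative assumptions).
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points_2d)

# Step 3: spread statistics.
# ConvexHull.volume is the enclosed area when the points are 2-D.
hull_area = ConvexHull(points_2d).volume
# Mean pairwise cosine distance over the upper triangle (each pair once),
# computed in the full embedding space.
dist = cosine_distances(embeddings)
mean_cosine = dist[np.triu_indices_from(dist, k=1)].mean()

print(f"hull area: {hull_area:.3f}, "
      f"mean pairwise cosine distance: {mean_cosine:.3f}")
```

Under this scheme, greater dispersion at higher temperature would show up as a larger hull area and a larger mean pairwise cosine distance across the 30 repeated answers for a question.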
Faculty Advisor/Mentor
Murat Kuzlu
Document Type
Paper
Disciplines
Artificial Intelligence and Robotics
DOI
10.25776/cj7w-4s79
Publication Date
4-22-2026
Temperature-Induced Uncertainty in Fixed-Context Retrieval-Augmented Generation