Document Type
Article
Publication Date
2024
DOI
10.1007/s44163-024-00200-w
Publication Title
Discover Artificial Intelligence
Volume
4
Issue
1
Pages
90 (1-14)
Abstract
Uncertainty quantification approaches have become more critical in large language models (LLMs), particularly in high-risk applications requiring reliable outputs. However, traditional methods for uncertainty quantification, such as probabilistic models and ensemble techniques, face challenges when applied to the complex and high-dimensional nature of LLM-generated outputs. This study proposes a novel geometric approach to uncertainty quantification using convex hull analysis. The proposed method leverages the spatial properties of response embeddings to measure the dispersion and variability of model outputs. The prompts are categorized into three types, i.e., ‘easy’, ‘moderate’, and ‘confusing’, and multiple responses are generated for each prompt using different LLMs at varying temperature settings. The responses are transformed into high-dimensional embeddings via a BERT model and subsequently projected into a two-dimensional space using Principal Component Analysis (PCA), Isomap, and Multidimensional Scaling (MDS). The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to cluster the embeddings, and the convex hull is computed for each selected cluster. The experimental results indicate that the uncertainty of LLMs depends on the prompt complexity, the model, and the temperature setting.
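The final geometric step described in the abstract can be illustrated with a minimal sketch. It assumes the responses have already been embedded, projected to two dimensions, and labelled by a clustering step such as DBSCAN (with -1 marking noise points), and it assumes that the area of each cluster's convex hull serves as the dispersion score, which the abstract does not state explicitly. The function names convex_hull, hull_area, and area_per_cluster are hypothetical and not taken from the article.

```python
from collections import defaultdict


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull of 2-D points (shoelace formula); 0.0 if degenerate."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def area_per_cluster(points, labels, noise_label=-1):
    """Group 2-D points by cluster label and return {label: hull area}.

    Assumption: points labelled `noise_label` are ignored, mirroring the
    convention DBSCAN implementations commonly use for noise.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters[label].append(tuple(point))
    return {label: hull_area(pts) for label, pts in clusters.items()}


# Toy 2-D "projected embeddings": a unit square with one interior point
# (cluster 0), a right triangle (cluster 1), and one noise point.
points = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5),
          (10, 10), (14, 10), (10, 13), (50, 50)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, -1]
print(area_per_cluster(points, labels))
```

Under this reading, a larger hull area means the responses to a prompt are more widely spread in the projected space, which would be interpreted as higher uncertainty. Interior points do not change the score, since only the outermost responses define the hull.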
Rights
© 2024 The Authors.
This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original authors and the source, provide a link to the Creative Commons license, and indicate if you modified the licensed material. You do not have permission under this license to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Data Availability
Article states: "Data cannot be shared openly but are available on request from authors."
Original Publication Citation
Catak, F. O., & Kuzlu, M. (2024). Uncertainty quantification in large language models through convex hull analysis. Discover Artificial Intelligence, 4(1), 1-14, Article 90. https://doi.org/10.1007/s44163-024-00200-w
ORCID
0000-0002-8719-2353 (Kuzlu)
Repository Citation
Catak, Ferhat Ozgur and Kuzlu, Murat, "Uncertainty Quantification in Large Language Models Through Convex Hull Analysis" (2024). Engineering Technology Faculty Publications. 246.
https://digitalcommons.odu.edu/engtech_fac_pubs/246