ORCID
0009-0004-8759-6720 (Nelson), 0009-0003-1139-4121 (Beauchamp), 0000-0002-9760-7639 (Pace)
Document Type
Article
Publication Date
2025
DOI
10.7759/cureus.86543
Publication Title
Cureus
Volume
17
Issue
6
Pages
e86543
Abstract
Background: The internet has become a primary source of health information for the public, with important implications for patient decision-making and public health outcomes. However, the quality and readability of this content vary widely. With the rise of generative artificial intelligence (AI) tools such as ChatGPT and Gemini, new challenges and opportunities have emerged in how patients access and interpret medical information.
Objective: To evaluate and compare the quality, credibility, and readability of rosacea-related consumer health information provided by traditional search engines (Google, Bing) and generative AI platforms (ChatGPT, Gemini) using three validated instruments: DISCERN, the JAMA benchmark criteria, and the Flesch-Kincaid readability metrics.
Methods: Twenty health-related results from each platform were collected using a standardized rosacea query across Google, Bing, Gemini, and ChatGPT. Each source was assessed independently by two reviewers using the DISCERN instrument and the adapted JAMA benchmark criteria. Readability was evaluated using the Flesch Reading Ease and Flesch-Kincaid Grade Level scores. One-way ANOVA with Bonferroni correction was used to compare platform performance, and Cohen's kappa was used to measure inter-rater reliability.
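The readability formulas and statistical tests named in the Methods are standard and can be reproduced in a few lines. The sketch below is a minimal, hypothetical illustration of that scoring pipeline in Python: the Flesch formulas are the standard published ones, while the group arrays, ratings, and function names are placeholder assumptions, not the study's data or code.

```python
# Minimal sketch of the scoring pipeline described in the Methods.
# All arrays below are simulated placeholders, not the study's data.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch Reading Ease formula (higher scores = easier to read).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch-Kincaid Grade Level formula (approximate US school grade).
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

rng = np.random.default_rng(0)

# Simulated DISCERN scores for 20 sources per platform (placeholder means).
google, bing, gemini, chatgpt = (rng.normal(m, 0.5, 20) for m in (3.3, 3.1, 2.9, 2.7))

# One-way ANOVA across the four platforms.
f_stat, p_value = stats.f_oneway(google, bing, gemini, chatgpt)

# Bonferroni correction: four platforms yield 6 pairwise comparisons.
alpha_corrected = 0.05 / 6

# Cohen's kappa for inter-rater reliability on one DISCERN item (rated 1-5).
rater_a = rng.integers(1, 6, 20)
rater_b = np.clip(rater_a + rng.integers(-1, 2, 20), 1, 5)  # rater B mostly agrees
kappa = cohen_kappa_score(rater_a, rater_b)

print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f} (pairwise alpha={alpha_corrected:.4f})")
print(f"Cohen's kappa: {kappa:.2f}")
```

In practice, the Bonferroni-corrected threshold would be applied to pairwise post-hoc tests between platforms rather than to the omnibus ANOVA itself; whether the authors used these particular libraries is an assumption.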
Results: Google achieved the highest mean scores for both quality and credibility (DISCERN: 3.33 ± 0.53; JAMA: 3.70 ± 0.44), followed by Bing, Gemini, and ChatGPT. ChatGPT received the lowest scores across all quality measures. Readability analysis revealed no statistically significant differences among platforms; however, all content exceeded the reading levels recommended for public health information. Cohen's kappa indicated strong inter-rater agreement across DISCERN items.
Conclusion: Google remains the most reliable source of high-quality, readable health information among the evaluated platforms. Generative AI tools such as ChatGPT and Gemini, while increasingly popular, showed notable limitations in accuracy and transparency and produced content at reading levels above those recommended for the public. These findings highlight the need for improved oversight, transparency, and user education regarding AI-generated health content.
Rights
© Copyright 2025 Nelson et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Original Publication Citation
Nelson, H. C., Beauchamp, M. T., & Pace, A. A. (2025). The reliability gap: How traditional search engines outperform artificial intelligence (AI) chatbots in rosacea public health information quality. Cureus, 17(6), Article e86543. https://doi.org/10.7759/cureus.86543
Included in
Artificial Intelligence and Robotics Commons, Health Information Technology Commons, Public Health Commons