Document Type

Article

Publication Date

2023

DOI

10.3390/forecast5010015

Publication Title

Forecasting

Volume

5

Issue

1

Pages

285-296

Abstract

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random walk distributions. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress toward accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from the study authors.

Rights

© 2023 by the authors.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Data Availability

Article states: Data and other replication materials are available at “Replication Data for: Measuring and Answering the Challenge of Spurious Correlations in Big Search Data”, https://doi.org/10.7910/DVN/UW1UYR (accessed on 26 December 2022), Harvard Dataverse.

ORCID

0000-0002-5420-0521 (Richman)

Original Publication Citation

Richman, J. T., & Roberts, R. J. (2023). Assessing spurious correlations in big search data. Forecasting, 5(1), 285-296. https://doi.org/10.3390/forecast5010015