Application of One-, Three-, and Seven-Day Forecasts During Early Onset on the COVID-19 Epidemic Dataset Using Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average, and Naïve Forecasting Methods
Data in Brief
The coronavirus disease 2019 (COVID-19) spread rapidly across the world since its appearance in December 2019. This data set creates one-, three-, and seven-day forecasts of the COVID-19 pandemic's cumulative case counts at the county, health district, and state geographic levels for the state of Virginia. Forecasts are created over the first 46 days of reported COVID-19 cases using the cumulative case count data provided by The New York Times as of April 22, 2020. From this historical data, one-, three-, seven, and all-days prior to the forecast start date are used to generate the forecasts. Forecasts are created using: (1) a Naïve approach; (2) Holt-Winters exponential smoothing (HW); (3) growth rate (Growth); (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). Median Absolute Error (MdAE) and Median Absolute Percentage Error (MdAPE) metrics are created with each forecast to evaluate the forecast with respect to existing historical data. These error metrics are aggregated to provide a means for assessing which combination of forecast method, forecast length, and lookback length are best fits, based on lowest aggregated error at each geographic level. The data set is comprised of an R-Project file, four R source code files, all 1,329,404 generated short-range forecasts, MdAE and MdAPE error metric data for each forecast, copies of the input files, and the generated comparison tables. All code and data files are provided to provide transparency and facilitate replicability and reproducibility. This package opens directly in RStudio through the R Project file. The R Project file removes the need to set path locations for the folders contained within the data set to simplify setup requirements. This data set provides two avenues for reproducing results: 1) Use the provided code to generate the forecasts from scratch and then run the analyses; or 2) Load the saved forecast data and run the analyses on the stored data. Code annotations provide the instructions needed to accomplish both routes. This data can be used to generate the same set of forecasts and error metrics for any US state by altering the state parameter within the source code. Users can also generate health district forecasts for any other state, by providing a file which maps each county within a state to its respective health-district. The source code can be connected to the most up-to-date version of The New York Times COVID-19 dataset allows for the generation of forecasts up to the most recently reported data to facilitate near real-time forecasting.
Original Publication Citation
Lynch, C. J., & Gore, R. (2021). Application of one-, three-, and seven-day forecasts during early onset on the COVID-19 epidemic dataset using moving average, autoregressive, autoregressive moving average, autoregressive integrated moving average, and naïve forecasting methods. Data in Brief, 35, 12 pp., Article 106759. https://doi.org/10.1016/j.dib.2021.106759
Lynch, Christopher J. and Gore, Ross, "Application of One-, Three-, and Seven-Day Forecasts During Early Onset on the COVID-19 Epidemic Dataset Using Moving Average, Autoregressive, Autoregressive Moving Average, Autoregressive Integrated Moving Average, and Naïve Forecasting Methods" (2021). VMASC Publications. 52.