A.4 COVID-19 county-level data


The COVID Severity Forecasting data include county-level demographics and risk factors related to COVID-19 (Altieri et al. 2021). The USA Facts case, death, and county population data include county-level COVID-19 case and death counts, as well as population sizes (Source: USAFacts).


Data documentation is available at COVID Severity Forecasting data and USA Facts case, death, and county population data. The COVID Severity Forecasting data are distributed under the MIT License. The USA Facts data are distributed under a Creative Commons Attribution-ShareAlike 4.0 (or higher) International Public License (the “CC BY-SA 4.0 License”). See How to Cite USAFacts and Terms and Conditions for more information. Data derived herein from USA Facts are made available under the same license as the original USA Facts data.

Teaching Dataset

Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to the licensors. The author makes no claim or implication that any inferences derived from this teaching dataset are valid estimates.

The dataset covid_20210908_rmph.rData was created by merging the COVID Severity Forecasting data with the USA Facts case, death, and county population data, excluding counties whose FIPS codes did not match between the two datasets. The COVID Severity Forecasting data were downloaded September 9, 2021. The USA Facts data were downloaded September 10, 2021, and contained data collected through September 8, 2021.
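The merge step described above can be sketched as an inner join on the county FIPS code. This is only an illustration, not the code used to build the dataset: the data frames, the column name countyFIPS, and all values below are made-up placeholders.

```r
library(dplyr)

# Hypothetical stand-ins for the two sources (toy values, assumed column names)
severity <- data.frame(
  countyFIPS = c("01001", "01003", "99999"),
  risk_score = c(1.2, 0.8, 2.5)
)
usafacts <- data.frame(
  countyFIPS = c("01001", "01003", "01005"),
  cases      = c(10, 20, 30),
  deaths     = c(1, 2, 3)
)

# inner_join() keeps only counties whose FIPS code appears in both tables
merged <- inner_join(severity, usafacts, by = "countyFIPS")

# anti_join() lists the counties that would be excluded from each side
dropped_severity <- anti_join(severity, usafacts, by = "countyFIPS")
dropped_usafacts <- anti_join(usafacts, severity, by = "countyFIPS")

print(merged)
```

Keeping FIPS codes as character strings in both tables avoids losing leading zeros, which would otherwise cause spurious mismatches.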

After merging, the following code was used to derive additional variables. You do not need to run this code; it is shown for reference only.

library(dplyr)

covid <- covid %>%
  mutate(State              = factor(State),
         Rural.UrbanContinuumCode2013 = factor(Rural.UrbanContinuumCode2013),
         # Rates per 100,000 residents and per hospital
         hospitals_per_100k = 100000*X.Hospitals / PopulationEstimate2018,
         icu_beds_per_100k  = 100000*X.ICU_beds  / PopulationEstimate2018,
         icu_beds_per_hosp  =        X.ICU_beds  / X.Hospitals,
         # Assumption: the denominators for fte_hosp_per_100k, mds_per_100k,
         # and mds_per_hosp are inferred from the variable names
         fte_hosp_per_100k  = 100000*X.FTEHospitalTotal2017 / PopulationEstimate2018,
         fte_hosp_per_hosp  =        X.FTEHospitalTotal2017 / X.Hospitals,
         mds_per_100k       = 100000*TotalM.D..s.TotNon.FedandFed2017 / PopulationEstimate2018,
         mds_per_hosp       =        TotalM.D..s.TotNon.FedandFed2017 / X.Hospitals,
         # Recode infinite ratios as missing
         fte_hosp_per_hosp  = na_if(fte_hosp_per_hosp, Inf),
         mds_per_hosp       = na_if(mds_per_hosp,      Inf))
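The two na_if() calls at the end of the code above presumably handle counties with no hospitals, where a positive count divided by zero yields Inf. The following toy example (made-up values, not taken from the dataset) shows the behavior:

```r
library(dplyr)

# Toy data: the second and third rows have zero hospitals
toy <- data.frame(
  X.Hospitals            = c(2, 0, 0),
  X.FTEHospitalTotal2017 = c(500, 30, 0)
)

toy <- toy %>%
  mutate(fte_hosp_per_hosp = X.FTEHospitalTotal2017 / X.Hospitals,
         # Replace Inf (positive count / 0) with NA
         fte_hosp_clean    = na_if(fte_hosp_per_hosp, Inf))

print(toy)
```

Note that 0 / 0 yields NaN rather than Inf, so na_if(x, Inf) leaves it unchanged. R's is.na() already treats NaN as missing, so most modeling functions will drop those rows as well.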

Additionally, labels were added to the variables. To view the labels, run the following code.
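One possible way to list variable labels is sketched below. It assumes the labels are stored in each column's "label" attribute, which is the convention used by the Hmisc and labelled packages; how the labels were actually attached to this dataset is not stated here. The helper get_labels() and the toy data frame are hypothetical.

```r
# Toy data frame with labels attached as "label" attributes (made-up values)
toy <- data.frame(cases = c(10, 20), deaths = c(1, 2))
attr(toy$cases,  "label") <- "Cumulative cases (toy label)"
attr(toy$deaths, "label") <- "Cumulative deaths (toy label)"

# Hypothetical helper: return each column's label, or NA if it has none
get_labels <- function(df) {
  vapply(df, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)[1]
  }, character(1))
}

print(get_labels(toy))
```

With the teaching dataset loaded, the same helper could be applied to that data frame instead of toy.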


Creating the Teaching Dataset

To create the teaching dataset, do the following.

  • Download covid_20210908_rmph.rData from RMPH Resources.
  • Place this .rData file in your “Data” folder.
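Once the file is in place, it can be loaded into R. The sketch below assumes the “Data” folder sits inside the current working directory. The helper load_rdata() is hypothetical; it loads into a separate environment so that the names of the stored objects can be inspected without overwriting anything in the workspace.

```r
# Hypothetical helper: load an .rData file and return its objects as a named list
load_rdata <- function(path) {
  if (!file.exists(path)) stop("File not found: ", path)
  env  <- new.env()
  objs <- load(path, envir = env)   # load() returns the names of the loaded objects
  mget(objs, envir = env)
}

# Assumed location: a "Data" folder in the working directory
path <- file.path("Data", "covid_20210908_rmph.rData")

if (file.exists(path)) {
  loaded <- load_rdata(path)
  print(names(loaded))        # names of the objects stored in the file
  print(lapply(loaded, dim))  # dimensions of each object
} else {
  message("Place covid_20210908_rmph.rData in the Data folder first.")
}
```

In everyday use, calling load(path) directly is simpler and places the stored objects in the global environment.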

Rows and columns

This file has the following numbers of rows and columns:

## [1] 3074   99


Altieri, Nick, Rebecca L Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, et al. 2021. “Curating a COVID-19 Data Repository and Forecasting County-Level Death Counts in the United States.” Harvard Data Science Review (Special Issue 1). https://doi.org/10.1162/99608f92.1d4e0dae.