1.3 Datasets

To use the code in this text seamlessly, create an R project (for more information, see “Workflow: scripts and projects” in R for Data Science (Wickham, Çetinkaya-Rundel, and Grolemund 2017)) and a folder called “Data” in the same location as your R project. In that folder, place the file Functions_rmph.R provided with this text, as well as the teaching datasets used in this text.
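The setup described above can be sketched in a few lines of R. This is a minimal sketch, not a required procedure: it assumes the working directory is the R project folder (the default when the project is opened in RStudio), and the object name fn_path is introduced here only for illustration.

```r
# Assumption: the working directory is the R project folder.
# Create the "Data" folder next to the project file if it does not exist yet.
if (!dir.exists("Data")) {
  dir.create("Data")
}

# Relative path to the helper functions file provided with the text.
# (fn_path is an illustrative name, not one used by the text.)
fn_path <- file.path("Data", "Functions_rmph.R")

# Load the helper functions if the file has been placed in the Data folder;
# otherwise print a reminder rather than stopping with an error.
if (file.exists(fn_path)) {
  source(fn_path)
} else {
  message("File not found: ", fn_path,
          ". Place Functions_rmph.R in the Data folder.")
}
```

Using a path relative to the project folder, rather than an absolute path, keeps the code portable if the project is moved to another location or computer.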

Below is a list of the teaching datasets used in this text. Descriptions, including instructions for downloading and processing the data to create these datasets, can be found in Appendix A.

  • NHANES (2017-2018)
  • United Nations Human Development Data (2020)
  • U.S. Natality (2018)
  • COVID-19 county-level data
  • NSDUH (2019)
  • Framingham Heart Study (BioLINCC teaching dataset)
  • CAMP (BioLINCC teaching dataset)
  • Digitalis (BioLINCC teaching dataset)
  • Opioid

NOTE: The datasets are meant for teaching, not research. The analyses herein are meant solely as teaching examples illustrating the use of regression methods using R. Results found in this text, and datasets provided with this text, should not be used to draw conclusions about any conditions or relationships between variables.


Wickham, H., M. Çetinkaya-Rundel, and G. Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, CA: O’Reilly Media.