The National Health and Nutrition Examination Survey (NHANES) is a survey designed to “assess the health and nutritional status of adults and children in the United States.” The study began in the 1960s and since 1999 has been conducted in 2-year cycles (e.g., 1999-2000, 2001-2002). This nationally representative survey includes both interviews and physical examinations. NHANES is conducted by the National Center for Health Statistics (NCHS), which is part of the Centers for Disease Control and Prevention (CDC) (National Center for Health Statistics 2021a, 2021b).
A description of the 2017-2018 survey cycle target population, objectives, and data collection procedures can be found at NHANES 2017-2018 Overview. See, in particular, the section on “Guidance for NHANES Data Users.” NHANES data include demographics, chronic conditions, and risk factors. See NHANES 2017-2018 Data Details for a full list of datasets and their documentation. Individual datasets (e.g., Demographics, Body Measures, Cholesterol – Total) are freely downloadable and can be merged (within a cycle) on the variable SEQN. However, data from different survey cycles come from different individuals, so NHANES is not longitudinal. See also the NCHS Data User Agreement.
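As an illustration of merging within a cycle on SEQN, the following is a minimal sketch using base R `merge()`. The three small data frames are toy stand-ins with made-up values, not actual NHANES records, and the variable names (RIDAGEYR, BMXBMI, LBXTC) are assumed here to be the age, body mass index, and total cholesterol variables from the corresponding NHANES files.

```r
# Toy stand-ins for three NHANES 2017-2018 files (all values are made up).
demo  <- data.frame(SEQN = c(93703, 93704, 93705, 93706),
                    RIDAGEYR = c(34, 51, 67, 45))   # age in years
bmx   <- data.frame(SEQN = c(93703, 93704, 93706),
                    BMXBMI = c(27.1, 31.4, 22.8))   # body mass index
tchol <- data.frame(SEQN = c(93704, 93705, 93706),
                    LBXTC = c(189, 210, 175))       # total cholesterol

# Left joins on SEQN keep every participant in the demographics file;
# participants absent from a component file get NA for its variables.
merged <- merge(demo, bmx, by = "SEQN", all.x = TRUE)
merged <- merge(merged, tchol, by = "SEQN", all.x = TRUE)

print(merged)
```

Using `all.x = TRUE` is one possible design choice: it keeps one row per participant in the first file, so missing component data show up as NA rather than as dropped rows.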
Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to NCHS, which is responsible only for the initial data. The author makes no claim or implication that any inferences derived from these teaching datasets are valid estimates.
The teaching datasets used in this text were merged from multiple NHANES 2017-2018 data files and include a random subset of 1000 observations from adults in the examination data (nhanes1718_adult_exam_sub_rmph.Rdata), and a random subset of 1000 observations from adults in the fasting subsample (nhanes1718_adult_fast_sub_rmph.Rdata). In each case, sampling was done with replacement using the appropriate subsample weights in order to approximate a nationally representative distribution. This sampling method is solely for the purpose of creating a teaching dataset to illustrate regression methods. Chapter 8 discusses how to analyze the full dataset appropriately using the survey weights (
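The weighted sampling with replacement described above can be sketched as follows. This is not the author's processing script, only a minimal illustration under stated assumptions: the data frame `adults` is a toy stand-in with made-up weights, and WTMEC2YR is assumed to be the relevant examination weight (for the fasting subsample, the analogous weight is assumed to be WTSAF2YR).

```r
set.seed(2018)  # arbitrary seed so the sketch is reproducible

# Toy stand-in for the adult examination data (weights are made up).
n <- 5000
adults <- data.frame(SEQN = seq_len(n),
                     WTMEC2YR = runif(n, min = 5000, max = 100000))

# Draw 1000 row indices with replacement, with selection probability
# proportional to each participant's weight. sample() rescales prob
# internally, so the weights do not need to sum to 1.
idx <- sample(seq_len(nrow(adults)), size = 1000, replace = TRUE,
              prob = adults$WTMEC2YR)
teaching <- adults[idx, ]

nrow(teaching)  # 1000
```

Because `replace = TRUE`, the same participant can appear more than once in the resulting subset.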
In these NHANES teaching datasets, variable names in CAPS are coded as in the original dataset (with the exception of missing value codes being set to NA and some cases assigned values based on skip patterns). Variable names in lower case were derived from other variables (e.g.,
Creating the Teaching Datasets
To create the teaching datasets, do the following.
- Download the R script file NHANES 2017 2018 Process.R from RMPH Resources.
- Run the R script file NHANES 2017 2018 Process.R to download and process the raw NHANES data. There is no need to download the NHANES data directly from NCHS as it will be downloaded automatically when you run the script.
- The script will create the following teaching datasets:
  - nhanes_CC_rmph.Rdata (an artificial matched case-control dataset used to illustrate conditional logistic regression)
  - nhanesf.complete.50_rmph.Rdata (a subsample of size 50 used for a small sample size example)
  - nhanesf.complete.30_rmph.Rdata (a subsample of size 30 used for a small sample size exercise)
- Place these .Rdata files in your “Data” folder.
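Once the .Rdata files are in place, each can be read with `load()`. The sketch below is self-contained and therefore uses a temporary directory as a stand-in for the “Data” folder, along with a hypothetical file name (toy_rmph.Rdata) and object name (toy) created only so there is something to load. The object names stored inside the actual teaching files are not assumed here.

```r
# Stand-in for the "Data" folder; a temporary directory keeps the sketch
# self-contained.
data_dir <- file.path(tempdir(), "Data")
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical file and object, created only for illustration.
toy <- data.frame(SEQN = 1:3, x = c(0.5, 1.2, 2.8))
save(toy, file = file.path(data_dir, "toy_rmph.Rdata"))
rm(toy)

# load() restores the saved objects under their original names and
# invisibly returns those names as a character vector.
loaded <- load(file.path(data_dir, "toy_rmph.Rdata"))
print(loaded)    # names of the restored objects
print(dim(toy))  # number of rows and columns
```

In practice, the processing script itself would typically be run with `source("NHANES 2017 2018 Process.R")` from the directory where it was saved. Capturing the return value of `load()` is a convenient way to find out which objects a file contains.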
Rows and columns
These files have the following numbers of rows and columns:
##  9254 90
##  1000 85
##  1000 86
##  890 6
##  50 12
##  30 4