As required by Federal law in the United States, the 2018 United States Birth Data were compiled from information on birth certificates by the National Vital Statistics System, part of the National Center for Health Statistics, in cooperation with States (National Center for Health Statistics 2022). Information collected includes gestational age, birthweight, maternal and paternal demographic information, risk factors, and characteristics of the labor and delivery.
Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to NCHS, which is responsible only for the initial data. The author makes no claim or implication that any inferences derived from these teaching datasets are valid estimates.
The teaching dataset natality2018_rmph.Rdata is a simple random sample of 2000 births intended only for illustrating regression methods. In the teaching dataset, variable names in CAPS are coded as in the original dataset (with the exception of missing value codes being set to NA and some cases assigned values based on skip patterns). Variable names in lower case were derived from other variables. The gestational age variable COMBGEST was modified to create the variable gestage37 for use in survival analysis in which gestational ages > 37 weeks were censored at 37 weeks and a random subset of gestational ages were censored at times < 37 weeks. Thus, in addition to being only a small sample of U.S. births, the data are slightly modified for teaching purposes.
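The censoring described above can be illustrated with a minimal sketch on simulated data. This is not the author's processing code: the simulated COMBGEST values, the 5% random-censoring fraction, the 20-week lower bound, and the name event are all assumptions made only for illustration.

```r
# Minimal sketch with simulated data; not the actual processing script.
set.seed(2018)                                  # arbitrary seed
n <- 2000

# Hypothetical stand-in for COMBGEST (gestational age in weeks)
COMBGEST <- sample(25:42, n, replace = TRUE)

# Censor gestational ages > 37 weeks at 37 weeks
gestage37 <- pmin(COMBGEST, 37)

# Event indicator (hypothetical name): 1 = observed, 0 = censored
event <- as.integer(COMBGEST <= 37)

# Randomly censor an assumed 5% of births at an earlier time (< 37 weeks)
idx   <- sample(n, size = round(0.05 * n))
upper <- gestage37[idx] - 1                     # latest week before the recorded time
# Integer week drawn uniformly from 20..upper (20 is an assumed lower bound)
cens_time      <- 20 + floor(runif(length(idx)) * (upper - 19))
gestage37[idx] <- cens_time
event[idx]     <- 0L

table(event)                                    # counts of censored vs. observed
```

A pair such as gestage37 and event is the form typically passed to survival routines, for example survival::Surv(gestage37, event).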
Creating the Teaching Datasets
To create the teaching datasets, do the following.
- Download the .zip file containing the 2018 CSV file found at Vital Statistics Natality Birth Data.
- Extract the CSV file natl2018us.csv from the .zip file.
- Download the R script file Natality 2018 Process.R from RMPH Resources.
- Run the R script file Natality 2018 Process.R to process the raw data and create the following teaching datasets:
  - natality2018_rmph.Rdata (the simple random sample of 2000 births described above)
  - natality_CC_rmph.Rdata (an artificial matched case-control dataset used to illustrate conditional logistic regression)
- Place these .Rdata files in your “Data” folder.
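The steps after the downloads can also be run from R. The following is only a sketch under stated assumptions: the .zip file name is hypothetical, and the processing script is assumed to read natl2018us.csv from, and write the .Rdata files to, the current working directory. Adjust the paths to match your setup.

```r
# Hypothetical paths; change these to match where you saved the downloads
zip_file    <- "natality2018.zip"            # assumed name of the downloaded .zip
script_file <- "Natality 2018 Process.R"     # script from RMPH Resources
data_dir    <- "Data"                        # your "Data" folder

# Extract the contents of the .zip (including natl2018us.csv), if present
if (file.exists(zip_file)) {
  unzip(zip_file, exdir = ".")
}

# Run the processing script only when both inputs are available
if (file.exists(script_file) && file.exists("natl2018us.csv")) {
  source(script_file)
}

# Copy whichever teaching datasets were created into the Data folder
dir.create(data_dir, showWarnings = FALSE)
rdata_files <- c("natality2018_rmph.Rdata", "natality_CC_rmph.Rdata")
present     <- rdata_files[file.exists(rdata_files)]
if (length(present) > 0) {
  file.copy(present, file.path(data_dir, present), overwrite = TRUE)
}
```

The file.exists() guards let the sketch run without error even before the downloads are in place; in that case it only creates the Data folder.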
Rows and columns
These files have the following numbers of rows and columns:
##  2000 39
##  1570 4
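Dimensions like these can be checked after the datasets are created. The sketch below assumes the files sit in a folder named Data and makes no assumption about the object names stored inside each .Rdata file; the helper rdata_dims is a hypothetical name introduced here for illustration.

```r
# Return the dimensions (rows, columns) of every data frame in an .Rdata file.
# load() returns the names of the objects it restores, so none are assumed.
rdata_dims <- function(path) {
  env       <- new.env()
  obj_names <- load(path, envir = env)
  objs      <- mget(obj_names, envir = env)
  lapply(Filter(is.data.frame, objs), dim)
}

# Assumed location of the teaching datasets
files <- file.path("Data", c("natality2018_rmph.Rdata", "natality_CC_rmph.Rdata"))

# Print dimensions for each file that exists
for (f in files[file.exists(files)]) {
  print(rdata_dims(f))
}
```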