A.3 U.S. Natality (2018)


As required by Federal law in the United States, the 2018 United States Birth Data were compiled from information on birth certificates by the National Vital Statistics System, part of the National Center for Health Statistics, in cooperation with States (National Center for Health Statistics 2022). Information collected includes gestational age, birthweight, maternal and paternal demographic information, risk factors, and characteristics of the labor and delivery.


Documentation can be found in the User Guide to the 2018 Natality Public Use File. See also the NCHS Data User Agreement.

Teaching Datasets

Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to NCHS, which is responsible only for the initial data. The author makes no claim or implication that any inferences derived from these teaching datasets are valid estimates.

The teaching dataset natality2018_rmph.Rdata is a simple random sample of 2000 births intended only for illustrating regression methods. In the teaching dataset, variable names in CAPS are coded as in the original dataset (except that missing value codes were set to NA and some cases were assigned values based on skip patterns). Variable names in lower case were derived from other variables. The gestational age variable COMBGEST was modified to create the variable gestage37 for use in survival analysis: gestational ages > 37 weeks were censored at 37 weeks, and a random subset of gestational ages was censored at times < 37 weeks. Thus, in addition to being only a small sample of U.S. births, the data are slightly modified for teaching purposes.
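The censoring described above can be sketched in a few lines of R. This is only an illustration on simulated values, not the code in Natality 2018 Process.R: the simulated range of COMBGEST, the indicator name event, the 5% random-censoring proportion, and the lower bound of 20 weeks for censoring times are all assumptions made for the example.

```r
# Illustrative only: simulated gestational ages (weeks), not the NCHS data.
set.seed(2018)
n <- 2000
COMBGEST <- sample(25:42, size = n, replace = TRUE)  # hypothetical values

# Censoring at 37 weeks: ages beyond 37 weeks are recorded as 37 and
# flagged as censored (event = 0); births at or before 37 weeks are events.
gestage37 <- pmin(COMBGEST, 37)
event     <- as.integer(COMBGEST <= 37)  # hypothetical indicator name

# Random censoring before 37 weeks: pick a random subset (5% is assumed)
# and replace each time with a whole-week censoring time that is earlier
# than the current value, and therefore below 37 weeks.
idx       <- sample(n, size = round(0.05 * n))
cens_time <- 20 + floor(runif(length(idx)) * (gestage37[idx] - 20))
gestage37[idx] <- cens_time
event[idx]     <- 0L

table(event)  # counts of censored (0) and observed (1) times
```

The pair (gestage37, event) is the usual time and status input to a survival model, for example survival::Surv(gestage37, event).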

Creating the Teaching Datasets

To create the teaching datasets, do the following.

  • Download the .zip file containing the 2018 CSV file found at Vital Statistics Natality Birth Data.
  • Extract the CSV file natl2018us.csv from the .zip file.
  • Download the R script file Natality 2018 Process.R from RMPH Resources.
  • Run the R script file Natality 2018 Process.R to process the raw data and create the following teaching datasets:
    • natality2018_rmph.Rdata
    • natality_CC_rmph.Rdata (an artificial matched case-control dataset used to illustrate conditional logistic regression)
  • Place these .Rdata files in your “Data” folder.
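The steps after the downloads could be scripted roughly as follows. This is a sketch under stated assumptions: the .zip file name is a placeholder (the actual download may be named differently), and the script is assumed to read natl2018us.csv from, and write the .Rdata files to, the working directory. Check the script itself before relying on either assumption.

```r
zip_file <- "natality2018.zip"         # hypothetical name; use your download's name
csv_file <- "natl2018us.csv"           # CSV name given in the steps above
script   <- "Natality 2018 Process.R"  # script name given in the steps above

# Extract the CSV only if the archive is present and the CSV is not yet there.
if (file.exists(zip_file) && !file.exists(csv_file)) {
  unzip(zip_file, files = csv_file)
}

# Run the processing script once both inputs are available.
if (file.exists(csv_file) && file.exists(script)) {
  source(script)
}

# Move whichever teaching datasets now exist into the "Data" folder.
dir.create("Data", showWarnings = FALSE)
made <- c("natality2018_rmph.Rdata", "natality_CC_rmph.Rdata")
made <- made[file.exists(made)]
file.rename(made, file.path("Data", made))
```

Each step is guarded by file.exists(), so the sketch does nothing harmful when a file is missing.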

Rows and columns

These files have the following numbers of rows and columns (natality2018_rmph.Rdata first, then natality_CC_rmph.Rdata):

## [1] 2000   39
## [1] 1570    4
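Dimensions like these can be checked with a small helper such as the one below. The names of the objects stored inside the .Rdata files are not stated in this section, so the sketch loads each file into its own environment and reports dim() for whatever objects it finds. The helper name rdata_dims is hypothetical, and the paths assume the files were placed in the "Data" folder.

```r
# Load an .Rdata file into a separate environment and return dim() for each
# object it contains, without overwriting anything in the global workspace.
rdata_dims <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the object names
  lapply(mget(obj_names, envir = env), dim)
}

for (f in c("Data/natality2018_rmph.Rdata", "Data/natality_CC_rmph.Rdata")) {
  if (file.exists(f)) print(rdata_dims(f))
}
```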


National Center for Health Statistics. 2022. “Birth Data.” https://www.cdc.gov/nchs/nvss/births.htm.