5.11 Confidence and prediction intervals

As with SLR (Section 4.7), there are three kinds of intervals we are interested in:

  • Confidence intervals (CI) for the regression coefficients (CIs for \(\beta\)s)
  • CI for the mean outcome (CI for \(E(Y|X=x)\))
  • Prediction interval (PI) for an individual observation (PI for \(Y|X=x\))

The syntax is the same as for SLR, just with more predictors in the data.frame passed to predict(). Using the model from Example 5.1 and the predictor values used for the prediction in the previous section, we get the following.

# CIs for regression coefficients
confint(fit.ex5.1)
##                               2.5 %   97.5 %
## (Intercept)                 2.23373  3.71228
## BMXWAIST                    0.01850  0.03052
## smokerPast                 -0.03443  0.45685
## smokerCurrent              -0.19789  0.39342
## RIDAGEYR                    0.01856  0.03154
## RIAGENDRFemale             -0.53431 -0.12158
## race_ethNon-Hispanic White -0.79371 -0.22356
## race_ethNon-Hispanic Black -0.63860  0.14331
## race_ethNon-Hispanic Other -0.40786  0.44075
## income$25,000 to < $55,000 -0.41978  0.21903
## income$55,000+             -0.38281  0.19519
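
For a linear model, each interval reported by confint() is the coefficient estimate plus or minus a t critical value times its standard error. The sketch below checks this by hand. Because the Example 5.1 data are not reproduced here, it uses R's built-in mtcars data and a hypothetical model named fit.demo as a stand-in; the same steps should apply to fit.ex5.1.

```r
# Stand-in model (assumption: mtcars replaces the Example 5.1 data)
fit.demo <- lm(mpg ~ wt + hp + factor(am), data = mtcars)

est    <- coef(fit.demo)                      # coefficient estimates
se     <- sqrt(diag(vcov(fit.demo)))          # standard errors
t.crit <- qt(0.975, df.residual(fit.demo))    # t critical value for a 95% CI

# Estimate +/- critical value * standard error
ci.manual <- cbind(lower = est - t.crit * se,
                   upper = est + t.crit * se)

# Compare with confint(); TRUE indicates agreement up to floating-point error
all.equal(unname(ci.manual), unname(confint(fit.demo)))
```
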
# CI for the mean outcome
predict(fit.ex5.1, data.frame(
  BMXWAIST = 130,
  smoker   = "Current",
  RIDAGEYR = 50,
  RIAGENDR = "Male",
  race_eth = "Non-Hispanic Black",
  income   = "$55,000+"),
  interval = "confidence")
##     fit   lwr   upr
## 1 7.168 6.711 7.625
# PI for an individual observation
predict(fit.ex5.1, data.frame(
  BMXWAIST = 130,
  smoker   = "Current",
  RIDAGEYR = 50,
  RIAGENDR = "Male",
  race_eth = "Non-Hispanic Black",
  income   = "$55,000+"),
  interval = "prediction")
##     fit  lwr   upr
## 1 7.168 4.17 10.17

As discussed in Section 4.7, the estimate of the mean outcome and the prediction for an individual are the same (here, 7.168). However, the PI for an individual (the interval in which we expect 95% of individual observations with these predictor values to lie) will always be wider than the CI for the mean outcome (the interval resulting from a method that, if used on repeated samples, we expect to contain the true mean in 95% of samples). Here the PI (4.17, 10.17) is much wider than the CI (6.711, 7.625), because single observations are more variable than the mean of many observations.
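
The difference in width can be checked directly. A CI for the mean outcome uses only the standard error of the fitted value, whereas a PI also adds the residual variance. The sketch below reproduces both intervals by hand. It uses R's built-in mtcars data and hypothetical names (fit.demo, new.x) as stand-ins rather than the Example 5.1 model, so the numbers differ from those above.

```r
# Stand-in model and predictor values (assumption: not the Example 5.1 data)
fit.demo <- lm(mpg ~ wt + hp, data = mtcars)
new.x    <- data.frame(wt = 3, hp = 120)

# se.fit = TRUE also returns the standard error of the fitted mean,
# the residual degrees of freedom, and the residual standard deviation
p       <- predict(fit.demo, new.x, se.fit = TRUE)
fit.val <- unname(p$fit)
t.crit  <- qt(0.975, p$df)

# CI for the mean outcome: uncertainty in the estimated mean only
ci.manual <- fit.val + c(-1, 1) * t.crit * unname(p$se.fit)

# PI for an individual: adds the residual variance to that uncertainty
pi.manual <- fit.val + c(-1, 1) * t.crit *
  sqrt(unname(p$se.fit)^2 + p$residual.scale^2)

# Compare with the intervals returned by predict()
ci.r <- predict(fit.demo, new.x, interval = "confidence")
pi.r <- predict(fit.demo, new.x, interval = "prediction")
all.equal(ci.manual, unname(ci.r[1, c("lwr", "upr")]))
all.equal(pi.manual, unname(pi.r[1, c("lwr", "upr")]))

# The PI is wider than the CI
diff(pi.manual) > diff(ci.manual)
```

Because the residual variance term is positive, the quantity under the square root for the PI always exceeds the squared standard error used for the CI, which is why the PI is wider at any set of predictor values.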