4.6 Predictions from the model

The term prediction is used here to refer to the process of making a best guess for the outcome at a given predictor value. Technically, this is estimation of the mean outcome \(E(Y|X=x)\). As we will see later, whether you are estimating the mean outcome or predicting the outcome for an individual, the point estimate will be the same; however, the interval estimate will differ.
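To preview the difference between the two kinds of interval, here is a minimal sketch on simulated data. The data frame `sim`, the outcome `glucose`, and the model `fit.sim` are hypothetical stand-ins, not the data behind the fits used in this section. With `predict()`, `interval = "confidence"` gives an interval for the mean outcome and `interval = "prediction"` gives one for a single new individual. The point estimates agree, but the prediction interval is wider.

```r
# Simulated (hypothetical) data: not the data used in this chapter
set.seed(1)
sim <- data.frame(BMXWAIST = rnorm(200, mean = 100, sd = 15))
sim$glucose <- 3.3 + 0.028 * sim$BMXWAIST + rnorm(200, sd = 1)
fit.sim <- lm(glucose ~ BMXWAIST, data = sim)

new <- data.frame(BMXWAIST = 100)

# Interval for the mean outcome E(Y | X = 100)
ci <- predict(fit.sim, newdata = new, interval = "confidence")
# Interval for the outcome of one new individual with X = 100
pred.int <- predict(fit.sim, newdata = new, interval = "prediction")

# Same "fit" column in both rows; wider lwr-upr range for the prediction interval
rbind(confidence = ci[1, ], prediction = pred.int[1, ])
```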

For a model with a continuous predictor, each point on the regression line is the estimated mean outcome at a given value of the predictor, \(E(Y|X=x)\). For a categorical predictor, the estimates are the mean outcome values at the levels of the predictor. To compute these estimates, you could manually plug predictor values into the fitted regression equation, as in the following examples.
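For a model whose only predictor is categorical, the estimated mean at each level equals the sample mean of the outcome within that level. The minimal sketch below checks this on simulated data; `sim`, `glucose`, and `fit.sim` are hypothetical, and the level names simply mirror those in Example 4.2. It compares `predict()` at each level with the group means from `tapply()`.

```r
# Simulated (hypothetical) data with a three-level categorical predictor
set.seed(2)
sim <- data.frame(smoker = factor(sample(c("Never", "Past", "Current"), 300, replace = TRUE),
                                  levels = c("Never", "Past", "Current")))
shift <- c(Never = 0, Past = 0.4, Current = 0.25)
sim$glucose <- 5.9 + unname(shift[as.character(sim$smoker)]) + rnorm(300, sd = 1)
fit.sim <- lm(glucose ~ smoker, data = sim)

# Estimated mean outcome at each level of the predictor
levs <- data.frame(smoker = levels(sim$smoker))
est <- predict(fit.sim, newdata = levs)

# Sample mean of the outcome within each level (same level order)
obs <- tapply(sim$glucose, sim$smoker, mean)

data.frame(levs, estimated = unname(est), sample.mean = as.vector(obs))
```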

Example 4.1 (continued):

round(summary(fit.ex4.1)$coef, 4)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   3.3041     0.2950  11.200        0
## BMXWAIST      0.0278     0.0029   9.588        0

The estimated mean fasting glucose for those with a waist circumference of 100 cm is approximately 3.3041 + 0.0278 \(\times\) 100 = 6.0841 mmol/L.
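The same hand calculation can be done without rounding by extracting the coefficients with `coef()`. This is a minimal sketch on simulated data, where `sim`, `glucose`, and `fit.sim` are hypothetical stand-ins for the example data and `fit.ex4.1`. Using the full-precision coefficients reproduces `predict()`, while rounding them to 4 decimals first can shift the result.

```r
# Simulated (hypothetical) data; fit.sim stands in for fit.ex4.1
set.seed(1)
sim <- data.frame(BMXWAIST = rnorm(200, mean = 100, sd = 15))
sim$glucose <- 3.3 + 0.028 * sim$BMXWAIST + rnorm(200, sd = 1)
fit.sim <- lm(glucose ~ BMXWAIST, data = sim)

b <- coef(fit.sim)                                   # unrounded intercept and slope
by.hand <- unname(b[1] + b[2] * 100)                 # b0 + b1 * 100 at full precision
by.hand.rounded <- unname(round(b[1], 4) + round(b[2], 4) * 100)
from.predict <- unname(predict(fit.sim, newdata = data.frame(BMXWAIST = 100)))

c(by.hand = by.hand, rounded = by.hand.rounded, predict = from.predict)
```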

Example 4.2 (continued):

round(summary(fit.ex4.2)$coef, 4)

##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)     5.9423     0.0666  89.229   0.0000
## smokerPast      0.4199     0.1190   3.529   0.0004
## smokerCurrent   0.2551     0.1442   1.769   0.0772

The estimated mean fasting glucose for Current smokers is 5.9423 + 0.4199 \(\times\) 0 + 0.2551 \(\times\) 1 = 6.1974 mmol/L.
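For a categorical predictor, the hand calculation is the sum of the coefficients times a vector of 0/1 indicator values. Below is a minimal sketch on simulated data, where `sim`, `glucose`, and `fit.sim` are hypothetical stand-ins for the example data and `fit.ex4.2`, and the reference level is assumed to be called `Never`. It computes the Current-smoker estimate from unrounded coefficients and checks it against `predict()`.

```r
# Simulated (hypothetical) data; fit.sim stands in for fit.ex4.2
set.seed(2)
sim <- data.frame(smoker = factor(sample(c("Never", "Past", "Current"), 300, replace = TRUE),
                                  levels = c("Never", "Past", "Current")))
sim$glucose <- 5.9 + rnorm(300, sd = 1)
fit.sim <- lm(glucose ~ smoker, data = sim)

b <- coef(fit.sim)            # (Intercept), smokerPast, smokerCurrent
x.current <- c(1, 0, 1)       # intercept = 1, smokerPast = 0, smokerCurrent = 1
by.hand <- sum(b * x.current)
from.predict <- unname(predict(fit.sim, newdata = data.frame(smoker = "Current")))

c(by.hand = by.hand, predict = from.predict)
```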

However, neither of the above is exact, because we rounded each coefficient. It is better to let R do the computation for you by using the predict() function. Repeat the examples above, this time using predict() along with interval = "confidence", which also computes a 95% CI for each estimated mean outcome. In each case, use the newdata argument to supply a data.frame containing the predictor value at which we want a prediction; its column name must match the predictor name in the model.

# Example 4.1
predict(fit.ex4.1,
        newdata = data.frame(BMXWAIST = 100),
        interval = "confidence")

##    fit   lwr   upr
## 1 6.08 5.984 6.177

# Example 4.2
predict(fit.ex4.2,
        newdata = data.frame(smoker = "Current"),
        interval = "confidence")

##     fit   lwr   upr
## 1 6.197 5.946 6.448
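The confidence limits from predict() are the estimate plus or minus a t critical value (using the model's residual degrees of freedom) times the standard error of the estimated mean. The minimal sketch below reproduces them on simulated data, where `sim`, `glucose`, and `fit.sim` are hypothetical, by asking predict() for `se.fit = TRUE`.

```r
# Simulated (hypothetical) data; not the data used in this chapter
set.seed(1)
sim <- data.frame(BMXWAIST = rnorm(200, mean = 100, sd = 15))
sim$glucose <- 3.3 + 0.028 * sim$BMXWAIST + rnorm(200, sd = 1)
fit.sim <- lm(glucose ~ BMXWAIST, data = sim)
new <- data.frame(BMXWAIST = 100)

# Standard error of the estimated mean and the residual degrees of freedom
p <- predict(fit.sim, newdata = new, se.fit = TRUE)
t.crit <- qt(0.975, df = p$df)   # 97.5th percentile of t, for a 95% CI

by.hand <- c(fit = unname(p$fit),
             lwr = unname(p$fit - t.crit * p$se.fit),
             upr = unname(p$fit + t.crit * p$se.fit))
ci <- predict(fit.sim, newdata = new, interval = "confidence")

rbind(by.hand = by.hand, predict = ci[1, ])
```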

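In Example 4.2, the manual calculation (6.1974) agrees with predict() (6.197). In Example 4.1, however, the manual value (6.0841) differs from the predict() value (6.08) in the third decimal place, because the rounding error in the slope is multiplied by 100. If you want an exact answer, and a CI, use predict().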
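Because newdata is a data.frame, it can hold several rows, and a single predict() call then returns an estimate and CI for each one. Here is a minimal sketch on simulated data; `sim`, `glucose`, `fit.sim`, and the chosen waist values are hypothetical.

```r
# Simulated (hypothetical) data; not the data used in this chapter
set.seed(1)
sim <- data.frame(BMXWAIST = rnorm(200, mean = 100, sd = 15))
sim$glucose <- 3.3 + 0.028 * sim$BMXWAIST + rnorm(200, sd = 1)
fit.sim <- lm(glucose ~ BMXWAIST, data = sim)

# One row per predictor value of interest
new <- data.frame(BMXWAIST = c(80, 90, 100, 110, 120))
ci <- predict(fit.sim, newdata = new, interval = "confidence")

# Attach the predictor values to the estimates and 95% limits
cbind(new, ci)
```

Adding `level = 0.90` to the predict() call would give 90% intervals instead of the default 95%.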