## 6.8 Prediction

As with MLR, `predict()`

computes predicted values from the model. The default output will be on the log-odds scale, but adding `type = "response"`

will instead give a predicted probability. Unfortunately, `predict()`

does not have a CI option for `glm()`

models. Instead, use `gmodels::estimable()`

to obtain the prediction along with its 95% CI.

**Example 6.3 (continued):** What is the predicted probability (and 95% CI) of lifetime marijuana use for someone who first used alcohol at age 13 years, is currently age 30 years, male, and has an income of < $20,000? What if they first used alcohol at age 21 years?

The following demonstrates the use of `predict()`

to compute these predicted probabilities.

```
predict(fit.ex6.3.adj,
newdata = data.frame(alc_agefirst = c(13, 21),
demog_age_cat6 = "26-34",
demog_sex = "Male",
demog_income = "Less than $20,000"),
type = "response")
```

```
## 1 2
## 0.9103 0.5283
```

However, if you also want a CI, then use `gmodels::estimable()`

. The code is slightly more complicated than using `predict()`

. First, `predict()`

automatically includes the intercept in its calculation but since `gmodels::estimable()`

is also used for estimates that are not predictions you have to explicitly tell it you want to include the intercept. Second, you must enter the names of the terms as spelled in the coefficient output rather than the variable names in the model (these are different for factor predictors). Third, you must use the inverse logit function to convert the result to a probability. Lastly, unlike predict, where we included two values for `alc_agefirst`

to get two predictions, if you want multiple predictions you must run `gmodels::estimable()`

multiple times.

First, examine the `rownames()`

of the coefficient output to make sure you spell each term correctly.

`rownames(summary(fit.ex6.3.adj)$coef)`

```
## [1] "(Intercept)" "alc_agefirst"
## [3] "demog_age_cat626-34" "demog_age_cat635-49"
## [5] "demog_age_cat650-64" "demog_age_cat665+"
## [7] "demog_sexMale" "demog_income$20,000 - $49,999"
## [9] "demog_income$50,000 - $74,999" "demog_income$75,000 or more"
```

Then enter the needed terms and their multiplicative factors in `gmodels::estimable()`

, surrounded by `ilogit()`

. Note that we only extract the estimate and the CI, not the other parts of the `gmodels::estimable()`

output as the other parts would not make sense after applying `ilogit()`

.

```
# Always include the intercept for prediction
# Specify a 1 for the intercept, a # for each continuous predictor
# and a 1 for each non-reference level of a categorical variable.
# If a predictor is at its reference level, it should be left out.
ilogit(gmodels::estimable(fit.ex6.3.adj,
c("(Intercept)" = 1,
"alc_agefirst" = 13,
"demog_age_cat626-34" = 1,
"demog_sexMale" = 1),
conf.int = 0.95))[c("Estimate", "Lower.CI", "Upper.CI")]
```

```
## Estimate Lower.CI Upper.CI
## (1 13 1 0 0 0 1 0 0 0) 0.9103 0.8387 0.9519
```

```
ilogit(gmodels::estimable(fit.ex6.3.adj,
c("(Intercept)" = 1,
"alc_agefirst" = 21,
"demog_age_cat626-34" = 1,
"demog_sexMale" = 1),
conf.int = 0.95))[c("Estimate", "Lower.CI", "Upper.CI")]
```

```
## Estimate Lower.CI Upper.CI
## (1 21 1 0 0 0 1 0 0 0) 0.5283 0.3724 0.6789
```

**Conclusion:** We predict that among individuals who first used alcohol at age 13 years, are age 30 years, male, and have an income less than $20,000 per year, 91.0% have used marijuana (95% CI = 83.9%, 95.2%). Among those who instead waited until age 21 years to first use alcohol, the proportion drops to 52.8% (95% CI = 37.2%, 67.9%).