## 5.12 Which to use when? `car::Anova()`, `anova()`, `gmodels::estimable()`, and `predict()`

It is easy to get `car::Anova()`, `anova()`, `gmodels::estimable()`, and `predict()` mixed up. This section summarizes when to use each, followed by some examples.

In general, `car::Anova()` and `anova()` are used to test whether one or more regression coefficients are equal to 0, whereas `gmodels::estimable()` and `predict()` are used to estimate a sum of regression coefficients each multiplied by some number.

Use **`car::Anova()`** (with `type = 3`) to test whether all the adjusted regression coefficients associated with a single term in the model (e.g., a single predictor, a single interaction) are simultaneously zero. For binary categorical predictors, continuous predictors, and interactions between them, the `car::Anova()` output is redundant with the regression coefficient table output from `summary()` since each such term has only one corresponding regression coefficient. However, you must use `car::Anova()` to test the significance of a categorical predictor with more than two levels, or of an interaction that involves a categorical predictor with more than two levels. `car::Anova()` carries out comparisons for each of a set of specific pairs of nested models – each comparison is between the full model and a reduced model with one predictor (or interaction) removed.

Use **`anova()`** to compare two *nested* models – a full model compared to a reduced model with some terms removed from the full model. The tests possible with `car::Anova()` are also possible with `anova()`, but the reverse is not true. For example, you must use `anova()` to test the significance of a term involved in an interaction by comparing the full model to the reduced model that lacks both that term’s main effect and the interaction. When you *can* use `car::Anova()`, however, it is often more convenient than `anova()` because it carries out more than one test at the same time, one for each term in the model (comparing the full model to the reduced model that lacks that term).

Use **`gmodels::estimable()`** to estimate and test the significance of a linear combination of coefficients. For example, use `gmodels::estimable()` to estimate the effect of one predictor in an interaction at a specific level of the other predictor (since this effect is the sum of a main effect and an interaction term).

Use **`predict()`** to estimate the mean outcome at specified levels of the predictors. Since a prediction is a linear combination of coefficients, you could use `gmodels::estimable()`, but for this purpose `predict()` is simpler to use since you do not have to remember the order of the terms or which levels are reference levels.

Both prediction (using `predict()`) and estimating an effect (using `gmodels::estimable()`) involve multiplying regression coefficients by numbers and then adding them up. However, there are a few differences. When using `predict()`, you supply values for every predictor in the model and you do not supply a value for the intercept; the intercept is always assumed to be included, otherwise the result would not be a prediction. When using `gmodels::estimable()`, however, you supply the numbers by which to multiply the regression coefficients – typically, though not necessarily, a zero for the intercept. Technically, you could use `gmodels::estimable()` to compute any prediction by supplying numbers corresponding to the values of the predictors and a 1 for the intercept. However, the reverse is not true – there are some effects that are not estimable using a single call to `predict()`.
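To make the distinction concrete, here is a minimal sketch (not from this chapter – it uses the built-in `mtcars` data rather than the NHANES data, and `dat`, `fit`, and `fit0` are illustrative names) showing that `anova()` on nested models tests a multi-level factor’s coefficients jointly, and that a prediction is just a linear combination of coefficients:

```r
# Sketch using built-in data
dat <- transform(mtcars, cyl = factor(cyl))  # cyl becomes a 3-level factor

fit  <- lm(mpg ~ cyl + wt, data = dat)  # full model
fit0 <- lm(mpg ~ wt, data = dat)        # reduced model without cyl

# Jointly tests both cyl coefficients (2 df), as car::Anova(fit, type = 3)
# would for the cyl term
comparison <- anova(fit0, fit)
comparison

# A prediction is the intercept plus each coefficient times its value:
# here, intercept + cyl6 coefficient + 3.0 * wt coefficient
newdat <- data.frame(cyl = factor(6, levels = c(4, 6, 8)), wt = 3.0)
p1 <- predict(fit, newdat)
p2 <- sum(coef(fit) * c(1, 1, 0, 3.0))  # (Intercept), cyl6, cyl8, wt
all.equal(unname(p1), p2)
```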

**Examples**

- Test the significance of smoking status (a factor with 3 levels) in the model for fasting glucose that includes waist circumference, smoking status, age, gender, race/ethnicity, and income (`fit.ex5.1`) using `car::Anova(fit.ex5.1, type = 3)`.

```
## Anova Table (Type III tests)
## 
## Response: LBDGLUSI
##             Sum Sq  Df F value             Pr(>F)    
## (Intercept)    142   1   62.30 0.0000000000000091 ***
## BMXWAIST       146   1   64.02 0.0000000000000040 ***
## smoker           7   2    1.44            0.23638    
## RIDAGEYR       131   1   57.42 0.0000000000000926 ***
## RIAGENDR        22   1    9.73            0.00188 ** 
## race_eth        39   3    5.68            0.00075 ***
## income           1   2    0.23            0.79082    
## Residuals     1929 846                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Same test using anova()
# Fit model without smoker
fit0 <- lm(LBDGLUSI ~ BMXWAIST + RIDAGEYR +
             RIAGENDR + race_eth + income,
           data = nhanesf.complete)
# Compare reduced and full models
anova(fit0, fit.ex5.1)
```

```
## Analysis of Variance Table
## 
## Model 1: LBDGLUSI ~ BMXWAIST + RIDAGEYR + RIAGENDR + race_eth + income
## Model 2: LBDGLUSI ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR + race_eth + 
##     income
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)
## 1    848 1936                         
## 2    846 1929  2      6.59 1.44   0.24
```

Although `anova()` gives the same test in this case, the `car::Anova()` code is more succinct and provides tests for all the other terms as well.

- Test the significance of the interaction (which has 2 degrees of freedom) in the model for fasting glucose that includes gender, income, and their interaction (`fit.ex5.3.int`) using `car::Anova(fit.ex5.3.int, type = 3)` (Section 5.9.10).

```
## Anova Table (Type III tests)
## 
## Response: LBDGLUSI
##                 Sum Sq  Df F value              Pr(>F)    
## (Intercept)       2603   1  977.55 <0.0000000000000002 ***
## RIAGENDR             2   1    0.67               0.413    
## income               1   2    0.11               0.898    
## RIAGENDR:income     17   2    3.17               0.042 *  
## Residuals         2266 851                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Same test using anova()
# Fit model without the interaction
fit0 <- lm(LBDGLUSI ~ RIAGENDR + income,
           data = nhanesf.complete)
# Compare reduced and full models
anova(fit0, fit.ex5.3.int)
```

```
## Analysis of Variance Table
## 
## Model 1: LBDGLUSI ~ RIAGENDR + income
## Model 2: LBDGLUSI ~ RIAGENDR + income + RIAGENDR:income
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)  
## 1    853 2283                           
## 2    851 2266  2      16.9 3.17  0.042 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

- Test the significance of income in `fit.ex5.3.int` by comparing the full model to the reduced model that lacks income and the interaction (Section 5.9.11). Is it possible to do this test using `car::Anova()`?

```
# Fit model without the main effect or interaction
fit0 <- lm(LBDGLUSI ~ RIAGENDR, data = nhanesf.complete)
# Compare reduced and full models
anova(fit0, fit.ex5.3.int)
```

```
## Analysis of Variance Table
## 
## Model 1: LBDGLUSI ~ RIAGENDR
## Model 2: LBDGLUSI ~ RIAGENDR + income + RIAGENDR:income
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)  
## 1    855 2295                           
## 2    851 2266  4      29.6 2.78  0.026 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The above test is not possible using `car::Anova()` since the reduced model has two terms removed, `income` and `RIAGENDR:income`, that are not simply all the levels of a single `factor` variable.
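As a sketch with built-in data (not the chapter’s NHANES example – `dat`, `fit`, and `fit0` are illustrative names), the same kind of joint test of a main effect plus its interaction looks like this with `anova()`:

```r
# Sketch using built-in mtcars data
dat <- transform(mtcars, cyl = factor(cyl))  # cyl becomes a 3-level factor

fit  <- lm(mpg ~ cyl * wt, data = dat)  # full model with interaction
fit0 <- lm(mpg ~ wt, data = dat)        # cyl main effect AND cyl:wt removed

# A single 4-df test of cyl and cyl:wt together (2 df each);
# car::Anova() cannot produce this particular comparison
anova(fit0, fit)
```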

- In `fit.ex5.3.int`, estimate the gender effect among those with income $25,000 to < $55,000 (Sections 5.9.7 and 5.9.9.2). Is it possible to use a single call to `predict()` to estimate this effect?

To specify the linear combination, you need to know the names and order of the model’s coefficients. This model has 6 coefficients:

`## [1] 6`

```
##                                           Estimate
## (Intercept)                                 6.2328
## RIAGENDRFemale                              0.2158
## income$25,000 to < $55,000                  0.1007
## income$55,000+                              0.1021
## RIAGENDRFemale:income$25,000 to < $55,000  -0.7122
## RIAGENDRFemale:income$55,000+              -0.7418
```

```
# Estimate the effect
gmodels::estimable(fit.ex5.3.int,
                   c("RIAGENDRFemale" = 1,
                     "RIAGENDRFemale:income$25,000 to < $55,000" = 1),
                   conf.int = 0.95)
```

```
##               Estimate Std. Error t value  DF Pr(>|t|) Lower.CI Upper.CI
## (0 1 0 0 1 0)  -0.4964     0.2197  -2.259 851  0.02412  -0.9276 -0.06513
```

You cannot use a single call to `predict()` to estimate this effect because there is no way to exclude the intercept. To estimate the gender effect at the specified level of income, you would need two calls to `predict()`, as shown below. The first call estimates the mean for females at the middle level of income and the second for males. Subtracting these two provides the correct gender effect (females vs. males). However, the subtraction results in the wrong CI since the difference of CIs is not equal to the CI of the difference.

```
# Estimated mean outcome for females at middle income level
Y1 <- predict(fit.ex5.3.int,
              data.frame(RIAGENDR = "Female",
                         income = "$25,000 to < $55,000"),
              interval = "confidence")
# Estimated mean outcome for males at middle income level
Y2 <- predict(fit.ex5.3.int,
              data.frame(RIAGENDR = "Male",
                         income = "$25,000 to < $55,000"),
              interval = "confidence")
# Correct effect estimate but NOT the correct CI
Y1 - Y2
```

```
##       fit     lwr     upr
## 1 -0.4964 -0.4839 -0.5088
```
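The correct CI for such a difference can also be computed by hand from the coefficient vector and its variance-covariance matrix, which is essentially what `gmodels::estimable()` does. A minimal sketch with built-in data (the `mtcars` stand-in again, not the NHANES model):

```r
# Sketch using built-in data; the contrast vector cm plays the role of the
# coefficient multipliers passed to gmodels::estimable()
dat <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ cyl * wt, data = dat)

# Effect of cyl = 6 (vs. cyl = 4) at wt = 3, i.e., cyl6 + 3 * cyl6:wt;
# coefficient order: (Intercept), cyl6, cyl8, wt, cyl6:wt, cyl8:wt
cm <- c(0, 1, 0, 0, 3, 0)

est <- sum(cm * coef(fit))                     # point estimate
se  <- drop(sqrt(t(cm) %*% vcov(fit) %*% cm))  # correct standard error
ci  <- est + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se
c(estimate = est, lower = ci[1], upper = ci[2])
```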

- Estimate the mean fasting glucose among individuals with a waist circumference of 110 cm who are current smokers, age 50 years, male, non-Hispanic White, and have an income < $25,000, using `fit.ex5.1` (Section 5.10).

```
predict(fit.ex5.1,
        data.frame(BMXWAIST = 110,
                   smoker = "Current",
                   RIDAGEYR = 50,
                   RIAGENDR = "Male",
                   race_eth = "Non-Hispanic White",
                   income = "< $25,000"),
        interval = "confidence")
```

```
##     fit   lwr   upr
## 1 6.511 6.156 6.866
```

```
# Same result using gmodels::estimable()
gmodels::estimable(fit.ex5.1,
                   c("(Intercept)" = 1,
                     "BMXWAIST" = 110,
                     "smokerPast" = 0,
                     "smokerCurrent" = 1,
                     "RIDAGEYR" = 50,
                     "RIAGENDRFemale" = 0,
                     "race_ethNon-Hispanic White" = 1,
                     "race_ethNon-Hispanic Black" = 0,
                     "race_ethNon-Hispanic Other" = 0,
                     "income$25,000 to < $55,000" = 0,
                     "income$55,000+" = 0),
                   conf.int = 0.95)
```

```
##                             Estimate Std. Error t value  DF Pr(>|t|) Lower.CI Upper.CI
## (1 110 0 1 50 0 1 0 0 0 0)     6.511     0.1809      36 846        0    6.156    6.866
```

Although `gmodels::estimable()` can provide the correct answer, the code required is more cumbersome. Additionally, the output includes a test of the null hypothesis that the prediction is 0, a test which is not generally of interest.