## 5.25 Sensitivity analysis

A **sensitivity analysis** compares your conclusions between the analysis you carried out and another analysis in which you change some aspect of the approach. This method can be used to assess the sensitivity of your regression results (e.g., parameter estimates, 95% confidence intervals, p-values) to changes in your approach. Even in a confirmatory analysis, where you must pre-specify the approach, you can use a sensitivity analysis to assess what would have happened had you used a different approach.

When reporting the results of a sensitivity analysis, think about how your conclusions differ between approaches. Your results could differ **quantitatively** and/or **qualitatively**. A quantitative difference affects the strength of conclusions but may or may not affect the nature of the conclusions themselves. For example, suppose a regression coefficient estimate meaningfully differs in magnitude between two approaches, but is meaningfully large and in the same direction in both. This would be a quantitative difference, but not a qualitative difference.

A qualitative difference affects the nature of the conclusions. For example, when comparing two approaches, suppose an association changes in direction or changes from meaningfully large to close to no association or vice versa. These are qualitative differences. A change in statistical significance is also a qualitative difference, in that it affects conclusions based on a strict p-value cutoff, but since the typical .05 cutoff for statistical significance is arbitrary a change in significance really does not matter as much as changes in the parameter estimates themselves. For example, if two analyses yield a regression coefficient that is approximately the same magnitude, but in one case p = .049 and in the other p = .051, then really nothing has changed despite the fact that the former is “statistically significant” and the latter is not. Although some may insist on making much of this difference, there really is no meaningful difference.

In summary, report the nature of your sensitivity analysis (what you altered and why), summarize quantitative differences, comment on qualitative differences, and combine this information into a judgment of how sensitive your original analysis is to changes in the approach. Ideally, you will be able to report “we carried out a sensitivity analysis and our results did not meaningfully change and our conclusions remained the same.” If, however, the results do differ meaningfully, then you may need to report both sets of results and note that it is not clear which better reflects reality.

To demonstrate, this section will assess sensitivity to:

- The choice of how to collapse a categorical predictor into fewer levels; and
- The presence of outliers and influential observations.

**NOTES:**

- When carrying out a sensitivity analysis, be careful to identify any changes that come along for the ride. For example, if you compare the inclusion of confounder \(X\) with, instead, the inclusion confounder \(Z\), but \(X\) and \(Z\) have different amounts of missing data, then differences in results could be due to the change in confounder and/or the change in sample size. To isolate the effect of changing the confounder, use the same sample for both analyses, one that has no missing values for either \(X\) or \(Z\).
- When removing outliers and/or influential observations, the sample size will always decrease. Thus, standard errors, width of confidence intervals, and p-values will always change just due to a reduction in sample size. However, typically we are only removing a few observations relative to the full sample size, so this will not make a large difference. Regardless, as always, pay more attention to changes in the magnitude of effects than to changes in p-values.
- When removing any observation, the characteristics of the remaining observations may change. For example, an observation that was an outlier or influential may no longer be, or vice versa. A thorough analysis of sensitivity to outliers and/or influential observations would entail removing observations one at a time, assessing the effects on the model, and reassessing the remaining observations. DFBetas already tell us what happens to each regression coefficient when each observation is removed one at a time, but they do not tell us how the influence measures themselves (DFBetas, Cook’s distance) change for the remaining observations.
- Be careful when assessing differences in coefficient magnitude between analyses that are on different scales.
- If your sensitivity analysis involves changing the scale of a
*predictor*, then the regression coefficient for that predictor will be on a different scale in the two analyses (for example, if you assess sensitivity to choice of predictor transformation). - If your sensitivity analysis involves changing the scale of the
*outcome*, then*all*the regression coefficients in the two analyses will be on different scales (for example, if you assess sensitivity to choice of outcome transformation).

- If your sensitivity analysis involves changing the scale of a

### 5.25.1 Example: Sensitivity to collapsing a categorical predictor

**Example 5.1 (continued):** Our final model (`fit.ex5.1.trans`

) included `race_eth`

(race/ethnicity) which was derived by collapsing `RIDRETH3`

into fewer categories due to sparsity. Carry out a sensitivity analysis to assess how robust are the final conclusions about the primary predictors (waist circumference and smoking status) to this approach. Re-fit the model including `RIDRETH3`

instead of `race_eth`

and compare the results to the original model.

```
# Original model
fit.ex5.1.trans <- lm(LBDGLUSI_trans ~ BMXWAIST + smoker +
RIDAGEYR + RIAGENDR + race_eth + income,
data = nhanesf.complete)
# Alternative model
fit.ex5.1.trans_b <- lm(LBDGLUSI_trans ~ BMXWAIST + smoker +
RIDAGEYR + RIAGENDR + RIDRETH3 + income,
data = nhanesf.complete)
```

`car::compareCoefs()`

(Fox, Weisberg, and Price 2023; Fox and Weisberg 2019) provides a side-by-side comparison of the regression coefficients and their standard errors.

```
# Results only shown for waist circumference and smoking status
## Calls:
## 1: lm(formula = LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR +
## race_eth + income, data = nhanesf.complete)
## 2: lm(formula = LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR +
## RIDRETH3 + income, data = nhanesf.complete)
##
## Model 1 Model 2
##
## BMXWAIST 0.0003047 0.0003117
## SE 0.0000313 0.0000315
## Pr(>|z|) < 0.0000000000000002 < 0.0000000000000002
##
## smokerPast 0.00184 0.00216
## SE 0.00128 0.00129
## Pr(>|z|) 0.1498 0.0922
##
## smokerCurrent -0.0001127 0.0000915
## SE 0.0015381 0.0015372
## Pr(>|z|) 0.9416 0.9525
```

Looking at these results, we conclude that collapsing the race/ethnicity variable did not meaningfully change the magnitude (or precision) of the estimated regression coefficients for our primary predictors (waist circumference and smoking status). To see the impact on the overall multiple degree of freedom test of significance of `smoker`

, use `car::Anova()`

on each model to see that the conclusion is not sensitive to how we collapsed race/ethnicity compared to not collapsing.

### 5.25.2 Example: Sensitivity to outliers and influential observations

**Example 5.1 (continued):** For our final model (`fit.ex5.1.trans`

), we identified 3 outliers (Section 5.21) and a number of potentially influential observations (Section 5.22). Carry out a sensitivity analysis to assess how robust are the final conclusions about the primary predictors (waist circumference and smoking status) to the presence of these observations. Re-fit the model after excluding these observations and compare the results to the original model.

As mentioned in the **NOTE** above, a more thorough sensitivity analysis would proceed by removing observations one at a time, assessing the effects on the model, and reassessing the remaining observations. In this example, we simply remove them all at once to illustrate the process of identifying and removing observations and assessing the results.

**Identifying the outliers**

Recall that when we carried out the outlier test in Section 5.21 we created a logical vector that identified these observations. This is reproduced here.

```
# Compute Studentized residuals
RSTUDENT <- rstudent(fit.ex5.1.trans)
# Use numeric cutoff from outlier test to identify outliers
SUB.OUTLIERS <- RSTUDENT < -5.38
RSTUDENT[SUB.OUTLIERS]
```

```
## 1816 66 66.1
## -11.708 -5.384 -5.384
```

**Identifying the influential observations**

Recall that when we used influence diagnostics in Section 5.22, we identified a few influential observations using Figures 5.49 and 5.50. The strategy for identifying these observations in the dataset is to compute the Cook’s distances and DFBetas and create logical vectors using numeric cutoffs.

**NOTES:**

- For Cook’s distance, use a cutoff based on looking at Figure 5.49 and choosing a value above which there are observations that stick out well above the rest (which is subjective). For DFBetas, use a cutoff of 0.2 for each regression coefficient.
- For Cook’s distance, each observation has one value. For DFBetas, each observation has one value for each regression coefficient. Use
`apply()`

and`any()`

below to identify rows for which*any*DFBeta is larger (in absolute value) than 0.2.

```
COOK <- cooks.distance(fit.ex5.1.trans)
DFBETAS <- dfbetas(fit.ex5.1.trans)
# Subjective cutoff from figure
SUB.COOK <- COOK > 0.15
# Standard cutoff for DFBeta
SUB.DFBETAS <- apply(abs(DFBETAS) > 0.2, 1, any)
# View the extreme Cook's distance values and compare
# to plot to make sure you captured all you wanted to capture
COOK[SUB.COOK]
```

```
## 1816
## 0.2377
```

The following illustrates how to, instead, identify observations with large DFBetas for a single term in the regression (rather than for any term as was done above). Figure 5.50 illustrated that there was an observation with a DFBeta less than -1 for the “Non-Hispanic Other” indicator variable of `race_eth`

. The following code identifies that single observation.

```
SUB.DFBETA.NHO <- DFBETAS[, "race_ethNon-Hispanic Other"] < -1
DFBETAS[SUB.DFBETA.NHO, "race_ethNon-Hispanic Other"]
```

`## [1] -1.057`

**Putting it all together**

In this example, we are removing all the outliers and influential observations all at once. If you were removing them one at a time, then just set `SUB`

below to be the logical vector that identifies the single observation you are removing at this step.

`## [1] 21`

Next, fit the model without these observations and compare the results before vs. after. Make sure to include the negation operator `!`

before the logical vector `SUB`

so as to include only observations that are *not* outliers or influential.

`## [1] 857`

`## [1] 836`

```
fit.ex5.1.trans_sens <- lm(LBDGLUSI_trans ~ BMXWAIST + smoker +
RIDAGEYR + RIAGENDR + race_eth + income,
data = subdat)
```

```
# Results only shown for waist circumference and smoking status
## Calls:
## 1: lm(formula = LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR +
## race_eth + income, data = nhanesf.complete)
## 2: lm(formula = LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + RIAGENDR +
## race_eth + income, data = subdat)
##
## Model 1 Model 2
##
## BMXWAIST 0.0003047 0.0002929
## SE 0.0000313 0.0000254
## Pr(>|z|) < 0.0000000000000002 < 0.0000000000000002
##
## smokerPast 0.00184 0.00249
## SE 0.00128 0.00102
## Pr(>|z|) 0.1498 0.0146
##
## smokerCurrent -0.000113 -0.001259
## SE 0.001538 0.001242
## Pr(>|z|) 0.9416 0.3108
```

```
# Results only shown for smoking status
## Anova Table (Type III tests)
## Response: LBDGLUSI_trans
## Sum Sq Df F value Pr(>F)
## smoker 0.000547 2 1.1509 0.316851
## Anova Table (Type III tests)
## Response: LBDGLUSI_trans
## Sum Sq Df F value Pr(>F)
## smoker 0.001332 2 4.4632 0.0118 *
```

We find that the results for waist circumference have not changed meaningfully, but the results for smoking status have, both quantitatively and qualitatively. In particular, the size of the comparison between Past and Never smokers has increased from 0.0018 to 0.0025. Additionally, the p-value for this comparison dropped from non-significant (p = .150) to significant (p = .015), and the overall p-value for smoking status changed from not even close to significant (p = .317) to well below .05 (p = .012). While we still should focus on effect sizes rather than p-values, these changes are notably large.

### References

*An R Companion to Applied Regression*. 3rd ed. Los Angeles: Sage Publications, Inc.

*Car: Companion to Applied Regression*. https://r-forge.r-project.org/projects/car/.