## 2.2 Why use regression?

Regression can be used for any or all of the following purposes:

• Testing a theory: A theory that implies a certain functional relationship between $$Y$$ and $$X$$ can be tested by comparing the hypothesized model to simpler or more complex models and seeing which model fits best.

• Prediction: After the model has been fit to observed data, specified values of $$X$$ can be used to predict as-yet-unobserved values of $$Y$$.

• Machine learning: The fields of artificial intelligence and machine learning (AI/ML), sometimes placed under the umbrella of “data science”, “data mining”, “analytics”, or “statistical learning”, are, essentially, attempts to predict an outcome based on a set of predictors (“features”). Regression is one of many methods for making such a prediction. For a gentle introduction to these methods in R, see James et al. (2021). For a more in-depth treatment, see Hastie, Tibshirani, and Friedman (2016).

• Testing an association: Is there a significant association between $$Y$$ and $$X$$? In the case of simple linear regression, this question is answered by testing the null hypothesis $$H_0:\beta_1=0$$. Under the null hypothesis, the outcome does not depend on the predictor. If there is enough evidence to reject the null hypothesis, we conclude that there is a significant association.

• Estimating a rate of change: How does $$Y$$ change as $$X$$ changes? In the case of simple linear regression, this question is answered by estimating $$\beta_1$$, the regression slope: the expected change in $$Y$$ associated with a one-unit increase in $$X$$.

• Controlling for confounding: Is there an association between $$Y$$ and $$X_1$$ after adjusting for $$X_2, \ldots, X_K$$? When the data arise from an observational study, an observed association between a single predictor and the outcome may be spurious due to confounding: a third variable may be associated with both the predictor and the outcome, and those associations induce an apparent association between the predictor of interest and the outcome. Alternatively, the estimate of a real association may be biased due to confounding. Regression adjustment for confounding is a powerful tool for attempting to isolate the effect of one predictor on the outcome from the effects of other, potentially confounding, variables.
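Several of the uses above — estimating the slope, testing $$H_0:\beta_1=0$$, and prediction — can be sketched with a small simple linear regression computed from first principles. This is a minimal illustration using only the Python standard library; the data are invented for the example.

```python
import math

# Invented illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept estimates
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx               # beta1-hat: estimated rate of change in Y per unit X
b0 = ybar - b1 * xbar        # beta0-hat: estimated intercept

# Residual variance and the t-statistic for testing H0: beta1 = 0
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_b1 = math.sqrt(s2 / sxx)
t = b1 / se_b1               # compare to a t distribution with n - 2 df

# Prediction of Y at a new, specified value of X
x_new = 7.0
y_pred = b0 + b1 * x_new

print(round(b1, 3), round(t, 1), round(y_pred, 2))
```

A large $$|t|$$ here would lead to rejecting the null hypothesis of no association; in practice this entire calculation is done by a fitted-model summary (e.g., `lm()` in R) rather than by hand.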
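Adjustment for confounding can also be sketched numerically. In this hypothetical example (all coefficients and data invented for illustration), $$X_2$$ is a confounder: it is correlated with $$X_1$$ and drives $$Y$$. Regressing $$Y$$ on $$X_1$$ alone gives a biased slope; including $$X_2$$ in the model recovers the coefficient used to generate the data.

```python
def lstsq(X, y):
    """Solve the normal equations (X'X) b = (X'y) by Gaussian elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for i in range(k):                              # forward elimination
        p = max(range(i, k), key=lambda r: abs(A[r][i]))  # partial pivoting
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    b = [0.0] * k                                   # back substitution
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 1, 2, 2, 3, 3]                  # confounder, correlated with x1
y = [0.5 * a + 2.0 * c for a, c in zip(x1, x2)]  # true direct effect of x1 is 0.5

# Unadjusted: regress y on x1 alone -> slope absorbs the effect of x2
b_unadj = lstsq([[1.0, a] for a in x1], y)
# Adjusted: regress y on x1 and x2 -> slope for x1 recovers 0.5
b_adj = lstsq([[1.0, a, c] for a, c in zip(x1, x2)], y)

print(round(b_unadj[1], 3), round(b_adj[1], 3), round(b_adj[2], 3))
```

The unadjusted slope for $$X_1$$ is roughly 1.41, well above the true direct effect of 0.5, because $$X_1$$ partly stands in for the omitted $$X_2$$; the adjusted model separates the two effects.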

### References

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2016. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. 2nd ed. New York, NY: Springer.