2.2 Why use regression?
Regression can be used for any or all of the following purposes:
Testing a theory: A theory that implies a certain functional relationship between \(Y\) and \(X\) can be tested by comparing the hypothesized model to simpler or more complex models and seeing which model fits best.
Prediction: After fitting the model to observed data, specified values of \(X\) can be used to predict yet-to-be-observed values of \(Y\).
Machine learning: The fields of artificial intelligence and machine learning (AI/ML), sometimes placed under the umbrella of “data science”, “data mining”, “analytics”, or “statistical learning”, are in large part attempts to predict an outcome based on a set of predictors (“features”). Regression is one of many methods for making such predictions. For a gentle introduction to these methods in R, see James et al. (2021). For a more in-depth treatment, see Hastie, Tibshirani, and Friedman (2016).
Testing an association: Is there a statistically significant association between \(Y\) and \(X\)? In the case of simple linear regression, this question is answered by testing the null hypothesis \(H_0:\beta_1=0\). Under the null hypothesis, the mean of the outcome does not change with the predictor. If there is enough evidence to reject the null hypothesis, we conclude that there is a statistically significant association.
Estimating a rate of change: How does \(Y\) change as \(X\) changes? In the case of simple linear regression, this question is answered by estimating the sign and magnitude of \(\beta_1\), the regression slope, which is the expected change in \(Y\) for a one-unit increase in \(X\).
Controlling for confounding: Is there an association between \(Y\) and \(X_1\) after adjusting for \(X_2, \ldots, X_K\)? When the data arise from an observational study, an observed association between a single predictor and the outcome may be spurious due to confounding. A third variable may be associated with both the predictor of interest and the outcome, and those associations can induce an association between the two even when none exists. Alternatively, the estimate of a real association may be biased due to confounding. Regression adjustment for confounding is a powerful tool for attempting to isolate the effect of one predictor on the outcome from the effects of other potentially confounding variables.
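The confounding scenario in the last item can be made concrete with a small simulation. The following is a minimal sketch in Python using only the standard library, not a recipe from the text: the data, the variable names `x1`, `x2`, and `y`, the seed, and the helper functions `simple_ols` and `residuals` are all hypothetical. The confounder `x2` drives both `x1` and `y`, while `x1` has no direct effect on `y`. The adjusted slope is obtained by residualizing both `y` and `x1` on `x2` and regressing one set of residuals on the other, which for least squares gives the same coefficient as fitting `y` on `x1` and `x2` together (the Frisch-Waugh-Lovell result).

```python
import random
from statistics import fmean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y):
    """Return the residuals of y after a simple least-squares fit on x."""
    b0, b1 = simple_ols(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Simulated (hypothetical) data: x2 is a confounder that drives both x1 and y.
# By construction, x1 has NO direct effect on y.
rng = random.Random(2024)
n = 2000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [v + rng.gauss(0, 1) for v in x2]
y = [2 * v + rng.gauss(0, 1) for v in x2]

# Crude (unadjusted) slope: regress y on x1 alone.
_, crude_slope = simple_ols(x1, y)

# Adjusted slope: remove the part of x1 and of y explained by x2,
# then regress the y-residuals on the x1-residuals.
_, adjusted_slope = simple_ols(residuals(x2, x1), residuals(x2, y))

print(f"crude slope of y on x1:        {crude_slope:.2f}")
print(f"slope of y on x1 given x2:     {adjusted_slope:.2f}")
```

Under these assumptions the crude slope should come out near 1 even though `x1` has no effect on `y`, while the adjusted slope should be near 0. In R, the analogous comparison would be between `lm(y ~ x1)` and `lm(y ~ x1 + x2)`.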