5.23 Confirmatory vs. exploratory analysis

Model selection is the process of deciding which predictors to include in a model and in what form to include each one. Subject-matter knowledge should take priority when making these decisions, and how you go about the process depends on the purpose of your regression analysis.

If your goal is to confirm a pre-specified hypothesis, then you are doing a confirmatory analysis. Pre-specify a hypothesized model based on subject-matter knowledge – the outcome, predictors, and predictor interactions. In a confirmatory analysis, do not remove terms because they lack statistical significance. It is reasonable, however, to alter the form of a variable (e.g., applying a transformation, collapsing sparse levels of a categorical predictor) to better meet regression assumptions, and to remove or combine predictors to reduce collinearity. Regression p-values are computed assuming the model was pre-specified. To draw strong, confirmatory conclusions, you cannot arrive at the model by making decisions about its form based on the observed relationships between the predictors and the outcome.
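To see why p-values assume a pre-specified model, consider the following minimal simulation sketch. The sample size, number of predictors, number of simulations, and seed are arbitrary choices made for illustration, not values from any real analysis. The outcome is generated independently of every predictor, so any "significant" result is a false positive. The code compares the p-value for a predictor chosen in advance with the p-value for the predictor that looks best after examining the data.

```r
set.seed(2)    # arbitrary seed, for reproducibility only

n_sim <- 500   # number of simulated datasets (illustrative choice)
n     <- 100   # observations per dataset (illustrative choice)
k     <- 10    # number of candidate predictors (illustrative choice)

p_prespecified <- numeric(n_sim)
p_selected     <- numeric(n_sim)

for (s in seq_len(n_sim)) {
  x <- matrix(rnorm(n * k), ncol = k)
  y <- rnorm(n)  # outcome is unrelated to every predictor

  # p-value for the slope in a simple linear regression of y on each predictor
  p <- apply(x, 2, function(xj) summary(lm(y ~ xj))$coefficients[2, "Pr(>|t|)"])

  p_prespecified[s] <- p[1]    # predictor named before looking at the data
  p_selected[s]     <- min(p)  # predictor picked because it looked best
}

# Proportion of simulated datasets with p < 0.05 under each strategy
rate_prespecified <- mean(p_prespecified < 0.05)
rate_selected     <- mean(p_selected < 0.05)
c(prespecified = rate_prespecified, selected = rate_selected)
```

In this setup, the pre-specified predictor should be declared significant in about 5% of datasets, as intended. The data-selected predictor should be declared significant far more often, in theory about 1 - 0.95^10, or roughly 40% of the time with ten independent candidates. The exact simulated rates will vary with the seed.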

If, instead, your goal is to explore the data to find the best-fitting model (the set of predictors that best explains the outcome), then you are doing an exploratory analysis, also known as hypothesis-generating research. In an exploratory analysis, both which predictors to include and what form they take are flexible, and basing decisions on statistical significance is acceptable, as long as you do not later act as if the significance tests were based on a pre-specified model. Inferences (e.g., confidence intervals, p-values) from an exploratory analysis that used significance of association with the outcome to choose predictors are not valid unless the selection process is accounted for (e.g., by using bootstrap resampling in which the model selection process is carried out anew within each resample) (Harrell 2015). Stepwise regression methods are one example of exploratory methods; see help(package = "olsrr") (requires installation of the olsrr package) (Hebbali 2024).
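The idea of repeating the selection process within each bootstrap resample can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the helper select_predictors() is a hypothetical function written for this example (it is not part of olsrr or any other package), and the selection rule (keep predictors with p < 0.05 in the full model) is a deliberately simple stand-in for whatever procedure you actually used.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated data (hypothetical): y depends on x1 and x2; x3 to x6 are noise
n   <- 200
dat <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(dat) <- paste0("x", 1:6)
dat$y <- 1 + 0.5 * dat$x1 + 0.3 * dat$x2 + rnorm(n)

candidates <- paste0("x", 1:6)

# Hypothetical helper: fit the full model and return the names of the
# predictors whose p-value falls below alpha
select_predictors <- function(d, candidates, alpha = 0.05) {
  full  <- lm(reformulate(candidates, response = "y"), data = d)
  coefs <- summary(full)$coefficients
  p     <- coefs[candidates, "Pr(>|t|)"]
  candidates[p < alpha]
}

# Repeat the ENTIRE selection process within each bootstrap resample
B    <- 200  # number of resamples (illustrative choice)
incl <- matrix(FALSE, nrow = B, ncol = length(candidates),
               dimnames = list(NULL, candidates))

for (b in seq_len(B)) {
  boot      <- dat[sample(n, replace = TRUE), ]  # resample rows with replacement
  incl[b, ] <- candidates %in% select_predictors(boot, candidates)
}

# Proportion of resamples in which each predictor was selected
inclusion_freq <- colMeans(incl)
inclusion_freq
```

A predictor that is selected in nearly every resample is a more stable finding than one that is selected only occasionally. The same loop could also store the coefficients of the selected model from each resample, so that interval estimates reflect the variability introduced by selection.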

References

Harrell, Frank E, Jr. 2015. Regression Modeling Strategies. 2nd ed. Switzerland: Springer International Publishing.
Hebbali, Aravind. 2024. olsrr: Tools for Building OLS Regression Models. https://olsrr.rsquaredacademy.com/.