Chapter 1 Introduction

Regression, in general, is a useful tool for understanding the association between one variable (the outcome, or dependent variable) and a set of other variables (the predictors, also called independent or explanatory variables). Some users of regression may treat it like a “black box” – a method that can be used without any need to understand the inner workings. They might identify an outcome and a set of predictors, throw them all into a regression model as-is, and see what results the computer outputs. Using regression in this way can lead to incorrect conclusions. It is important to understand the model, the meaning of the terms in the model, and how to properly fit the model, check it, and interpret the results.

Knowing only the basics of regression may give a researcher a false sense of confidence in their results. If you know you have no idea how to use regression, you will seek help. If you know the basics but do not realize how much more there is to learn, you may unknowingly produce incorrect or misleading results. The goal of this text is to introduce you to the important concepts of regression analysis and the tools available in R for carrying it out. Until you have a lot of practice using regression, it is wise to consult with a statistician who can guide you and give you feedback; in many cases, however, you will be able to carry out the analyses yourself using the tools you learn here. After working your way through this text, you will know a lot more than you did, you will know your limits, and you will have the foundation to expand those limits over time with experience and further study.

The text assumes a level of statistical knowledge obtainable in a typical introductory statistics course. Briefly, the aspects of regression covered are fitting the model, visualizing aspects of the model fit, testing assumptions, possible modifications when the assumptions are not met, testing associations between predictors and the outcome, and interpreting and presenting the results.

The text begins with some preliminaries: an overview of regression in general (Chapter 2) and how to examine and summarize the data (Chapter 3). The chapters that follow describe in detail how to carry out specific regression methods: simple (Chapter 4) and multiple (Chapter 5) linear regression; binary, ordinal, and conditional logistic regression, and log-binomial regression (Chapter 6); and Cox proportional hazards regression for survival analysis (Chapter 7). The final two chapters cover handling data arising from a complex survey design (Chapter 8) and multiple imputation of missing data (Chapter 9), methods that can be applied in conjunction with all these regression methods. Each chapter closes with a comprehensive set of exercises.