# Preface

This text was written for a second biostatistics course for Master of Public Health students; however, students in any field will find it useful. Students in many disciplines take an introductory statistics course, which provides foundational competencies but perhaps not enough to use more advanced methods without additional training. There is a plethora of textbooks covering topics such as linear regression, logistic regression, and survival analysis, but many are aimed at readers with a background in mathematical statistics, do not focus specifically on public health, or do not focus on using R statistical software. The goal of this text is to provide a gentle introduction to regression methods, using R, that covers all the basics and a bit more, with examples drawn from public health data.

The text began in 2016 as course notes and evolved over time into what you see here. My hope is that what you learn from this book will give you the knowledge and skills to understand and carry out appropriate basic regression analyses, as well as the foundation and confidence to go deeper. When you are ready to go deeper, there are excellent texts that treat each of the methods covered here, as well as R programming, in much greater detail (e.g., Faraway 2016; Fox 2015; Fox and Weisberg 2019; Harrell 2015; Klein and Moeschberger 2010; Kleinman and Horton 2014; Lohr 2021; Lumley 2010; van Buuren 2018; Weisberg 2014; Wickham 2019; Wickham, Çetinkaya-Rundel, and Grolemund 2023). Additionally, improvements over standard regression methods are available using hierarchical (multilevel, random coefficient) and related shrinkage estimation procedures such as parametric empirical-Bayes/semi-Bayes and penalized-likelihood methods (Efron 2013, 2023; Greenland 2000; Harrell 2015).

**NOTE**

In the online version, references appear at the bottom of each page. However, some entries appear with no author. This happens not because the author is the same as in the previous entry of the Reference list on *this* page, but because the author is the same as in the previous entry of the Reference list at the end of the book.

For example, in the Reference list on this page, *Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models* is by Julian J. Faraway, *When Should Epidemiologic Regressions Use Random Coefficients?* is by Sander Greenland, and *Applied Linear Regression* is by Sanford Weisberg. This issue does not occur in the print version because there the references appear only at the end. If anyone knows how to fix this issue for the HTML version, please let me know!

### References

*Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction*. Reprint. Cambridge: Cambridge University Press.

*Exponential Families in Theory and Practice*. Cambridge: Cambridge University Press.

*Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models*. 2nd ed. Boca Raton, London, New York: Chapman & Hall/CRC.

*Applied Regression Analysis and Generalized Linear Models*. 3rd ed. Los Angeles: Sage Publications, Inc.

*An R Companion to Applied Regression*. 3rd ed. Los Angeles: Sage Publications, Inc.

“When Should Epidemiologic Regressions Use Random Coefficients?” *Biometrics* 56 (3): 915–21. https://www.jstor.org/stable/2676943.

*Regression Modeling Strategies*. 2nd ed. Switzerland: Springer International Publishing.

*Survival Analysis: Techniques for Censored and Truncated Data*. New York, NY: Springer.

*SAS and R*. 2nd ed. Boca Raton: Routledge.

*Sampling: Design and Analysis*. 3rd ed. Boca Raton: Chapman & Hall/CRC.

*Complex Surveys: A Guide to Analysis Using R*. John Wiley & Sons.

*Flexible Imputation of Missing Data*. 2nd ed. Boca Raton: Chapman & Hall/CRC.

*Applied Linear Regression*. 4th ed. Hoboken, NJ: Wiley.

*Advanced R*. 2nd ed. Boca Raton, London, New York: Chapman & Hall/CRC.

*R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. 2nd ed. Sebastopol, CA: O’Reilly Media.