5.26 Writing it up

This section demonstrates how to write up the regression methods and results for Example 5.1. Below are a few lines of code to extract the information we need for our write-up.

Outcome, predictors, and dataset name:

## lm(formula = LBDGLUSI_trans ~ BMXWAIST + smoker + RIDAGEYR + 
##     RIAGENDR + race_eth + income, data = nhanesf.complete)

Number of observations: If we used a complete case analysis then the number of rows in the dataset (nrow(nhanesf.complete)) is the number of cases used in the analysis. But counting the number of residuals to get the sample size will work even when you have missing data.

## [1] 857

Selected descriptives: Extract any descriptive summaries you might want to add to your write-up. For example, the age range of the individuals in the analysis.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    20.0    34.0    47.0    47.8    61.0    80.0

Summary of the model: Regression coefficients, p-values, and R2. The p-values shown here test the association with the outcome for continuous predictors and categorical predictors with exactly 2 levels; for categorical predictors, each p-value tests the difference in mean outcome between the level shown in that row and the reference level.

# (results not shown)

Confidence intervals for regression coefficients:

# (results not shown)

Predictor p-values: For continuous predictors and categorical predictors with exactly 2 levels, the Type III test p-values are the same as those in the regression coefficient table; for each categorical predictor with more than 2 levels, the Type III test provides the multiple degree of freedom p-value for the overall test of that predictor’s association with the outcome.

car::Anova(fit.ex5.1.trans, type = 3)
# (results not shown)

The above will also provide a test of significance of any interactions present in the model. If there is an interaction in the model, however, the main effect p-value for each predictor involved in the interaction is not the overall p-value for that predictor. To get an overall test of a predictor involved in an interaction, see Section 5.9.11. To get tests of a predictor involved in an interaction at specific levels of the other predictor in the interaction, see Section 5.9.7.


When writing up the Methods, describe the data, including the data source, sample size, and inclusion/exclusion criteria, if any. For continuous predictors, specify the units. For categorical predictors, list the levels and specify the reference level. Describe all the steps you took to process the data, fit the model, and diagnose the fit of the model, including any sensitivity analyses you carried out. Describe what steps you took to resolve any issues uncovered by regression diagnostics (e.g., a variable transformation).

We used data from a teaching dataset including 857 NHANES 2017-2018 participants aged 20 to 80 years (mean 47.8 years) to assess the association between fasting glucose (mmol/L) and each of waist circumference (cm) and smoking status (Never (reference), Past, Current) using multiple linear regression adjusted for confounding due to age (years), gender (Male (reference), Female), race/ethnicity (Hispanic (reference), non-Hispanic White, non-Hispanic Black, non-Hispanic other), and annual household income (< $25,000 (reference), $25,000 to < $55,000, $55,000+). We checked the model assumptions of normality, linearity, and constant variance. Some issues were found but after fasting glucose was transformed as \(-Y^{-1.5}\) (based on a Box-Cox transformation) these issues were sufficiently resolved. We checked for collinearity between the predictors and found no issues (all VIFs were near 1) (you can verify this yourself using car::vif(fit.ex5.1.trans)). We also checked for outliers and influential observations and evaluated the robustness of our results to such observations using a sensitivity analysis.

Results: The final model explained 23.6% of the variation in (transformed) fasting glucose. Waist circumference was positively associated with (transformed) fasting glucose (B = 0.00030; 95% CI = 0.00024, 0.00037; p < .001). Smoking status was not significantly associated with (transformed) fasting glucose (p = .317), although this conclusion is sensitive to the presence of outliers and influential observations. After removing 21 potential outliers and influential observations, the magnitude of the smoking status effect comparing Past vs. Never smokers was much larger and statistically significant. Age, gender, and race/ethnicity were also significantly associated with the outcome.

See Table 5.5 for full regression results.

NOBS <- length(fit.ex5.1.trans$residuals)
fit.ex5.1.trans %>%
  tbl_regression(intercept = T,
                 estimate_fun = function(x) style_sigfig(x, digits = 5),
                 pvalue_fun   = function(x) style_pvalue(x, digits = 3),
                 label  = list(BMXWAIST ~ "Waist Circumference (cm)",
                               smoker   ~ "Smoking Status",
                               RIDAGEYR ~ "Age (years)",
                               RIAGENDR ~ "Gender",
                               race_eth ~ "Race/ethnicity",
                               income   ~ "Annual Income")) %>%
  add_global_p(keep = T) %>% 
  modify_caption(paste("Linear regression results for (transformed) fasting glucose
  (-mmol/L^(-1.5)) vs. waist circumference (cm) (N = ", NOBS, ")", sep=""))
Table 5.5: Linear regression results for (transformed) fasting glucose (-mmol/L^(-1.5)) vs. waist circumference (cm) (N = 857)
Characteristic Beta 95% CI1 p-value
(Intercept) -0.11266 -0.12021, -0.10511 <0.001
Waist Circumference (cm) 0.00030 0.00024, 0.00037 <0.001
Smoking Status 0.317
    Past 0.00184 -0.00067, 0.00435 0.150
    Current -0.00011 -0.00313, 0.00291 0.942
Age (years) 0.00033 0.00026, 0.00040 <0.001
Gender <0.001
    Female -0.00473 -0.00684, -0.00263 <0.001
Race/ethnicity 0.009
    Non-Hispanic White -0.00457 -0.00748, -0.00166 0.002
    Non-Hispanic Black -0.00268 -0.00667, 0.00132 0.189
    Non-Hispanic Other -0.00069 -0.00502, 0.00364 0.755
Annual Income 0.846
    < $25,000
    $25,000 to < $55,000 0.00062 -0.00264, 0.00389 0.707
    $55,000+ -0.00010 -0.00305, 0.00285 0.945
1 CI = Confidence Interval

One downside to the use of an outcome transformation is that the regression coefficients are no longer on the original scale. In Section 5.5.2 we interpreted the coefficients of the model before applying a transformation to the outcome. The interpretations are similar here, except now the outcome is on a different scale.