## 6.17 Writing it up

This section demonstrates how to write up the Methods and Results sections corresponding to a logistic regression analysis.

Example 6.3: Using our random subset of adult participants from the 2019 National Survey of Drug Use and Health (NSDUH) (Section A.5), what is the association between lifetime marijuana use (mj_lifetime) and age at first use of alcohol (alc_agefirst), adjusted for age (demog_age_cat6), sex (demog_sex), and income (demog_income)? Write up the Methods and Results for the analyses that answered this research question using models without (fit.ex6.3.adj) and with an interaction (fit.ex6.3.int).

### 6.17.1 Writing up logistic regression results for a model with no interaction

Below are a few lines of code to extract the information we need for our write-up.

Outcome, predictors, and dataset name: The call component of the glm object provides the call to glm, including the variable and dataset names.

fit.ex6.3.adj$call ## glm(formula = mj_lifetime ~ alc_agefirst + demog_age_cat6 + demog_sex + ## demog_income, family = binomial, data = nsduh) Number of observations: If we used a complete case analysis then the number of rows in the dataset (nrow(nsduh)) is the number of cases used in the analysis. But counting the number of residuals to get the sample size will work even when you have missing data. length(fit.ex6.3.adj$residuals)
##  843

Selected descriptive statistics: Extract any descriptive summaries you might want to add to your write-up. For example, the proportion of individuals in each age group. In this example, we did not remove missing data first before fitting the model, so we need to do that here before creating our summary. We want our summary to be based on the same sample as our regression model, and a regression will always carry out a complete case analysis whether you removed the cases with missing values first or not.

tmp <- nsduh %>%
demog_sex, demog_income)
round(100*prop.table(table(tmp$demog_age_cat6)), 1) ## ## 18-25 26-34 35-49 50-64 65+ ## 12.1 14.7 27.3 24.1 21.8 Regression coefficients, p-values, AORs, and 95% confidence intervals: The p-values in the regression coefficient table test the association with the outcome for continuous predictors and categorical predictors with exactly 2 levels. For categorical predictors, each p-value tests the difference in log-odds of the outcome between the level shown in that row and the reference level. # Regression coefficients, p-values, AORs, 95% CIs for AORs car::S(fit.ex6.3.adj) # 95% CIs for the regression coefficients confint(fit.ex6.3.adj) # (results not shown) Type III Wald test p-values: For each categorical predictor with more than 2 levels, the Type III Wald test provides the multiple degree of freedom p-value for the overall test of that predictor’s association with the outcome. car::Anova(fit.ex6.3.adj, type = 3, test.statistic = "Wald") # (results not shown) Methods: We used data from a teaching dataset including 843 participants in the 2019 National Survey of Drug Use and Health aged 18 years and older (18-25 = 12.1%, 26-34 = 14.7%, 35-49 = 27.3%, 50-64 = 24.1%, 65+ = 21.8%) to assess the association between lifetime marijuana use and age at first use of alcohol, adjusted for age group (ref = 18-25), sex (Female (ref), Male), and income (Less than$20,000 (ref), $20,000 to$49,999, $50,000 to$74,999, $75,000 or more) using binary logistic regression. We assessed the linearity assumption and checked for separation, collinearity, and outliers (no issues were found). We evaluated the robustness of our results to the presence of a few potentially influential observations using a sensitivity analysis (we did not carry this out in this chapter, but this demonstrates where you would report such an analysis). A forest plot is provided to illustrate the AORs and their 95% CIs. Results: After adjusting for age, sex, and income, age at first alcohol use was significantly negatively associated with lifetime marijuana use (AOR = 0.759; 95% CI = 0.718, 0.800; p < .001). Individuals who first used alcohol at a given age have 24.1% lower odds of having ever used marijuana than those who first used alcohol one year earlier. Current age was also significantly associated with the outcome, with older individuals having lower odds of lifetime marijuana use. See Table 6.2 for full regression results and Figure 6.3 for a forest plot of the AORs and their 95% CIs. The code below illustrates the use of gtsummary::tbl_merge() to merge together two regression tables. In this case, we are merging one table that includes the regression coefficients (which are on the logit scale) and another table that includes the AORs. For the intercept, we only include the regression coefficient, not an AOR, since the exponentiated intercept does not correspond to an AOR (the intercept is the logit of the probability when all other predictors are at their reference level, not a log-odds ratio). Notice how some features are included in one table but not the other to avoid redundancies (e.g., p-values only in the second table). The labels, however, must be in both tables in order for the merge to work correctly. Notice also (in the code comments) the difference between using style_sigfig(), which keeps a specified number of significant digits, and style_number(), which rounds to a specified number of decimal places. For example, the number 1.532 with two significant digits would be 1.5, while rounded to two decimal places would be 1.53. In this particular table, rounding makes everything line up nicely. In other tables, especially when different regression terms are on very different scales, specifying the number of significant digits may be preferred. NOBS <- length(fit.ex6.3.adj$residuals)

library(gtsummary)
tbl_regression(intercept = T,
# Use style_number to round to 3 digits (e.g., 1.27)
estimate_fun = function(x) style_number(x, digits = 3),
# Use style_sigfig to keep 3 significant digits (e.g., 1.3)
# estimate_fun = function(x) style_sigfig(x, digits = 3),
label  = list('(Intercept)'  ~ "Intercept",
alc_agefirst   ~ "Age first alc (y)",
demog_age_cat6 ~ "Age (y)",
demog_sex      ~ "Sex",
demog_income   ~ "Income")) %>%
modify_column_hide(p.value) %>%
modify_caption(paste("Logistic regression results for lifetime marijuana use vs.
age at first alcohol use (years) (N = ", NOBS, ")", sep=""))

tbl_regression(intercept = F,     # No intercept in OR table
exponentiate = T,  # OR = exp(B)
# Use style_number to round to 3 digits (e.g., 1.27)
estimate_fun = function(x) style_number(x, digits = 3),
# Use style_sigfig to keep 3 significant digits (e.g., 1.3)
# estimate_fun = function(x) style_sigfig(x, digits = 3),
pvalue_fun   = function(x) style_pvalue(x, digits = 3),
label  = list(alc_agefirst   ~ "Age first alc (y)",
demog_age_cat6 ~ "Age (y)",
demog_sex      ~ "Sex",
demog_income   ~ "Income")) %>%
# Add test.statistic = "Wald" here to get Wald Type III tests
add_global_p(keep = T, test.statistic = "Wald")
tbl_merge(
tbls = list(t1, t2),
)
Table 6.2: Logistic regression results for lifetime marijuana use vs. age at first alcohol use (years) (N = 843)
log(OR)1 95% CI1 OR1 95% CI1 p-value
Intercept 6.254 5.131, 7.451
Age first alc (y) -0.275 -0.331, -0.223 0.759 0.718, 0.800 <0.001
Age (y) <0.001
18-25
26-34 -0.296 -0.949, 0.343 0.744 0.387, 1.409 0.367
35-49 -0.804 -1.400, -0.234 0.447 0.247, 0.791 0.007
50-64 -0.690 -1.289, -0.116 0.502 0.275, 0.891 0.021
65+ -1.275 -1.885, -0.689 0.279 0.152, 0.502 <0.001
Sex 0.707
Female
Male -0.061 -0.379, 0.256 0.941 0.684, 1.291 0.707
Income 0.142
Less than $20,000$20,000 - $49,999 -0.531 -1.059, -0.013 0.588 0.347, 0.987 0.046$50,000 - $74,999 -0.079 -0.679, 0.519 0.924 0.507, 1.680 0.795$75,000 or more -0.361 -0.864, 0.130 0.697 0.421, 1.139 0.154
1 OR = Odds Ratio, CI = Confidence Interval

### 6.17.2 Writing up logistic regression results for a model with an interaction

For the model with an interaction (fit.ex6.3.int), more information must be added to the Methods and Results. Specifically, mention the inclusion of an interaction, the test of interaction, the overall test of the primary predictor of interest that tested the main effect and the interaction at the same time (Section 6.9.1), and the separate AORs for the primary predictor of interest at the levels of the other predictor in the interaction (Section 6.9.2).

The following code displays the raw regression coefficients, 95% CIs, and p-values.

# Regression coefficients, p-values, AORs, 95% CIs for AORs
car::S(fit.ex6.3.int)
# 95% CIs for the regression coefficients
confint(fit.ex6.3.int)
# Pvalues
car::Anova(fit.ex6.3.int, type = 3, test.statistic = "Wald")
# (Results not shown)

The following code displays the overall test of the predictor.

# Overall test of alc_agefirst
fit0 <- glm(mj_lifetime ~ demog_age_cat6 + demog_sex + demog_income,
family = binomial, data = nsduh,
subset = !is.na(alc_agefirst))
anova(fit0, fit.ex6.3.int, test = "Chisq")
# (Results not shown)

The following code displays the AORs at each level of the other term in the interaction. We store these as the objects EST.M and EST.F because we will be re-using them below when we create a table illustrating the interaction effect.

# AOR for Males
gmodels::estimable(fit.ex6.3.int,
c("alc_agefirst"               = 1,
"alc_agefirst:demog_sexMale" = 1),
conf.int = 0.95)
# AOR for Females
gmodels::estimable(fit.ex6.3.int,
c("alc_agefirst"               = 1),
conf.int = 0.95)
# (Results not shown)

Methods: (Same as for the model with no interaction with the following additions.) An interaction between age at first alcohol use and sex was included in the model to assess if the association differed between females and males. The overall significance of age at first alcohol use was assessed by comparing the full model to a reduced model with both its main effect and the interaction removed. Additionally, we estimated the AOR for age at first alcohol use separately for females and for males.

Results: The interaction effect was statistically significant (B = 0.119; 95% CI = 0.011, 0.229; p = .032). Overall, after adjusting for age, sex, and income, age at first alcohol use was significantly associated with lifetime marijuana use (p < .001). The association was significant for both females (OR = 0.713; 95% CI = 0.656, 0.776; p < .001) and males (OR = 0.803; 95% CI = 0.748, 0.863; p < .001). Current age was also significantly associated with the outcome.

See Table 6.3 for full regression results.

Table 6.3: Logistic regression results for lifetime marijuana use vs. age at first alcohol use (years) including an interaction with sex (N = 843)
log(OR)1 95% CI1 OR1 95% CI1 p-value
Intercept 7.374 5.819, 9.073
Age first alc (y) -0.338 -0.425, -0.259 0.713 0.654, 0.772 <0.001
Age (y) <0.001
18-25
26-34 -0.295 -0.954, 0.350 0.744 0.385, 1.419 0.373
35-49 -0.816 -1.417, -0.242 0.442 0.243, 0.785 0.006
50-64 -0.687 -1.291, -0.108 0.503 0.275, 0.897 0.022
65+ -1.260 -1.875, -0.671 0.284 0.153, 0.511 <0.001
Sex 0.030
Female
Male -2.138 -4.093, -0.224 0.118 0.017, 0.800 0.030
Income 0.156
Less than $20,000$20,000 - $49,999 -0.530 -1.062, -0.009 0.588 0.346, 0.991 0.048$50,000 - $74,999 -0.093 -0.697, 0.510 0.911 0.498, 1.665 0.762$75,000 or more -0.363 -0.869, 0.131 0.696 0.419, 1.140 0.154
Age first alc (y) * Sex 0.032
Age first alc (y) * Male 0.119 0.011, 0.229 1.126 1.011, 1.257 0.032
1 OR = Odds Ratio, CI = Confidence Interval

NOTE: Be careful interpreting the “AOR” for the interaction term (1.126). It is not an AOR for a specific predictor. Rather, it is a ratio of odds ratios, and can be interpreted in two ways. First, it is the ratio of the AOR for a 1-unit difference in age of first alcohol use for males divided by that AOR for females (0.803 / 0.713). Second, it is the ratio of the AOR comparing males to females for those of a given age of first alcohol use divided by that AOR for those who started one year earlier. For example, for age of first use 17 years, this ratio can be derived as exp(-2.138 + 17 $$\times$$ 0.119) / exp(-2.138 + 16 $$\times$$ 0.119) = 0.890 / 0.790).

Since we have an interaction, it is helpful to also table the AORs for each term in the interaction at specific levels of the other term. We have already done this for alc_agefirst at each level of demog_sex (0.803 and 0.713 in Results above) but we compute them again here and store them as EST.M and EST.F, respectively. We now also do this for demog_sex at specific values of alc_agefirst. You can choose any values for alc_agefirst within the range of the data, whatever you feel best illustrates how the AOR varies between meaningfully different predictor values; here we use 15, 18, and 21 years.

# alc_agefirst at demog_sex = "Male"
EST.M <- gmodels::estimable(fit.ex6.3.int,
c("alc_agefirst"               = 1,
"alc_agefirst:demog_sexMale" = 1),
conf.int = 0.95)

# alc_agefirst at demog_sex = "Female"
EST.F <- gmodels::estimable(fit.ex6.3.int,
c("alc_agefirst"               = 1),
conf.int = 0.95)

# demog_sex effect at alc_agefirst = 15
EST.15 <- gmodels::estimable(fit.ex6.3.int,
c("demog_sexMale"              = 1,
"alc_agefirst:demog_sexMale" = 15),
conf.int = 0.95)

# demog_sex effect at alc_agefirst = 18
EST.18 <- gmodels::estimable(fit.ex6.3.int,
c("demog_sexMale"              = 1,
"alc_agefirst:demog_sexMale" = 18),
conf.int = 0.95)

# demog_sex effect at alc_agefirst = 21
EST.21 <- gmodels::estimable(fit.ex6.3.int,
c("demog_sexMale"              = 1,
"alc_agefirst:demog_sexMale" = 21),
conf.int = 0.95)

# Exponentiate the AORs and 95% CIs but not the p-values
DF <- cbind(
rbind(exp(EST.F[ c("Estimate", "Lower.CI", "Upper.CI")]),
exp(EST.M[ c("Estimate", "Lower.CI", "Upper.CI")]),
exp(EST.15[c("Estimate", "Lower.CI", "Upper.CI")]),
exp(EST.18[c("Estimate", "Lower.CI", "Upper.CI")]),
exp(EST.21[c("Estimate", "Lower.CI", "Upper.CI")])),
# P-values
rbind(EST.F["Pr(>|X^2|)"],
EST.M["Pr(>|X^2|)"],
EST.15["Pr(>|X^2|)"],
EST.18["Pr(>|X^2|)"],
EST.21["Pr(>|X^2|)"])
)

names(DF)    <- c("AOR", "Lower", "Upper", "p-value")

rownames(DF) <- c("Age of first alcohol use effect @ Sex = Female",
"Age of first alcohol use effect @ Sex = Male",
"Male vs. Female @ Age of first alcohol use = 15y",
"Male vs. Female @ Age of first alcohol use = 18y",
"Male vs. Female @ Age of first alcohol use = 21y")

round(DF, 3)
##                                                    AOR Lower Upper p-value
## Age of first alcohol use effect @ Sex = Female   0.713 0.656 0.776   0.000
## Age of first alcohol use effect @ Sex = Male     0.803 0.748 0.863   0.000
## Male vs. Female @ Age of first alcohol use = 15y 0.702 0.460 1.072   0.097
## Male vs. Female @ Age of first alcohol use = 18y 1.003 0.723 1.390   0.987
## Male vs. Female @ Age of first alcohol use = 21y 1.432 0.866 2.369   0.156

### References

Sjoberg, Daniel D., Joseph Larmarange, Michael Curry, Jessica Lavery, Karissa Whiting, and Emily C. Zabor. 2023. Gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. https://CRAN.R-project.org/package=gtsummary.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13: 570–80. https://doi.org/10.32614/RJ-2021-053.