## 6.20 Conditional logistic regression for matched case-control data

Some case-control studies use matching to make the controls comparable to the cases on confounding variables. The matching must then be taken into account in the analysis. Matched case-control data can be validly analyzed using conditional logistic regression, which stratifies the analysis by groups defined by the unique combinations of the matching variables. To carry out a conditional logistic regression in R, use the clogit() function in the survival package, with the matching variables listed as strata in the model. clogit() expects the outcome to be numeric with possible values 0 and 1, and the output does not contain an intercept. The conditional logit model does not estimate associations between the strata variables and the outcome. Since the purpose of these variables is to control for confounding, this is not typically an issue.
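As a minimal sketch of the syntax, the following fits a conditional logit model to simulated data. The data frame `toy` and its variables `set_id`, `case`, and `exposed` are hypothetical and are not part of the example dataset used in this section. The second fit uses coxph() directly to illustrate that clogit() is a wrapper around a stratified Cox model, which is why a Surv() response can appear in later output.

```r
library(survival)

set.seed(1)

# Hypothetical data: 50 matched sets, each with 1 case and 3 controls
n_sets <- 50
toy <- data.frame(
  set_id = rep(seq_len(n_sets), each = 4),
  case   = rep(c(1, 0, 0, 0), times = n_sets)
)

# Simulate an exposure that is more common among cases than controls
toy$exposed <- rbinom(nrow(toy), size = 1,
                      prob = ifelse(toy$case == 1, 0.6, 0.3))

# Conditional logistic regression, stratified by matched set
fit.toy <- clogit(case ~ exposed + strata(set_id), data = toy)

# Only the exposure coefficient is estimated: there is no intercept
# and no coefficient for the strata variable
coef(fit.toy)

# The same model written as a stratified Cox model with dummy times
fit.cox <- coxph(Surv(rep(1, nrow(toy)), case) ~ exposed + strata(set_id),
                 data = toy, method = "exact")
all.equal(unname(coef(fit.toy)), unname(coef(fit.cox)))
```

Matched sets in which every member has the same exposure value contribute no information about the exposure coefficient, so the effective sample size can be smaller than the number of rows suggests.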

Example 6.6: A matched case-control dataset of births was created from a subset of the 2018 U.S. Natality teaching dataset, containing 195 births that were followed by admission to the newborn intensive care unit (AB_NICU) and 1375 births that were not, matched on maternal education (MEDUC) and age (MAGER). Assess the association between admission to the newborn intensive care unit and previous preterm birth (RF_PPTERM), accounting for the matching in the analysis.

After loading the data, check the formatting of the outcome and convert to 0/1, if needed.

```r
load("Data/natality_CC_rmph.Rdata")
```

```r
library(dplyr)  # provides %>% and mutate()

table(natality_CC$AB_NICU)
## 
##   No  Yes 
## 1375  195

# clogit() expects a 0/1 outcome
# Convert from No/Yes to 0/1
natality_CC <- natality_CC %>% 
  mutate(NICU = as.numeric(AB_NICU == "Yes"))

# Check derivation
table(natality_CC$AB_NICU, natality_CC$NICU, exclude = NULL)
##      
##          0    1
##   No  1375    0
##   Yes    0  195
```

Next, check that the distributions of the matching variables are similar for cases and controls.

```r
# MEDUC is categorical
addmargins(
  prop.table(
    table(natality_CC$MEDUC, natality_CC$NICU),
    2),
  1)
##               
##                      0       1
##   <HS          0.10036 0.12308
##   HS           0.28945 0.33333
##   Some college 0.27709 0.26154
##   Bachelor     0.23636 0.17436
##   Adv Degree   0.09673 0.10769
##   Sum          1.00000 1.00000

# MAGER is continuous
boxplot(MAGER ~ NICU, data = natality_CC)
```

Matching is not always perfect. The distributions are not identical between cases and controls, but they are close.

Finally, fit the conditional logit model and examine the output.

```r
library(survival)

fit.clr <- clogit(NICU ~ RF_PPTERM + strata(MEDUC, MAGER),
                  data = natality_CC)

# Regression coefficient
round(summary(fit.clr)$coefficients, 4)
##               coef exp(coef) se(coef)     z Pr(>|z|)
## RF_PPTERMYes 1.123     3.074    0.316 3.554   0.0004

# OR and 95% CI
OR.CI <- cbind("OR" = exp(coef(fit.clr)),
               exp(confint(fit.clr)))
round(OR.CI, 3)
##                 OR 2.5 % 97.5 %
## RF_PPTERMYes 3.074 1.655   5.71

# Type III test
car::Anova(fit.clr, type = 3, test.statistic = "Wald")
## Analysis of Deviance Table (Type III tests)
## 
## Response: Surv(rep(1, 1570L), NICU)
##           Df Chisq Pr(>Chisq)    
## RF_PPTERM  1  12.6    0.00038 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Conclusion: Previous preterm birth is significantly associated with admission to the NICU (p < .001). Newborns of mothers with a previous preterm birth have 3.1 times the odds of admission to the NICU (OR = 3.07; 95% CI = 1.65, 5.71; p < .001).
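The odds ratio, confidence interval, and test statistic in the output can be checked by hand from the coefficient and its standard error. The sketch below assumes the interval is a Wald-type (normal approximation) interval and uses the rounded values from the coefficient table, so the results should agree with the reported output only up to rounding. The object names are illustrative.

```r
# Values copied from the rounded coefficient table
b  <- 1.123   # log odds ratio for RF_PPTERM (Yes vs. No)
se <- 0.316   # standard error of the log odds ratio

# Odds ratio and Wald 95% confidence interval on the odds ratio scale
or <- exp(b)
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

# Wald z statistic, the corresponding 1-df chi-square, and two-sided p-value
z     <- b / se
chisq <- z^2
p     <- 2 * pnorm(-abs(z))

round(c(OR = or, lower = ci[1], upper = ci[2]), 3)
round(c(z = z, chisq = chisq, p = p), 5)
```

With a single 1-degree-of-freedom term in the model, the Wald chi-square is the square of the z statistic, which is why the test for RF_PPTERM and the coefficient table give the same p-value.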

### References

Gail, Mitchell H., Jay H. Lubin, and Lawrence V. Rubinstein. 1981. “Likelihood Calculations for Matched Case-Control Studies and Survival Studies with Tied Death Times.” Biometrika 68 (3): 703–7. https://doi.org/10.2307/2335457.
Logan, John A. 1983. “A Multivariate Model for Mobility Tables.” American Journal of Sociology 89 (2): 324–49. https://www.jstor.org/stable/2779144.
Therneau, Terry M. 2023. Survival: Survival Analysis. https://github.com/therneau/survival.