8.7 Weighted survival analysis
To incorporate a complex survey design into a survival analysis, use
svykm()
to compute a weighted Kaplan-Meier estimate of the survival function,svylogrank()
to carry out a weighted log-rank test to compare survival curves between groups, andsvycoxph()
to carry out weighted Cox regression.
8.7.1 Weighted Kaplan-Meier estimate of the survival function
Example 8.4: Using the NHANES 2017-2018 data (nhanes1718_rmph.Rdata
), estimate the weighted Kaplan-Meier estimate of the survival function for time to first diagnosis of asthma. The time variable is age when first had asthma (MCQ025
) and the event indicator is MCQ010
– “Ever been told you have asthma”).
We need an event indicator that is 1 for those who have been told they have asthma and 0 otherwise, and a time variable that is the age when told one first had asthma for those who have been told and current age otherwise (censored asthma age). Format the event indicator variable and then fill in current age for those without asthma.
NOTE: This step is needed for the Kaplan-Meier estimator, log-rank test, and Cox regression.
# Format the event indicator
levels(nhanes.mod$MCQ010)
## [1] "Yes" "No"
$asthma <- as.numeric(nhanes.mod$MCQ010 == "Yes")
nhanes.modtable(nhanes.mod$MCQ010, nhanes.mod$asthma, exclude = NULL)
##
## 0 1 <NA>
## Yes 0 1325 0
## No 7563 0 0
## <NA> 0 0 366
# Create a copy of the age variable
$asthma_age <- nhanes.mod$MCQ025
nhanes.mod
# For those with asthma = 0, set asthma age to current age
<- !is.na(nhanes.mod$asthma) & nhanes.mod$asthma == 0
SUB $asthma_age[SUB] <- nhanes.mod$RIDAGEYR[SUB]
nhanes.mod
# Check
# For !SUB, the difference between these should be 0 or NA
# (asthma_age is MCQ025)
summary(nhanes.mod$MCQ025[ !SUB] -
$asthma_age[!SUB]) nhanes.mod
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 0 0 0 0 0 387
# For SUB, this should be all missing
summary(nhanes.mod$MCQ025[SUB])
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## NA NA NA NaN NA NA 7563
# For !SUB, the difference between these should be 0
# (asthma_age is RIDAGEYR)
summary(nhanes.mod$RIDAGEYR[ SUB] -
$asthma_age[SUB]) nhanes.mod
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 0 0 0
Next, re-create the design object so these new variables are available to the design. The asthma variables were part of the NHANES interview, and no other variables are included in this analysis, so use the interview design weights.
library(survey)
options(survey.lonely.psu = "adjust")
options(survey.adjust.domain.lonely = TRUE)
<- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTINT2YR,
design.INT nest=TRUE, data=nhanes.mod)
Use svykm()
and plot()
to estimate and plot the survival function (Figure 8.6).
.4 <- svykm(Surv(asthma_age, asthma) ~ 1,
km.ex8design = design.INT)
.4 km.ex8
## Weighted survival curve: svykm(formula = Surv(asthma_age, asthma) ~ 1, design = design.INT)
## Q1 = Inf median = Inf Q3 = Inf
plot(km.ex8.4, ylim = c(0.65, 1),
xlab = "Age (years)",
ylab = "Proportion without asthma")
8.7.2 Weighted log-rank test for comparing groups
To compare groups, use svykm
with a factor variable on the right-hand side of the formula, and then svylogrank
to carry out the log-rank test.
NOTE: If you have not already done so, format the event indicator and time variable (see Section 8.7.1).
Example 8.4 (continued): Using the NHANES 2017-2018 data (nhanes1718_rmph.Rdata
), use a weighted log-rank test to compare time to first diagnosis of asthma between race/ethnicity groups. The race/ethnicity variable was also in the interview, so we can continue using the design object that includes the interview weights.
.4b <- svykm(Surv(asthma_age, asthma) ~ race_eth,
km.ex8design = design.INT)
.4b km.ex8
## Weighted survival curves:
## svykm(formula = Surv(asthma_age, asthma) ~ race_eth, design = design.INT)
## Hispanic : Q1 = 80 median = Inf Q3 = Inf
## Non-Hispanic White : Q1 = Inf median = Inf Q3 = Inf
## Non-Hispanic Black : Q1 = 62 median = Inf Q3 = Inf
## Non-Hispanic Other : Q1 = 80 median = Inf Q3 = Inf
Figure 8.7 illustrates the comparison of weighted survival functions between race/ethnicity groups.
# When plotting svykm objects, the lty
# option must be in a list. When including
# lty, ylim must also be in a list (without
# lty, ylim works without being in a list)
plot(km.ex8.4b, pars=list(lty=1:4),
xlab = "Age (years)",
ylab = "Proportion without asthma",
ylim = list(c(0.65, 1)))
legend(45, 1, levels(nhanes.mod$race_eth)[c(2,4,1,3)],
lty = (1:4)[c(2,4,1,3)])
# WARNING: svylogrank() will fail if there are missing values.
# Create a domain with only complete cases
<- nhanes.mod %>%
nhanes.mod mutate(domain_asthma = !is.na(asthma_age) & !is.na(asthma) & !is.na(race_eth))
# Re-create the design to include the new domain variable
<- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTINT2YR,
design.INT nest=TRUE, data=nhanes.mod)
# Subset the design to do a the domain analysis
<- subset(design.INT, domain_asthma)
design_asthma
# Log-rank test
svylogrank(Surv(asthma_age, asthma) ~ race_eth, design = design_asthma)[[2]]
## Chisq p
## 29.330856009 0.000001908
8.7.3 Weighted Cox regression
8.7.3.1 Fitting the model
Use svycoxph()
to fit a weighted Cox proportional hazards regression. The current functions do not have an option to change the DF from the default. An exception is the function svycontrast_design_df()
(in Functions_rmph.R
) which uses the design DF by default, with an option to use the residual DF.
NOTE: If you have not already done so, format the event indicator and time variable (see Section 8.7.1).
Example 8.5: Using the NHANES 2017-2018 data (nhanes1718_rmph.Rdata
), compare time to first diagnosis of asthma between race/ethnicity groups, adjusted for gender (RIAGENDR
), using a Cox proportional hazards regression.
The results are shown in Table 8.7.
.5 <- svycoxph(Surv(asthma_age, asthma) ~ race_eth + RIAGENDR,
cox.ex8design = design.INT)
::Anova(cox.ex8.5, type = 3)
carsummary(cox.ex8.5)
cbind("AHR" = exp(summary(cox.ex8.5)$coef[, "coef"]),
exp(confint(cox.ex8.5)),
"p" = summary(cox.ex8.5)$coef[, "Pr(>|z|)"])
# (results not shown)
library(gtsummary)
# Table of regression coefficients
<- cox.ex8.5 %>%
t1 tbl_regression(estimate_fun = function(x) style_number(x, digits = 3),
label = list(race_eth ~ "Race/Ethnicity",
~ "Gender")) %>%
RIAGENDR modify_column_hide(p.value) %>%
modify_caption("Weighted Cox regression results for time to asthma
vs. race/ethnicity")
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
## nest = TRUE, data = nhanes.mod)
# Table of hazard ratios
<- cox.ex8.5 %>%
t2 tbl_regression(exponentiate = T, # HR = exp(B)
estimate_fun = function(x) style_number(x, digits = 3),
pvalue_fun = function(x) style_pvalue(x, digits = 3),
label = list(race_eth ~ "Race/Ethnicity",
~ "Gender")) %>%
RIAGENDR add_global_p(keep = T)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR,
## nest = TRUE, data = nhanes.mod)
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Regression Coefficients**", "**Adjusted Hazard Ratio**")
)
Characteristic | Regression Coefficients | Adjusted Hazard Ratio | |||
---|---|---|---|---|---|
log(HR)^{1} | 95% CI^{1} | HR^{1} | 95% CI^{1} | p-value | |
Race/Ethnicity | <0.001 | ||||
Hispanic | — | — | — | — | |
Non-Hispanic White | -0.083 | -0.316, 0.151 | 0.921 | 0.729, 1.162 | 0.487 |
Non-Hispanic Black | 0.314 | 0.071, 0.557 | 1.369 | 1.074, 1.745 | 0.011 |
Non-Hispanic Other | 0.028 | -0.391, 0.446 | 1.028 | 0.677, 1.562 | 0.897 |
Gender | 0.117 | ||||
Male | — | — | — | — | |
Female | 0.138 | -0.035, 0.310 | 1.148 | 0.966, 1.364 | 0.117 |
^{1} HR = Hazard Ratio, CI = Confidence Interval |
8.7.3.2 Exclude cases with zero weights
Unlike the other functions we have used in this chapter, svycoxph()
returns an error if there are any cases with 0 weight. To avoid this, use a domain analysis to exclude these cases. If you were already using a domain analysis, modify that domain to also exclude these cases. For example, in the NHANES dataset, there are some cases with exam weights of 0 (those not in the examination subsample).
<- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTMEC2YR,
design.MEC nest=TRUE, data=nhanes.mod)
<- subset(design.MEC, WTMEC2YR > 0) design.MEC.pos.weights
8.7.3.3 Interaction
Including an interaction in the model and estimating effects for one variable at levels of another proceeds as described for weighted binary logistic regression above.
Example 8.5 (continued): Does the effect of body mass index (BMXBMI
) on time to asthma differ between race/ethnicity groups, adjusted for gender? Estimate the BMI effect at each level of race/ethnicity.
Since BMI was only collected in the examination subsample, we must use the examination weights when specifying the sample design (and use a domain analysis that excludes those with zero weights - see Section 8.7.3.2).
5.int <- svycoxph(Surv(asthma_age, asthma) ~ race_eth + RIAGENDR + BMXBMI +
cox.ex8.:BMXBMI,
race_ethdesign = design.MEC.pos.weights)
::Anova(cox.ex8.5.int, type = 3) car
## Analysis of Deviance Table (Type III tests)
##
## Response: Surv(asthma_age, asthma)
## Df Chisq Pr(>Chisq)
## race_eth 3 6.73 0.081 .
## RIAGENDR 1 3.55 0.060 .
## BMXBMI 1 1.53 0.216
## race_eth:BMXBMI 3 5.09 0.165
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(cox.ex8.5.int)
# (results not shown)
rbind(
"BMI @ Hispanic" = svycontrast_design_df(cox.ex8.5.int,
c("BMXBMI" = 1),
EXP=T),
"BMI @ NH White" = svycontrast_design_df(cox.ex8.5.int,
c("BMXBMI" = 1,
"race_ethNon-Hispanic White:BMXBMI" = 1),
EXP=T),
"BMI @ NH Black" = svycontrast_design_df(cox.ex8.5.int,
c("BMXBMI" = 1,
"race_ethNon-Hispanic Black:BMXBMI" = 1),
EXP=T),
"BMI @ NH Other" = svycontrast_design_df(cox.ex8.5.int,
c("BMXBMI" = 1,
"race_ethNon-Hispanic Other:BMXBMI" = 1),
EXP=T)
)
## est lower upper
## BMI @ Hispanic 0.9823 0.9525 1.013
## BMI @ NH White 1.0047 0.9808 1.029
## BMI @ NH Black 0.9945 0.9785 1.011
## BMI @ NH Other 1.0211 0.9962 1.047