8.7 Weighted survival analysis

To incorporate a complex survey design into a survival analysis, use

  • svykm() to compute a weighted Kaplan-Meier estimate of the survival function,
  • svylogrank() to carry out a weighted log-rank test to compare survival curves between groups, and
  • svycoxph() to carry out weighted Cox regression.

8.7.1 Weighted Kaplan-Meier estimate of the survival function

Example 8.4: Using the NHANES 2017-2018 data (nhanes1718_rmph.Rdata), estimate the weighted Kaplan-Meier estimate of the survival function for time to first diagnosis of asthma. The time variable is age when first had asthma (MCQ025) and the event indicator is MCQ010 – “Ever been told you have asthma”).

We need an event indicator that is 1 for those who have been told they have asthma and 0 otherwise, and a time variable that is the age when told one first had asthma for those who have been told and current age otherwise (censored asthma age). Format the event indicator variable and then fill in current age for those without asthma.

NOTE: This step is needed for the Kaplan-Meier estimator, log-rank test, and Cox regression.

# Format the event indicator
levels(nhanes.mod$MCQ010)
## [1] "Yes" "No"
nhanes.mod$asthma <- as.numeric(nhanes.mod$MCQ010 == "Yes")
table(nhanes.mod$MCQ010, nhanes.mod$asthma, exclude = NULL)
##       
##           0    1 <NA>
##   Yes     0 1325    0
##   No   7563    0    0
##   <NA>    0    0  366
# Create a copy of the age variable
nhanes.mod$asthma_age <- nhanes.mod$MCQ025

# For those with asthma = 0, set asthma age to current age
SUB <- !is.na(nhanes.mod$asthma) & nhanes.mod$asthma == 0
nhanes.mod$asthma_age[SUB] <- nhanes.mod$RIDAGEYR[SUB]

# Check
# For !SUB, the difference between these should be 0 or NA
# (asthma_age is MCQ025)
summary(nhanes.mod$MCQ025[    !SUB] -
        nhanes.mod$asthma_age[!SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0       0       0       0       0       0     387
# For SUB, this should be all missing
summary(nhanes.mod$MCQ025[SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##      NA      NA      NA     NaN      NA      NA    7563
# For !SUB, the difference between these should be 0
# (asthma_age is RIDAGEYR)
summary(nhanes.mod$RIDAGEYR[  SUB] -
        nhanes.mod$asthma_age[SUB])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0

Next, re-create the design object so these new variables are available to the design. The asthma variables were part of the NHANES interview, and no other variables are included in this analysis, so use the interview design weights.

library(survey)
options(survey.lonely.psu = "adjust")
options(survey.adjust.domain.lonely = TRUE)

design.INT <- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTINT2YR,
                        nest=TRUE, data=nhanes.mod)

Use svykm() and plot() to estimate and plot the survival function (Figure 8.6).

km.ex8.4 <- svykm(Surv(asthma_age, asthma) ~ 1,
                  design = design.INT)
km.ex8.4
## Weighted survival curve: svykm(formula = Surv(asthma_age, asthma) ~ 1, design = design.INT)
## Q1 = Inf  median = Inf  Q3 = Inf
plot(km.ex8.4, ylim = c(0.65, 1),
     xlab = "Age (years)",
     ylab = "Proportion without asthma")
Weighted Kaplan-Meier survival function

Figure 8.6: Weighted Kaplan-Meier survival function

Conclusion: We see that the proportion without asthma drops rapidly before age 20 years, indicating the greatest risk of an asthma diagnosis is in childhood. However, as the survival curve continues to decrease over the lifespan, adults who have not yet been diagnosed with asthma are still at risk.

8.7.2 Weighted log-rank test for comparing groups

To compare groups, use svykm with a factor variable on the right-hand side of the formula, and then svylogrank to carry out the log-rank test.

NOTE: If you have not already done so, format the event indicator and time variable (see Section 8.7.1).

Example 8.4 (continued): Using the NHANES 2017-2018 data (nhanes1718_rmph.Rdata), use a weighted log-rank test to compare time to first diagnosis of asthma between race/ethnicity groups. The race/ethnicity variable was also in the interview, so we can continue using the design object that includes the interview weights.

km.ex8.4b <- svykm(Surv(asthma_age, asthma) ~ race_eth,
                  design = design.INT)
km.ex8.4b
## Weighted survival curves:
## svykm(formula = Surv(asthma_age, asthma) ~ race_eth, design = design.INT)
## Hispanic : Q1 = 80  median = Inf  Q3 = Inf 
## Non-Hispanic White : Q1 = Inf  median = Inf  Q3 = Inf 
## Non-Hispanic Black : Q1 = 62  median = Inf  Q3 = Inf 
## Non-Hispanic Other : Q1 = 80  median = Inf  Q3 = Inf

Figure 8.7 illustrates the comparison of weighted survival functions between race/ethnicity groups.

# When plotting svykm objects, the lty
# option must be in a list. When including
# lty, ylim must also be in a list (without
# lty, ylim works without being in a list)
plot(km.ex8.4b, pars=list(lty=1:4),
     xlab = "Age (years)",
     ylab = "Proportion without asthma",
     ylim = list(c(0.65, 1)))
legend(45, 1, levels(nhanes.mod$race_eth)[c(2,4,1,3)],
       lty = (1:4)[c(2,4,1,3)])
Weighted Kaplan-Meier survival function by race/ethnicity

Figure 8.7: Weighted Kaplan-Meier survival function by race/ethnicity

# WARNING: svylogrank() will fail if there are missing values.
# Create a domain with only complete cases
nhanes.mod <- nhanes.mod %>% 
  mutate(domain_asthma = !is.na(asthma_age) & !is.na(asthma) & !is.na(race_eth))

# Re-create the design to include the new domain variable
design.INT <- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTINT2YR,
                        nest=TRUE, data=nhanes.mod)

# Subset the design to do a  the domain analysis
design_asthma <- subset(design.INT, domain_asthma)

# Log-rank test
svylogrank(Surv(asthma_age, asthma) ~ race_eth, design = design_asthma)[[2]]
##        Chisq            p 
## 29.330856009  0.000001908

Conclusion: Time to first diagnosis of asthma differs significantly between race/ethnicity groups (p < .001). As shown in Figure 8.7, non-Hispanic Black individuals have the greatest proportion of individuals with asthma at any given age, starting from childhood and persisting across the lifespan.

8.7.3 Weighted Cox regression

8.7.3.1 Fitting the model

Use svycoxph() to fit a weighted Cox proportional hazards regression. The current functions do not have an option to change the DF from the default. An exception is the function svycontrast_design_df() (in Functions_rmph.R) which uses the design DF by default, with an option to use the residual DF.

NOTE: If you have not already done so, format the event indicator and time variable (see Section 8.7.1).

Example 8.5: Using the NHANES 2017-2018 data (nhanes1718_rmph.Rdata), compare time to first diagnosis of asthma between race/ethnicity groups, adjusted for gender (RIAGENDR), using a Cox proportional hazards regression.

The results are shown in Table 8.7.

cox.ex8.5 <- svycoxph(Surv(asthma_age, asthma) ~ race_eth + RIAGENDR, 
                  design = design.INT)
car::Anova(cox.ex8.5, type = 3)
summary(cox.ex8.5)
cbind("AHR" = exp(summary(cox.ex8.5)$coef[, "coef"]),
              exp(confint(cox.ex8.5)),
      "p"   = summary(cox.ex8.5)$coef[, "Pr(>|z|)"])
# (results not shown)
library(gtsummary)
# Table of regression coefficients
t1 <- cox.ex8.5 %>%
  tbl_regression(estimate_fun = function(x) style_number(x, digits = 3),
                 label  = list(race_eth ~ "Race/Ethnicity",
                               RIAGENDR ~ "Gender")) %>%
  modify_column_hide(p.value) %>%
  modify_caption("Weighted Cox regression results for time to asthma 
                 vs. race/ethnicity, adjusted for gender")
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR, 
##     nest = TRUE, data = nhanes.mod)
# Table of hazard ratios
t2 <- cox.ex8.5 %>%
  tbl_regression(exponentiate = T,  # HR = exp(B)
                 estimate_fun = function(x) style_number(x, digits = 3),
                 pvalue_fun   = function(x) style_pvalue(x, digits = 3),
                 label  = list(race_eth ~ "Race/Ethnicity",
                               RIAGENDR ~ "Gender")) %>%
  add_global_p(keep = T)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (30) clusters.
## svydesign(strata = ~SDMVSTRA, id = ~SDMVPSU, weights = ~WTINT2YR, 
##     nest = TRUE, data = nhanes.mod)
tbl_merge(
  tbls = list(t1, t2),
  tab_spanner = c("**Regression Coefficients**", "**Adjusted Hazard Ratio**")
)
Table 8.7: Weighted Cox regression results for time to asthma vs. race/ethnicity, adjusted for gender
Characteristic Regression Coefficients Adjusted Hazard Ratio
log(HR)1 95% CI1 HR1 95% CI1 p-value
Race/Ethnicity



<0.001
    Hispanic
    Non-Hispanic White -0.083 -0.316, 0.151 0.921 0.729, 1.162 0.487
    Non-Hispanic Black 0.314 0.071, 0.557 1.369 1.074, 1.745 0.011
    Non-Hispanic Other 0.028 -0.391, 0.446 1.028 0.677, 1.562 0.897
Gender



0.117
    Male
    Female 0.138 -0.035, 0.310 1.148 0.966, 1.364 0.117
1 HR = Hazard Ratio, CI = Confidence Interval

Conclusion: After adjusting for gender, time to first diagnosis of asthma is significantly different between race/ethnicity groups (p < .001). Non-Hispanic Black participants have 36.9% greater hazard of asthma diagnosis than Hispanic participants. Note that the AOR for non-Hispanic White participants is < 1, so the disparity between non-Hispanic Black and non-Hispanic White is even greater than 36.9%. To estimate it, divide the two AORs, 1.369/0.921 = 1.4864 to see that non-Hispanic Black participants have 48.6% greater hazard of an asthma diagnosis than non-Hispanic White participants. If you also want a 95% confidence interval and a p-value to go along with this estimate, then re-level the race/ethnicity variable to make non-Hispanic White the reference level, refit the model, and then examine the effect for non-Hispanic Black, which would then be a comparison with non-Hispanic White.

8.7.3.2 Exclude cases with zero weights

Unlike the other functions we have used in this chapter, svycoxph() returns an error if there are any cases with 0 weight. To avoid this, use a domain analysis to exclude these cases. If you were already using a domain analysis, modify that domain to also exclude these cases. For example, in the NHANES dataset, there are some cases with exam weights of 0 (those not in the examination subsample).

design.MEC <- svydesign(strata=~SDMVSTRA, id=~SDMVPSU, weights=~WTMEC2YR,
                        nest=TRUE, data=nhanes.mod)

design.MEC.pos.weights <- subset(design.MEC, WTMEC2YR > 0)

8.7.3.3 Interaction

Including an interaction in the model and estimating effects for one variable at levels of another proceeds as described for weighted binary logistic regression above.

Example 8.5 (continued): Does the effect of body mass index (BMXBMI) on time to asthma differ between race/ethnicity groups, adjusted for gender? Estimate the BMI effect at each level of race/ethnicity.

Since BMI was only collected in the examination subsample, we must use the examination weights when specifying the sample design (and use a domain analysis that excludes those with zero weights - see Section 8.7.3.2).

cox.ex8.5.int <- svycoxph(Surv(asthma_age, asthma) ~ race_eth + RIAGENDR + BMXBMI +
                         race_eth:BMXBMI,
                  design = design.MEC.pos.weights)
car::Anova(cox.ex8.5.int, type = 3)
## Analysis of Deviance Table (Type III tests)
## 
## Response: Surv(asthma_age, asthma)
##                 Df Chisq Pr(>Chisq)  
## race_eth         3  6.73      0.081 .
## RIAGENDR         1  3.55      0.060 .
## BMXBMI           1  1.53      0.216  
## race_eth:BMXBMI  3  5.09      0.165  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(cox.ex8.5.int)
# (results not shown)
rbind(
  "BMI @ Hispanic" = svycontrast_design_df(cox.ex8.5.int,
    c("BMXBMI" = 1),
    EXP=T),
  "BMI @ NH White" = svycontrast_design_df(cox.ex8.5.int, 
    c("BMXBMI"                            = 1,
      "race_ethNon-Hispanic White:BMXBMI" = 1),
    EXP=T),
  "BMI @ NH Black" = svycontrast_design_df(cox.ex8.5.int, 
    c("BMXBMI"                            = 1,
      "race_ethNon-Hispanic Black:BMXBMI" = 1),
    EXP=T),
  "BMI @ NH Other" = svycontrast_design_df(cox.ex8.5.int, 
    c("BMXBMI"                            = 1,
      "race_ethNon-Hispanic Other:BMXBMI" = 1),
    EXP=T)
)
##                   est  lower upper
## BMI @ Hispanic 0.9823 0.9525 1.013
## BMI @ NH White 1.0047 0.9808 1.029
## BMI @ NH Black 0.9945 0.9785 1.011
## BMI @ NH Other 1.0211 0.9962 1.047

Conclusion: After adjusting for gender, the effect of body mass index on time to asthma does not significantly differ between race/ethnicity groups (p = .165).