By computing predicted survival probabilities over a range of times, we can plot the survival function estimated by Cox regression for any set of predictor values. This is especially useful if we want to visually compare the estimated survival functions between groups.
NOTE: The estimated curves plotted here serve the specific purpose of illustrating relative differences between groups under the assumptions of the model and the form of the predictors included. The estimated absolute survival probabilities depend on the specific predictor values chosen and only represent individuals with those values rather than the average survival of a cohort of individuals with different predictor values. See Therneau T., Crowson C.S., and Atkinson E.J. (2015) for a discussion of options for estimating the adjusted survival function for a reference cohort, rather than for one specific set of predictor values as is done here, and, optionally, with less restrictive model assumptions. Some of these options are implemented in the
ggadjustedcurves() function in the
survminer package (Kassambara, Kosinski, and Biecek 2021).
Example 7.6 (continued): Plot the estimated survival functions for those with and without a previous preterm birth. Set the other predictors at their mean or reference level.
First, create a
data.frame for each level of the variable we are interested in (
RF_PRETERM) containing specific values for the other predictors and a range of times (so we can estimate a curve rather than survival at a single time).
# DF for RF_PPTERM = "Yes" <- data.frame(preterm01 = 1, PPTERM_Yes gestage37 = 17:36, DMAR = "Married", MRACEHISP = "NH White", MAGER = mean(natality.complete$MAGER), RF_PPTERM = "Yes") # For the other level(s) of the variable of interest, copy and change the value <- PPTERM_Yes PPTERM_No $RF_PPTERM <- "No"PPTERM_No
Next, compute the predictions at these values.
<- predict(cox.ex7.6, PPTERM_Yes, type = "survival") PRED_PPTERM_Yes <- predict(cox.ex7.6, PPTERM_No, type = "survival")PRED_PPTERM_No
Finally, plot the predictions vs. time (Figure 7.13).
# Create a blank plot with the correct axis labels and limits plot(0, 0, col = "white", xlab = "Gestational Age (weeks)", ylab = "P(Not yet preterm)", xlim = c(17, 36), ylim = c(0.70, 1), # ylim = c(0, 1) if you need a larger range font.axis = 2, font.lab = 2, cex.axis = 1.25, cex.lab = 1.25) # Add a line for each group lines(PPTERM_No$gestage37, PRED_PPTERM_No, lwd = 2, lty = 1) lines(PPTERM_Yes$gestage37, PRED_PPTERM_Yes, lwd = 2, lty = 2) legend(17, 0.85, c("No", "Yes"), title = "Previous preterm birth", cex=1.2, lty=c(1,2), lwd=c(2,2))
If you plot \(1 - S(t)\) instead, the vertical axis will display the probability of preterm birth (Figure 7.14).
plot(0, 0, col = "white", xlab = "Gestational Age (weeks)", ylab = "P(Preterm by this time)", xlim = c(17, 36), ylim = c(0, 0.3), # ylim = c(0, 1) if you need a larger range font.axis = 2, font.lab = 2, cex.axis = 1.25, cex.lab = 1.25) # Since this is 1 - S(t), reverse the order of the lines lines(PPTERM_Yes$gestage37, 1 - PRED_PPTERM_Yes, lwd = 2, lty = 2) lines(PPTERM_No$gestage37, 1 - PRED_PPTERM_No, lwd = 2, lty = 1) legend(17, 0.2, c("Yes", "No"), title = "Previous preterm birth", cex=1.2, lty=c(2,1), lwd=c(2,2))