Cox regression assumes that each continuous predictor has a linear relationship with the outcome (in this case, the log of the ratio of the hazard to the baseline hazard). To assess this relationship visually, plot what are called martingale residuals against each continuous predictor (Figure 7.20).
# Residuals vs. continuous predictor
X <- natality.complete$MAGER
Y <- resid(cox.ex7.6, type = "martingale")
plot(X, Y, pch = 20, col = "darkgray",
     xlab = "Mother's Age", ylab = "Martingale Residual",
     main = "Residuals vs. Predictor")
abline(h = 0)
# Overlay a smoothing spline to show the trend in the residuals
lines(smooth.spline(X, Y, df = 7), lty = 2, lwd = 2)
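In the survival package, the martingale residual for subject i is the observed event indicator (1 = event, 0 = censored) minus the number of events the fitted model expects over that subject's follow-up. That expected count is the estimated cumulative baseline hazard at the subject's follow-up time multiplied by exp(linear predictor). Martingale residuals are therefore never greater than 1. Values near 1 mark events the model considered unlikely, and large negative values mark subjects who stayed event-free despite a high predicted cumulative hazard. The minimal sketch below checks this identity using coxph(), resid(), and predict() from the survival package. It runs on simulated data in a hypothetical data frame sim, not on the natality data.

```r
library(survival)

set.seed(1)
n <- 200
# Hypothetical simulated data: one continuous predictor with a linear log-hazard
sim <- data.frame(age = runif(n, 15, 45))
event_time  <- rexp(n, rate = 0.05 * exp(0.03 * (sim$age - 30)))
censor_time <- runif(n, 0, 40)
sim$time   <- pmin(event_time, censor_time)
sim$status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ age, data = sim)

# Martingale residual = observed events - expected events
mart     <- resid(fit, type = "martingale")
expected <- predict(fit, type = "expected")  # cumulative hazard at each subject's time
manual   <- sim$status - expected

print(all.equal(unname(mart), unname(manual)))
print(range(mart))  # upper end can never exceed 1
```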
The smoothed curve of residuals vs. mother’s age appears somewhat non-linear. As with other regression models, to relax the linearity assumption, transform the predictor using a polynomial, logarithm, or other function. The code below adds a quadratic term for maternal age and rechecks the linearity assumption (Figure 7.21).
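The same workflow can be illustrated with a minimal sketch on simulated data. The true log-hazard here is deliberately quadratic in age. All names (sim, fit_lin, fit_quad) are hypothetical; with the natality data, the analogous step would presumably be adding I(MAGER^2) to the formula of cox.ex7.6. The sketch compares a linear-only fit with a linear-plus-quadratic fit using a likelihood ratio test built from each model's final log-likelihood. It then plots the martingale residuals from each model against age.

```r
library(survival)

set.seed(2)
n <- 500
# Hypothetical simulated data: the true log-hazard is quadratic in age
sim <- data.frame(age = runif(n, 15, 45))
event_time  <- rexp(n, rate = 0.05 * exp(0.01 * (sim$age - 30)^2))
censor_time <- runif(n, 0, 40)
sim$time   <- pmin(event_time, censor_time)
sim$status <- as.numeric(event_time <= censor_time)

# Model assuming linearity vs. model with an added quadratic term
fit_lin  <- coxph(Surv(time, status) ~ age, data = sim)
fit_quad <- coxph(Surv(time, status) ~ age + I(age^2), data = sim)

# Likelihood ratio test for the quadratic term (loglik[2] = final log-likelihood)
lr_stat <- 2 * (fit_quad$loglik[2] - fit_lin$loglik[2])
p_quad  <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
cat("LR statistic:", round(lr_stat, 2), " p-value:", signif(p_quad, 3), "\n")

# Recheck linearity: martingale residuals vs. age for each model
par(mfrow = c(1, 2))
for (m in list(list(fit = fit_lin,  title = "Linear term only"),
               list(fit = fit_quad, title = "Linear + quadratic"))) {
  Y <- resid(m$fit, type = "martingale")
  plot(sim$age, Y, pch = 20, col = "darkgray",
       xlab = "Age", ylab = "Martingale Residual", main = m$title)
  abline(h = 0)
  lines(smooth.spline(sim$age, Y, df = 7), lty = 2, lwd = 2)
}
```

I(age^2) protects the squaring inside the formula. poly(age, 2) is an alternative that uses orthogonal polynomials and gives the same fitted model with differently scaled coefficients.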
As shown in Figure 7.21, adding a quadratic term reduces the non-linearity (the uptick in the curves at older maternal ages is due to a single observation with an extreme age value). Within smooth.spline(), set df to a smaller (larger) value to get more (less) smoothing. In Figure 7.22, the plots with df = 3 appear linear at all ages.
In general, be careful when using a small df value, because heavy smoothing can flatten genuine curvature into an apparently straight line. Also look at the plot with a larger df value, as we did here, to confirm that you are not over-smoothing.
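A comparison like the one in Figure 7.22 can be sketched on simulated data, where sim, fit, spl3, and spl7 are hypothetical names. The sketch fits both smooths to the same martingale residuals and plots them side by side. The df argument of smooth.spline() is the equivalent degrees of freedom (the trace of the smoother matrix), so each fitted object's df component should be close to the requested value. The df = 7 curve is less penalized, so its residual sum of squares can only be equal to or smaller than that of the df = 3 curve.

```r
library(survival)

set.seed(3)
n <- 300
# Hypothetical simulated data: integer ages (so X has ties), linear log-hazard
sim <- data.frame(age = round(runif(n, 15, 45)))
event_time  <- rexp(n, rate = 0.05 * exp(0.03 * (sim$age - 30)))
censor_time <- runif(n, 0, 40)
sim$time   <- pmin(event_time, censor_time)
sim$status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ age, data = sim)
X <- sim$age
Y <- resid(fit, type = "martingale")

# Smaller df = more smoothing (stiffer curve); larger df = less smoothing
spl3 <- smooth.spline(X, Y, df = 3)
spl7 <- smooth.spline(X, Y, df = 7)

# Residual sum of squares of each smooth, evaluated at the observed ages
rss <- function(spl) sum((Y - predict(spl, X)$y)^2)
cat("df = 3: RSS =", round(rss(spl3), 2), "\n")
cat("df = 7: RSS =", round(rss(spl7), 2), "\n")

# Side-by-side plots: judge linearity with both a small and a larger df
par(mfrow = c(1, 2))
for (spl in list(spl3, spl7)) {
  plot(X, Y, pch = 20, col = "darkgray", xlab = "Age",
       ylab = "Martingale Residual", main = paste("df =", round(spl$df)))
  abline(h = 0)
  lines(spl, lty = 2, lwd = 2)
}
```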