## 7.16 Proportional hazards assumption

The reason Cox regression is called Cox “proportional hazards” (PH) regression is that the standard form of the model assumes the hazards for any two individuals have the same proportion at all times. Thus, the PH assumption implies the HR measuring the effect of any predictor is constant over time.

Consider two individuals who differ by one unit in $$X_K$$ and do not differ in any other predictor in the model. Their Cox model hazard functions are the following.

$\begin{array}{lcl} h(t \vert X_1 = x_1, ..., X_K = x_K+1) & = & h_0(t) e^{\beta_1 x_1 + \ldots + \beta_K(x_K+1)} \\ h(t \vert X_1 = x_1, ..., X_K = x_K) & = & h_0(t) e^{\beta_1 x_1 + \ldots + \beta_K x_K} \end{array}$

Taking the ratio of these, everything cancels out except $$e^{\beta_K}$$, the AHR for $$X_K$$, a quantity which does not depend on time. In fact, you could plug in any two sets of predictor values, differing on any or all of the predictors, and the ratio of the hazards would not depend on time. The hazard $$h(t)$$ itself can increase or decrease over time because the baseline hazard $$h_0(t)$$ varies with time. However, the ratio of hazards comparing any two individuals is constant because $$h_0(t)$$ does not vary between individuals and so cancels out in the ratio.

A violation of the PH assumption does not mean we cannot use Cox regression, just that we need to modify it slightly to account for non-proportional hazards, typically by including an interaction between a predictor and time or by stratifying by a predictor.

### 7.16.1 Checking the proportional hazards assumption

Suppose we think the HR for $$X_k$$ varies linearly with time. This means that instead of having $$\beta_k X_k$$ in the model equation, we would have $$\beta_k(t) X_k$$ where $$\beta_k(t) = \beta_{k_1} + \beta_{k_2} t$$. More generally, $$\beta(t)$$ could take on any form. The cox.zph() function checks the PH assumption by comparing the PH model for each predictor to a model that allows that predictor’s regression coefficient to vary smoothly with time, where “smoothly” means the alternative model does not assume a specific form for the time trajectory.

Example 7.6 (continued): Check each predictor for non-proportional hazards in the model we fit for Example 7.6.

To be able to screen each component of a categorical predictor that has $$L > 2$$ levels, we must replace the factor version of the variable with the corresponding $$L-1$$ 0/1 dummy variables. Although not necessary at this step, we will also replace each binary categorical variable with a dummy variable as we may need the dummy variable version when we relax the PH assumption in Section 7.16.3.

When you enter a factor variable into a regression model, R automatically creates dummy variables and includes them in the model, leaving out the one for the reference level. The resulting dummy variables can be extracted from a model using the model.matrix() function. Applying this function to our Cox model fit results in a matrix. The following shows the first few rows.

head(
model.matrix(cox.ex7.6)
)
##   RF_PPTERMYes MAGER MRACEHISPNH Black MRACEHISPNH Other MRACEHISPHispanic DMARUnmarried
## 1            0    35                 0                 0                 1             0
## 2            1    28                 0                 0                 0             1
## 3            0    22                 1                 0                 0             1
## 4            0    19                 0                 0                 1             1
## 5            0    30                 0                 0                 0             0
## 6            0    20                 0                 1                 0             1

Extract the dummy variables and assign them to variables inside the dataset and refit the model.

NOTES:

• Pay careful attention to how the columns of the model matrix are spelled. In this example, some have spaces (e.g., “MRACEHISPNH Black”). When you create variables in the code below, the new variable names on the left cannot have spaces, but the strings inside the [,]s on the right must be spelled exactly as they appear in the model matrix.
• This method of extracting dummy variables and adding them to a dataset only works correctly with a complete case dataset. If the dataset used to fit the model was not complete, create a complete-case dataset and re-fit the model before proceeding.
# Extract dummy variables and add to dataset
natality.complete <- natality.complete %>%
mutate(Previous_Preterm  = model.matrix(cox.ex7.6)[, "RF_PPTERMYes"],
race_eth_NHBlack  = model.matrix(cox.ex7.6)[, "MRACEHISPNH Black"],
race_eth_NHOther  = model.matrix(cox.ex7.6)[, "MRACEHISPNH Other"],
race_eth_Hispanic = model.matrix(cox.ex7.6)[, "MRACEHISPHispanic"],
Unmarried         = model.matrix(cox.ex7.6)[, "DMARUnmarried"])

# Refit the model with dummy variables instead of factors
cox.ex7.6.dummy <- coxph(Surv(gestage37, preterm01) ~
Previous_Preterm + MAGER +
race_eth_NHBlack + race_eth_NHOther + race_eth_Hispanic +
Unmarried,
data = natality.complete)

Next, verify that this new fit is equivalent to the old fit. If it is not, then check the code you used to create the dummy variables and refit the model.

summary(cox.ex7.6)$coef ## coef exp(coef) se(coef) z Pr(>|z|) ## RF_PPTERMYes 1.06193 2.8919 0.23662 4.4879 0.000007191 ## MAGER 0.03132 1.0318 0.01204 2.6002 0.009317466 ## MRACEHISPNH Black 0.55874 1.7485 0.17754 3.1470 0.001649379 ## MRACEHISPNH Other -0.07341 0.9292 0.29555 -0.2484 0.803844986 ## MRACEHISPHispanic 0.26023 1.2972 0.17910 1.4530 0.146213569 ## DMARUnmarried 0.58499 1.7950 0.15332 3.8155 0.000135888 summary(cox.ex7.6.dummy)$coef
##                       coef exp(coef) se(coef)       z    Pr(>|z|)
## Previous_Preterm   1.06193    2.8919  0.23662  4.4879 0.000007191
## MAGER              0.03132    1.0318  0.01204  2.6002 0.009317466
## race_eth_NHBlack   0.55874    1.7485  0.17754  3.1470 0.001649379
## race_eth_NHOther  -0.07341    0.9292  0.29555 -0.2484 0.803844986
## race_eth_Hispanic  0.26023    1.2972  0.17910  1.4530 0.146213569
## Unmarried          0.58499    1.7950  0.15332  3.8155 0.000135888

Finally, use cox.zph() to check for non-proportional hazards using the model fit with dummy variables.

NPH_CHECK <- cox.zph(cox.ex7.6.dummy)
NPH_CHECK
##                    chisq df      p
## Previous_Preterm   1.958  1 0.1618
## MAGER              0.806  1 0.3693
## race_eth_NHBlack   4.067  1 0.0437
## race_eth_NHOther   0.608  1 0.4355
## race_eth_Hispanic  0.186  1 0.6661
## Unmarried          8.769  1 0.0031
## GLOBAL            12.257  6 0.0565

The row for each predictor tests the null hypothesis that the predictor’s coefficient does not vary with time (equivalently, that the predictor’s AHR does not vary with time). Based on statistical significance of individual terms, we would conclude that the coefficients for race_eth_NHBlack and Unmarried have significantly non-proportional hazards. However, we must also take into account the fact that we are carrying out multiple tests. The GLOBAL row tests the null hypothesis that all the predictors meet the PH assumption. Based on the global test, we would conclude that the PH assumption is sufficiently met for all the variables. Yet we must also take into account that, as with all statistical hypothesis tests, the .05 cutoff for significance is arbitrary and conclusions are very dependent on the sample size. Thus, in addition to the statistical test, it is wise to look at a visualization to assess the magnitude of any violation of the PH assumption.

Again, each row in the cox.zph() output is testing the null hypothesis that the $$\beta$$ coefficient for that predictor does not vary with time. A plot() of the output of cox.zph() illustrates how the $$\beta$$s vary with time (Figure 7.16).

par(mfrow=c(2,3))
plot(NPH_CHECK)

If the null hypothesis is correct (proportional hazards) then the $$\beta$$ trajectories will be close to horizontal. This provides a nice visualization for each predictor of both the magnitude of deviation from the PH assumption and the shape of the AHR trajectory. In this example, the AHR trajectories for the first five terms are relatively flat, but the trajectory for Unmarried decreases with time.

The following subsections discuss how to relax the PH assumption for a continuous predictor (using MAGER, mother’s age, for illustration even though it does not significantly violate the PH assumption in this example) and a categorical predictor (Unmarried). Two methods are covered: including an interaction between a predictor and time and stratifying by a categorical predictor.

### 7.16.2 Adding a time interaction for a continuous predictor

“Interacting a predictor with time” means including that predictor’s main effect plus the product of the predictor and some function of time. Which function to use can be based on the shape of the HR trajectory shown when plotting the output of cox.zph(). As mentioned previously, modeling the HR for $$X_k$$ as varying linearly with time is accomplished by replacing $$\beta_k X_k$$ with $$\beta_k(t) X_k$$ where $$\beta_k(t) = \beta_{k_1} + \beta_{k_2} t$$. Equivalently, replace $$\beta_k X_k$$ with $$\beta_{k1}X_k + \beta_{k2} t X_k$$. In other words, to model a linearly varying HR, add to the model an interaction between the predictor and $$t$$.

We could, instead, assume a non-linear function of time. For example, modeling the HR as varying logarithmically with time corresponds to replacing $$\beta_k X_k$$ with $$\beta_{k1}X_k + \beta_{k2} \ln(t) X_k$$, or adding an interaction between the predictor and $$\ln(t)$$. Either of these approaches, linear or non-linear, relaxes the PH assumption by replacing $$X$$’s time-invariant regression coefficient with a function of time.

Example 7.6 (continued): Relax the PH assumption for age by assuming the age effect varies linearly with time.

As mentioned above, accomplishing this requires adding, in addition to the age main effect, an interaction between age and linear time. However, we cannot add t:X to the model since “time” is the outcome – you cannot include the outcome as part of a predictor. Adding a time interaction in coxph() requires adding a special tt() function term, as well as a programming statement defining the function. To add a linear interaction with time, use the function $$tt(x,t) = x*t$$.

# The call for cox.ex7.6
cox.ex7.6$call ## coxph(formula = Surv(gestage37, preterm01) ~ RF_PPTERM + MAGER + ## MRACEHISP + DMAR, data = natality.complete) # Refit the model, including tt(X) to add the interaction with time cox.ex7.6.tt <- coxph(Surv(gestage37, preterm01) ~ RF_PPTERM + MRACEHISP + DMAR + MAGER + tt(MAGER), data = natality.complete, tt = function(x,t,...) x*t) summary(cox.ex7.6.tt)$coef
##                        coef exp(coef) se(coef)       z    Pr(>|z|)
## RF_PPTERMYes       1.063036    2.8951 0.236600  4.4930 0.000007024
## MRACEHISPNH Black  0.559185    1.7492 0.177545  3.1495 0.001635243
## MRACEHISPNH Other -0.073279    0.9293 0.295551 -0.2479 0.804179208
## MRACEHISPHispanic  0.259949    1.2969 0.179096  1.4514 0.146654688
## DMARUnmarried      0.584711    1.7945 0.153347  3.8130 0.000137290
## MAGER             -0.068712    0.9336 0.137613 -0.4993 0.617559391
## tt(MAGER)          0.002948    1.0030 0.004039  0.7299 0.465467436
car::Anova(cox.ex7.6.tt, type = 3, test = "Wald")
## Analysis of Deviance Table (Type III tests)
##
## Response: Surv(gestage37, preterm01)
##           Df Chisq Pr(>Chisq)
## RF_PPTERM  1 20.19   0.000007 ***
## MRACEHISP  3 10.80    0.01283 *
## DMAR       1 14.54    0.00014 ***
## MAGER      1  0.25    0.61756
## tt(MAGER)  1  0.53    0.46547
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As is the case with any variable involved in an interaction, you can no longer treat the main effect for MAGER as the age effect. Instead, it is the age effect at gestational age 0 weeks. The p-value for tt(MAGER) (p = .465) is a test of the interaction and also a test of the PH assumption (specifically against the alternative of a linear time interaction; cox.zph() is more general).

In the original PH model, the AHR for MAGER was $$e^{0.0313} = 1.03$$. How does adding the interaction with time change this? Instead of AHR = $$e^{\beta_\textrm{MAGER}}$$, we now have AHR = $$e^{\beta_\textrm{MAGER} + t*\beta_\textrm{tt(MAGER)}}$$. In our example, then, the AHR for mother’s age is $$e^{-0.0687 + 0.0029*t}$$.

• At $$t = 0$$ (conception), the AHR is $$e^{-0.0687} = 0.93$$.
• At $$t = 30$$ weeks, it is $$e^{-0.0687 + 0.0029*30} = 1.02$$.
• At $$t = 35$$ weeks, it is $$e^{-0.0687 + 0.0029*35} = 1.04$$.

Compute the time at which the AHR is 1 using the formula $$-\beta_X / \beta_{tt(X)}$$ (which is the value of $$t$$ at which $$e^{\beta_X + t*\beta_{tt(X)}} = 1$$).

-1*coef(cox.ex7.6.tt)["MAGER"] /
coef(cox.ex7.6.tt)["tt(MAGER)"]
## MAGER
## 23.31

This implies that the direction of association changes at 23.3 weeks. If this value is not in the range of the observed event times then the AHR is either always < 1 or always > 1. Figure 7.17 illustrates how the AHR changes over time.

# Range of observed event times
SUB    <- natality.complete$preterm01 == 1 TIME <- seq(min(natality.complete$gestage37[SUB]),
max(natality.complete$gestage37[SUB]), 1) BETA <- coef(cox.ex7.6.tt)["MAGER"] BETATT <- coef(cox.ex7.6.tt)["tt(MAGER)"] AHR <- exp(BETA + BETATT*TIME) plot(AHR ~ TIME, type = "l") abline(h = 1, lty = 2, col = "darkgray") abline(v = -1*BETA/BETATT, lty = 2, col = "darkgray") In general, the AHR vs. time plot will be curved. For this continuous predictor, the curve just happens to be approximately linear over this range of weeks. Conclusion: Based on the original PH model, those who are 1 year older have 3.2% greater hazard of preterm birth (AHR = 1.03; 95% CI = 1.01, 1.06; p = .009). Based on the model that relaxes the PH assumption, this difference varies over time, with younger women having greater hazard of preterm birth earlier in pregnancy and older women greater hazard later in pregnancy (because AHR < 1 early, and AHR > 1 late). Again, for this predictor, the PH assumption was not actually violated, but we proceeded with relaxing the assumption for illustration. NOTE: Be careful when interpreting Figure 7.17 – it shows how the hazard ratio comparing those who differ by 1-unit in $$X$$ changes over time, not how the hazard changes over time. ### 7.16.3 Adding a time interaction for a categorical predictor Example 7.6 (continued): Relax the PH assumption for marital status by assuming the effect of Unmarried varies linearly with time. The tt() function only accepts numeric predictors. We used model.matrix() in Section 7.16.1 to create the numeric dummy variable for DMAR = “Unmarried”. This was not actually necessary for checking the PH assumption since you would get the same answer for a factor- or numeric-coded binary variable, but it is necessary here since we need a numeric-coded variable. The code below includes the interaction with time after replacing the factor variable DMAR with its corresponding numeric dummy variable, Unmarried. In general, for a categorical predictor with $$L$$ levels, replace the factor variable with the corresponding $$L - 1$$ dummy variables and add a tt() term for each level that has non-proportional hazards. # Refit the model, including tt(X) to add the interaction with time cox.ex7.6.tt2 <- coxph(Surv(gestage37, preterm01) ~ RF_PPTERM + MAGER + MRACEHISP + Unmarried + tt(Unmarried), data = natality.complete, tt=function(x,t,...) x*t) summary(cox.ex7.6.tt2)$coef
##                       coef exp(coef) se(coef)      z    Pr(>|z|)
## RF_PPTERMYes       1.05525    2.8727  0.23668  4.459 0.000008253
## MAGER              0.03130    1.0318  0.01205  2.598 0.009369082
## MRACEHISPNH Black  0.55553    1.7429  0.17770  3.126 0.001770542
## MRACEHISPNH Other -0.07063    0.9318  0.29555 -0.239 0.811110305
## MRACEHISPHispanic  0.26217    1.2998  0.17914  1.464 0.143328318
## Unmarried          7.19074 1327.0797  2.22153  3.237 0.001208583
## tt(Unmarried)     -0.19377    0.8238  0.06466 -2.996 0.002731143
car::Anova(cox.ex7.6.tt2, type = 3, test = "Wald")
## Analysis of Deviance Table (Type III tests)
##
## Response: Surv(gestage37, preterm01)
##               Df Chisq Pr(>Chisq)
## RF_PPTERM      1 19.88  0.0000083 ***
## MAGER          1  6.75     0.0094 **
## MRACEHISP      3 10.64     0.0138 *
## Unmarried      1 10.48     0.0012 **
## tt(Unmarried)  1  8.98     0.0027 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Just like for the continuous predictor, we can evaluate how the AHR for DMAR changes with time. In the original PH model, the AHR for DMAR (Unmarried vs. Married) was $$e^{0.5850} = 1.79$$. How does adding the interaction with time change this? Instead of AHR = $$e^{\beta_\textrm{DMARUnmarried}}$$, we now have AHR = $$e^{\beta_\textrm{Unmarried} + t*\beta_\textrm{tt(Unmarried)}}$$. In our example, then, the AHR for mother’s age is $$e^{7.1907 - 0.1938*t}$$.

• At $$t = 0$$ (conception), the AHR is $$e^{7.1907} = 1327.08$$.
• At $$t = 30$$ weeks, it is $$e^{7.1907 - 0.1938*30} = 3.97$$.
• At $$t = 35$$ weeks, it is $$e^{7.1907 - 0.1938*35} = 1.51$$.

Compute the time at which the AHR is 1 using the following formula.

-1*coef(cox.ex7.6.tt2)["Unmarried"] /
coef(cox.ex7.6.tt2)["tt(Unmarried)"]
## Unmarried
##     37.11

Since this is beyond the range of our time variable, the AHR will never be exactly 1 and the direction of association does not change over time. Visualize how the AHR changes over time by computing it over a range of times and plotting (Figure 7.18).

SUB    <- natality.complete$preterm01 == 1 TIME <- seq(min(natality.complete$gestage37[SUB]),
max(natality.complete$gestage37[SUB]), 1) BETA <- coef(cox.ex7.6.tt2)["Unmarried"] BETATT <- coef(cox.ex7.6.tt2)["tt(Unmarried)"] AHR <- exp(BETA + BETATT*TIME) plot(AHR ~ TIME, type = "l") abline(h = 1, lty = 2, col = "darkgray") abline(v = -1*BETA/BETATT, lty = 2, col = "darkgray") Here we clearly see the curved shape of the AHR curve. Why is the AHR so large early in pregnancy? It turns out that almost all of the earliest preterm births occurred among unmarried women. Therefore, in this dataset the hazard of preterm birth early in pregnancy is much greater for unmarried than married women. natality.complete %>% filter(preterm01 == 1) %>% arrange(gestage37) %>% select(gestage37, DMAR) %>% slice(1:10) ## # A tibble: 10 × 2 ## gestage37 DMAR ## <dbl> <fct> ## 1 17 Unmarried ## 2 20 Unmarried ## 3 23 Unmarried ## 4 24 Unmarried ## 5 25 Unmarried ## 6 26 Unmarried ## 7 27 Unmarried ## 8 27 Married ## 9 28 Unmarried ## 10 28 Married Conclusion: Based on the original PH model, those who are unmarried have 79.5% greater hazard of preterm birth (AHR = 1.79; 95% CI = 1.33, 2.42; p < .001). Based on the model that relaxes the PH assumption, the AHR decreases with time. It is always > 1, but is greater earlier in pregnancy. NOTE: We used $$tt(x,t) = xt$$ to model how the AHR changes with time. Instead, we could have used a different function of time (e.g., logarithm, polynomial, spline). Had we done so, some of the code above would need to be altered to obtain the correct plot and time at which AHR = 1. ### 7.16.4 Stratifying by a categorical variable For a categorical predictor, another method of relaxing the PH assumption is to use stratification. In a stratified model, each level of the strata variable (e.g., Married, Unmarried) is assumed to have a different baseline hazard function but the same coefficients for the other predictors in the model. Because the baseline hazard functions in different strata can vary over time differently, the ratio of hazards between individuals in different strata can vary over time. While stratification removes the PH assumption for the strata variable, within levels of the strata variable PH is still assumed for the other predictors. Example 7.6 (continued): Use stratification to relax the PH assumption for marital status. Refit the model, including replacing DMAR with strata(DMAR). NOTE: When stratifying by X, do not include X in the model, just strata(X). cox.ex7.6.strata <- coxph(Surv(gestage37, preterm01) ~ RF_PPTERM + MAGER + MRACEHISP + strata(DMAR), data = natality.complete) summary(cox.ex7.6.strata)$coef
##                       coef exp(coef) se(coef)       z   Pr(>|z|)
## RF_PPTERMYes       1.05413    2.8695  0.23670  4.4535 0.00000845
## MAGER              0.03128    1.0318  0.01205  2.5965 0.00941837
## MRACEHISPNH Black  0.55508    1.7421  0.17771  3.1236 0.00178676
## MRACEHISPNH Other -0.07071    0.9317  0.29555 -0.2393 0.81090523
## MRACEHISPHispanic  0.26203    1.2996  0.17914  1.4627 0.14355676
car::Anova(cox.ex7.6.strata, type = 3, test = "Wald")
## Analysis of Deviance Table (Type III tests)
##
## Response: Surv(gestage37, preterm01)
##           Df Chisq Pr(>Chisq)
## RF_PPTERM  1 19.83  0.0000084 ***
## MAGER      1  6.74     0.0094 **
## MRACEHISP  3 10.62     0.0139 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output looks very similar to an unstratified model, except that now there is no estimate of the effect of the predictor on which you stratified (there is no row for DMAR in the output above). If you want to estimate an AHR for DMAR, do not use stratification, instead include a time interaction as described previously.

Finally, re-check the PH assumption after stratification to verify that there is no remaining violation of the assumption (Figure 7.19).

# Replace factors with dummy variables
cox.ex7.6.strata.dummy <- coxph(Surv(gestage37, preterm01) ~
Previous_Preterm + MAGER +
race_eth_NHBlack + race_eth_NHOther + race_eth_Hispanic +
strata(DMAR),
data = natality.complete)

# Verify the fit is the same
summary(cox.ex7.6.strata)$coef ## coef exp(coef) se(coef) z Pr(>|z|) ## RF_PPTERMYes 1.05413 2.8695 0.23670 4.4535 0.00000845 ## MAGER 0.03128 1.0318 0.01205 2.5965 0.00941837 ## MRACEHISPNH Black 0.55508 1.7421 0.17771 3.1236 0.00178676 ## MRACEHISPNH Other -0.07071 0.9317 0.29555 -0.2393 0.81090523 ## MRACEHISPHispanic 0.26203 1.2996 0.17914 1.4627 0.14355676 summary(cox.ex7.6.strata.dummy)$coef
##                       coef exp(coef) se(coef)       z   Pr(>|z|)
## Previous_Preterm   1.05413    2.8695  0.23670  4.4535 0.00000845
## MAGER              0.03128    1.0318  0.01205  2.5965 0.00941837
## race_eth_NHBlack   0.55508    1.7421  0.17771  3.1236 0.00178676
## race_eth_NHOther  -0.07071    0.9317  0.29555 -0.2393 0.81090523
## race_eth_Hispanic  0.26203    1.2996  0.17914  1.4627 0.14355676
# Recheck for non-proportional hazards
NPH_CHECK <- cox.zph(cox.ex7.6.strata.dummy)
NPH_CHECK
##                     chisq df    p
## Previous_Preterm  1.71483  1 0.19
## MAGER             0.00781  1 0.93
## race_eth_NHBlack  1.48254  1 0.22
## race_eth_NHOther  0.28813  1 0.59
## race_eth_Hispanic 0.66920  1 0.41
## GLOBAL            3.41160  5 0.64
par(mfrow=c(2,3))
plot(NPH_CHECK)

### References

Grambsch, Patricia M., and Terry M. Therneau. 1994. “Proportional Hazards Tests and Diagnostics Based on Weighted Residuals.” Biometrika 81 (3): 515–26. https://doi.org/10.2307/2337123.