7.7 Cox regression
The terms Cox regression, Cox model, and Cox proportional hazards regression all refer to a semi-parametric method introduced by D. R. Cox in 1972 (Cox 1972). The method is “semi-parametric” because it makes no assumption about the distribution of the event times (similar to the non-parametric KM method) but it does assume the hazard function depends on a set of parameters (regression coefficients) that define the association between the hazard and a set of predictors.
The Cox model is written as
\[\begin{equation} h(t) = h_0(t) e^{\beta_1 X_1 + \ldots + \beta_K X_K} \tag{7.3} \end{equation}\]
where \(h_0(t)\) is referred to as the baseline hazard function and, as in other forms of regression, there are \(\beta\) terms each multiplied by a predictor. The baseline hazard is similar to the intercept in a linear regression model in that it represents the hazard for individuals whose covariate values are all 0 or at their reference level. The baseline hazard function is left unspecified – no parametric form is assumed for it – and it cancels out when estimating the regression coefficients. Thus, Cox regression output does not include an intercept.
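As a minimal numerical sketch of Equation (7.3) – not from the text, and using a hypothetical baseline hazard \(h_0(t) = 0.1t\) and made-up coefficients purely for illustration:

```python
import math

def cox_hazard(t, x, beta, h0=lambda t: 0.1 * t):
    """Hazard under the Cox model: h(t) = h0(t) * exp(beta . x).

    h0 is a hypothetical baseline hazard chosen only for illustration;
    in Cox regression its form is never actually specified.
    """
    return h0(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

beta = [0.5, -0.2]                   # hypothetical coefficients
print(cox_hazard(2.0, [1, 3], beta))  # 0.1*2 * exp(0.5 - 0.6) ≈ 0.181
```

The linear predictor \(\beta_1 X_1 + \ldots + \beta_K X_K\) enters only through the exponential, so the hazard is always non-negative regardless of the covariate values.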
How do we interpret the regression coefficients in Equation (7.3)? Recall that in logistic regression \(e^\beta\) represented an odds ratio (OR). In the Cox model, \(e^\beta\) represents a hazard ratio (HR) comparing the hazard of experiencing the event at time \(t\) between individuals with \(X = x + 1\) vs. those with \(X = x\) (for a continuous predictor) or between individuals at a specific level of \(X\) vs. those at its reference level (for a categorical predictor), holding all other predictors fixed.
To see this, consider the hazards at \(X_K = x_K + 1\) and \(X_K = x_K\) (holding the other predictors fixed):
\[\begin{array}{lcl} h(t \vert X_1 = x_1, ..., X_K = x_K + 1) & = & h_0(t) e^{\beta_1x_1 + \ldots + \beta_K (x_K + 1)} \\ h(t \vert X_1 = x_1, ..., X_K = x_K) & = & h_0(t) e^{\beta_1x_1 + \ldots + \beta_K x_K} \end{array}\]
Taking the ratio of these, everything cancels out except \(e^{\beta_K}\), which is, therefore, the HR for \(X_K\). Importantly, the HR does not depend on time. This is the proportional hazards assumption – that the hazard functions for any two individuals have a constant proportion over time.
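The cancellation can be checked numerically. In this sketch (hypothetical baseline hazard and coefficients, chosen only for illustration), the ratio of hazards for two individuals who differ by one unit in \(X_2\) equals \(e^{\beta_2}\) at every time point, no matter what baseline hazard is used:

```python
import math

def hazard(t, x, beta, h0=lambda t: 0.05 * t**2):
    # Cox model hazard with an arbitrary (hypothetical) baseline hazard h0
    return h0(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

beta = [0.3, 0.7]  # hypothetical coefficients
for t in (1.0, 5.0, 10.0):
    # Ratio for X2 = 4 vs. X2 = 3 (X1 held fixed at 2)
    hr = hazard(t, [2, 4], beta) / hazard(t, [2, 3], beta)
    print(t, hr)  # equals exp(0.7) ≈ 2.014 at every t
```

Because \(h_0(t)\) and the terms for the unchanged predictors appear in both numerator and denominator, they cancel, leaving a ratio that is constant over time – exactly the proportional hazards assumption.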
Interpret HRs similarly to ORs estimated from logistic regression (see Interpreting an OR in Section 6.4, substituting “HR” for “OR” and “hazard” for “odds”). For example, HR = 1.20 implies that one group has a 20% greater hazard of the event than another.
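A quick sketch of the arithmetic behind that interpretation – converting between a regression coefficient, its HR, and the percent change in hazard (the coefficient value here is hypothetical):

```python
import math

beta = math.log(1.20)   # hypothetical coefficient whose HR is 1.20
hr = math.exp(beta)     # hazard ratio: e^beta
pct = (hr - 1) * 100    # percent greater hazard relative to the comparison group
print(hr, pct)
```

An HR above 1 corresponds to a greater hazard (a positive coefficient), and an HR below 1 to a lower hazard (a negative coefficient), just as with ORs in logistic regression.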