7.1 Introduction
Survival analysis is the common name for the analysis of an outcome that is the time to an event. The name comes from its use in studying time to death, but the event can be anything. Consider the following examples:
- Time to preterm birth: In a prospective study of the effect of maternal periodontal disease on preterm birth, researchers randomized pregnant women to either receive a treatment intervention or be in a control group and then recorded gestational age at the end of the pregnancy (Michalowicz et al. 2006). The event is preterm birth and the time is gestational age.
- Time to myocardial infarction (MI): The Framingham Heart Study is a longitudinal study that began in 1948 in Framingham, Massachusetts, with the goal of elucidating the etiology of cardiovascular disease (Dawber, Meadors, and Moore 1951; Mahmood et al. 2014). Among the numerous outcomes recorded was hospitalization for MI (“heart attack”). For this outcome, the event is hospitalized MI, and the time is years from the baseline study examination to hospitalized MI.
- Transition to heroin: In a prospective natural history study of users of non-prescribed pharmaceutical opioids (“pain pills”) who had never used heroin, researchers followed individuals every 6 months for 3 years after baseline to determine what factors influenced the time from first illicit pain pill use to first use of heroin (Carlson et al. 2016). The event is first use of heroin, and the time is years from initiation of illicit use of pain pills to first use of heroin.
- Mental health / substance use referrals and juvenile recidivism: Researchers used retrospective data to compare time from release to re-offense (recidivism) among first-time juvenile offenders between those who did and did not receive a mental health or substance use referral (Zeola, Guina, and Nahhas 2017). The event is recidivism, and the time is days from release to re-offense.
In each of these examples, the outcome is the time to an event. A complication is that not all individuals experience the event before they are lost to follow-up, before they experience a competing event that removes the possibility of experiencing the event of interest, or before the study ends. We consider such an individual’s event time to be censored; if they were to ever experience the event, their time-to-event would be larger than the time at which we last observed them. In other words, we do not know their hypothetical event time, only that it is larger than some number. Survival analysis is designed to handle censored data.
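As a minimal sketch of how censored data are commonly stored, each individual can be recorded as a follow-up time paired with an indicator of whether the event was observed. The class name `Observation`, the helper `describe`, and the follow-up values below are hypothetical and are not taken from any of the studies above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float   # follow-up time, in the study's time unit
    event: bool   # True: event observed at `time`; False: censored at `time`


def describe(obs):
    """State what is known about the individual's event time T."""
    if obs.event:
        return f"T = {obs.time:g}"   # event time is known exactly
    return f"T > {obs.time:g}"       # censored: only a lower bound is known


# Hypothetical follow-up data (years from time origin)
cohort = [
    Observation(2.5, True),    # experienced the event at 2.5 years
    Observation(4.0, False),   # lost to follow-up at 4 years
    Observation(1.2, True),    # experienced the event at 1.2 years
    Observation(6.0, False),   # event-free when the study ended at 6 years
]

for obs in cohort:
    print(describe(obs))
```

The key point is that a censored row still carries information (a lower bound on the event time), so it should be neither discarded nor treated as an event.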
Table 7.1 describes the time-to-event outcome and censoring for the studies listed above:
Study | Time Origin | Event | Time-to-Event | Censoring |
---|---|---|---|---|
Time to preterm birth | First day of last menstrual cycle | Preterm birth (live or non-live) (gestational age < 37 weeks) | T = Weeks from first day of last menstrual cycle to preterm birth | Termination of pregnancy or loss to follow-up prior to 37 weeks: T > gestational age at that time; Gestational age at least 37 weeks: T > 37 |
Time to MI | Baseline examination | MI | T = Years from baseline to MI | Loss to follow-up, end of follow-up, or death prior to MI: T > time at last follow-up or death |
Transition to heroin | First illicit use of pain pills | First use of heroin | T = Years from first illicit use of pain pills to first use of heroin | Loss to follow-up prior to heroin use or no heroin use at final (36m) interview: T > time at last follow-up |
Juvenile recidivism | Release from first offense | Next offense | T = Days to recidivism | No repeat offense before aging out of system, loss to follow-up, or study conclusion: T > time at last follow-up |
In the time to MI example, an individual who dies prior to experiencing MI has an event time that is censored at the time of death but they could have experienced the event later had they not died. Censoring in the preterm birth example, however, is conceptually different. In that example, a pregnancy that reaches full term (37 weeks) has an event time that is censored at 37 weeks but it could not have resulted in a preterm birth later because a birth at 37 weeks or later is not preterm, by definition. So, in some sense, their time to preterm birth is infinite. For the methods we will be learning in this chapter (Kaplan-Meier estimate of survival, Cox regression), the results will be identical whether we consider these event times censored at 37 weeks, 40 weeks, or 400 weeks. Despite the awkwardness in the definition of censoring, survival analysis allows us to compare not only the occurrence of preterm birth (as would be the case if we used logistic regression) but also the timing.
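The claim that the censoring time assigned to full-term pregnancies does not matter can be checked with a small hand-written Kaplan-Meier calculation. This is only a sketch: the function `kaplan_meier`, the helper `cohort`, and the gestational ages are hypothetical illustrations, not the software or data used later in the chapter.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct observed event time."""
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        # Risk set: everyone whose follow-up time is at least t
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


def cohort(censor_week):
    """Hypothetical cohort of 10 pregnancies (times in weeks of gestation)."""
    # 3 preterm births, 1 loss to follow-up at 32 weeks, 6 full-term pregnancies
    times = [30, 34, 36] + [32] + [censor_week] * 6
    events = [True, True, True] + [False] + [False] * 6
    return times, events


# Censor the full-term pregnancies at 37, 40, or 400 weeks
curves = {c: kaplan_meier(*cohort(c)) for c in (37, 40, 400)}
for c, curve in curves.items():
    print(c, curve)
```

The three estimates agree because the full-term pregnancies are in the risk set at every preterm event time regardless of which censoring time at or beyond 37 weeks they are assigned.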
NOTE: For three of the four examples above, individuals were under observation from their time origin onward. For the transition to heroin example, however, the time origin preceded the start of observation. The study included only those who had begun using pain pills but had not yet transitioned to heroin. Therefore, individuals who otherwise would have been sampled but had already transitioned to heroin were excluded. This is an example of left truncation: the distribution of observed event times is truncated on the left because individuals whose event occurred before observation began are excluded. Those excluded individuals have, on average, shorter times to event. Fortunately, the method used for handling time-varying predictors in Section 7.14 also handles left truncation.
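A small simulation can illustrate the selection that left truncation produces. The distributions, parameter values, and variable names below are assumptions chosen only for illustration; they do not describe the heroin transition study.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed so the sketch is reproducible

included, excluded = [], []
for _ in range(10_000):
    # Hypothetical: years from time origin to the event (mean 3 years)
    event_time = random.expovariate(1 / 3)
    # Hypothetical: years from time origin to the start of observation
    entry_time = random.uniform(0, 5)
    if event_time > entry_time:
        included.append(event_time)   # still event-free at entry: can be sampled
    else:
        excluded.append(event_time)   # event came before entry: never sampled

print(f"included: n = {len(included)}, mean event time = {mean(included):.2f}")
print(f"excluded: n = {len(excluded)}, mean event time = {mean(excluded):.2f}")
```

In this sketch the excluded individuals have the shorter event times on average, so an analysis that ignores the delayed entry would be based on a sample tilted toward longer times.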
Comparison to linear and logistic regression
Why not use linear regression? Event times are numeric variables, after all. However, even if the assumptions of a linear regression model were met, standard linear regression cannot handle censoring.
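To see why censoring is a problem for methods that treat the recorded time as the outcome, consider a simulated sketch in which follow-up ends at a fixed time. The exponential distribution, the 5-year follow-up limit, and all names are assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(2)  # arbitrary seed so the sketch is reproducible

FOLLOW_UP = 5.0  # hypothetical: everyone is followed for at most 5 years

# Hypothetical true event times with a mean of 10 years
true_times = [random.expovariate(1 / 10) for _ in range(10_000)]
observed = [min(t, FOLLOW_UP) for t in true_times]   # what the study records
events = [t <= FOLLOW_UP for t in true_times]        # False means censored

mean_true = mean(true_times)          # not available in a real study
mean_as_if_events = mean(observed)    # treats censored times as event times
mean_events_only = mean(t for t, e in zip(observed, events) if e)  # drops censored rows

print(f"true mean:                 {mean_true:.2f}")
print(f"censored treated as event: {mean_as_if_events:.2f}")
print(f"censored rows dropped:     {mean_events_only:.2f}")
```

Both naive summaries are bounded above by the follow-up limit and so fall well short of the true mean. A linear regression fit to either version of the outcome would inherit the same problem.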
Why not use logistic regression? Events are binary variables, after all – the event either happened or not within the follow-up period. While this approach would be able to handle censoring due to follow-up ending, it would not correctly handle censoring due to earlier loss to follow-up. Also, logistic regression fails to account for when the event occurred. For example, in our preterm birth example, treating the outcome as binary (preterm birth vs. not preterm birth) would allow us to compare the odds of preterm birth between groups (with those lost to follow-up treated as missing data). But suppose that in one group not only were there more preterm births but they also tended to occur earlier. Logistic regression, which ignores event timing, would underestimate the risk associated with being in this group. Survival analysis, however, would estimate the risk based on both the occurrence and timing of preterm births.
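The loss of timing information can be illustrated with two hypothetical groups that have the same number of preterm births at different gestational ages. The data and the helper `summarize` are made up for illustration, and the events-per-person-week rate is used only as a crude timing-sensitive summary, not as one of the methods covered in this chapter.

```python
FULL_TERM = 37  # weeks; pregnancies reaching this point are censored here


def summarize(preterm_weeks, n_total):
    """Return (proportion with event, events per person-week of follow-up)."""
    n_events = len(preterm_weeks)
    n_censored = n_total - n_events
    proportion = n_events / n_total
    # Each preterm birth contributes its gestational age;
    # each full-term pregnancy contributes FULL_TERM weeks.
    person_weeks = sum(preterm_weeks) + n_censored * FULL_TERM
    rate = n_events / person_weeks
    return proportion, rate


# Hypothetical groups of 10 pregnancies each, with no loss to follow-up
prop_a, rate_a = summarize([26, 28, 30], 10)   # earlier preterm births
prop_b, rate_b = summarize([34, 35, 36], 10)   # later preterm births

print(f"Group A: proportion = {prop_a:.2f}, rate = {rate_a:.5f} per person-week")
print(f"Group B: proportion = {prop_b:.2f}, rate = {rate_b:.5f} per person-week")
```

A binary outcome cannot distinguish these groups because the proportions are identical, whereas a summary that uses follow-up time shows that events accrue faster in Group A.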
In summary, unlike either logistic or linear regression, survival analysis handles both the “if” and “when” of events. It also handles the fact that the “if” may only be known to be “not yet”, resulting in a censored “when”, because individuals may exit a study for a reason other than the event prior to the end of the study, and because observation time is finite.