In all the examples we have looked at, the predictors did not vary over time. Each individual had only one value for each predictor. Since survival data is the result of follow-up over time, it is possible to have predictors that do vary over time. For example, in a study of time to heart attack, researchers could record various other conditions’ occurrence over time, such as hypertension or angina. In a study of juvenile recidivism, researchers could record how education or employment status change over time. Cox regression is able to handle such time-varying predictors (also known as “time dependent covariates”).
The datasets we have analyzed so far contained one row per individual, including their event time, an indicator of whether the event occurred at that time or the time is censored, and time-invariant predictor values. A dataset with time-varying predictors will have multiple rows per individual, with different rows having different values for the time-varying predictors, reflecting how they change over time. Additionally, rather than having a single event time variable, each row will have two time variables indicating the beginning and end of the time interval represented by that row of data.
Example 7.8: The Opioid teaching dataset (see Appendix A.7) contains longitudinal information for 362 individuals who at baseline had used non-prescribed pharmaceutical opioids (NPPO, “pain pills”), but were not dependent on NPPOs and had never used heroin (Carlson et al. 2016). The dataset contains 1853 observations. Each row contains the time variables
STOP which define the time interval (years from initiation of NPPO use) associated with that row. Time-invariant variables in the dataset are constant over all rows for the same individual, while time-varying variables can change between rows. Each
(START, STOP] interval defines a period of time during which no variables changed. Two time-varying variables in the dataset are heroin use (
heroin) (the event indicator variable) and lifetime opioid dependence (
dep_lifetime) based on DSM-IV criteria (Forman et al. 2004; Hudziak et al. 1993). Other than heroin use, which is the event of interest, time-dependent variables were lagged to ensure that they temporally preceded the outcome.
Look at the rows of data for a few variables for one individual.
load("Data/opioid_rmph.rData") %>% opioid filter(RANDID == 10) %>% select(RANDID, wave, START, STOP, heroin, age_at_init, sex, dep_lifetime)
## RANDID wave START STOP heroin age_at_init sex dep_lifetime ## 1 10 1 3.78 4.26 0 19 Male 0 ## 2 10 2 4.26 4.78 0 19 Male 1 ## 3 10 3 4.78 5.29 0 19 Male 1 ## 4 10 4 5.29 5.84 0 19 Male 1 ## 5 10 5 5.84 6.27 0 19 Male 1 ## 6 10 6 6.27 6.79 1 19 Male 1
This individual is male and started using NPPOs at age 19 years. These two variables are time-invariant and so have the same value in every row. His first interview took place 4.26 years after NPPO initiation and asked about his experiences and behaviors going back 6 months. Thus, the time interval for his first row is
(START, STOP] = (3.78, 4.26]. He first reported using heroin at his 6th interview. At baseline (wave = 0, but remember time-varying variables were lagged so wave = 1 values in the dataset actually are the values from wave = 0), he did not meet the criteria for lifetime opioid dependence, but he did at the next interview. Thus, the associated time-varying variable (
dep_lifetime) changes from the first to second row.
We will work with datasets that are already in this format. If, in your future work, you have a dataset in a different format, see Therneau T., Crowson C.S., and Atkinson E.J. (2023) for guidance on transforming it into the correct format. Given data set up in this manner, we can fit a Cox regression model with a slight modification to the
coxph() syntax we have been using. Instead of
Surv(TIME, EVENT), use
Surv(START, STOP, EVENT). In this example, the
STOP variables are named
STOP but if they have different names in a future dataset you are working with then use those names instead.
.8 <- coxph(Surv(START, STOP, heroin) ~ cox.ex7+ sex + dep_lifetime, age_at_init data = opioid) summary(cox.ex7.8)
## Call: ## coxph(formula = Surv(START, STOP, heroin) ~ age_at_init + sex + ## dep_lifetime, data = opioid) ## ## n= 1853, number of events= 27 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## age_at_init -0.393 0.675 0.135 -2.91 0.0036 ** ## sexMale 0.151 1.164 0.392 0.39 0.6996 ## dep_lifetime 1.058 2.881 0.399 2.65 0.0080 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## age_at_init 0.675 1.482 0.518 0.879 ## sexMale 1.164 0.859 0.539 2.511 ## dep_lifetime 2.881 0.347 1.318 6.298 ## ## Concordance= 0.716 (se = 0.042 ) ## Likelihood ratio test= 15.2 on 3 df, p=0.002 ## Wald test = 14.5 on 3 df, p=0.002 ## Score (logrank) test = 15.4 on 3 df, p=0.001
cbind("AHR" = exp(summary(cox.ex7.8)$coef[, "coef"]), exp(confint(cox.ex7.8)), "p-value" = summary(cox.ex7.8)$coef[, "Pr(>|z|)"])
## AHR 2.5 % 97.5 % p-value ## age_at_init 0.6749 0.5180 0.8794 0.003597 ## sexMale 1.1635 0.5391 2.5111 0.699560 ## dep_lifetime 2.8807 1.3176 6.2981 0.008022
::Anova(cox.ex7.8, type = 3, test.statistic = "Wald")car
## Analysis of Deviance Table (Type III tests) ## ## Response: Surv(START, STOP, heroin) ## Df Chisq Pr(>Chisq) ## age_at_init 1 8.48 0.0036 ** ## sex 1 0.15 0.6996 ## dep_lifetime 1 7.03 0.0080 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Whether a predictor is time-varying or time-invariant, its HR can be interpreted as a comparison of the hazard between groups of individuals with different values of that predictor. Thus, in this example, we could conclude that after adjusting for age at NPPO initiation and sex, those with lifetime opioid dependence have 2.88 times the hazard of using heroin as those who do not (AHR = 2.88; 95% CI = 1.32, 6.30; p = .008). The HR for a time-varying predictor, however, can also be interpreted as the effect of within-individual change on the hazard. If an individual without opioid dependence transitions to dependence, their hazard of transitioning to heroin is multiplied by 2.88.
- If an observation has the same start and end time,
coxphwill drop it from the analysis since there is no time at risk. However, this may not be appropriate. For example, suppose times are recorded as whole number days. An interval could start on a certain day and the event could have happened that same day. So there is time at risk, it is just less than a whole day. In this case, an approximate solution is to add a fractional number to the stop time for observations with
STOP, as suggested in Therneau T., Crowson C.S., and Atkinson E.J. (2023). For example, if
STOPare both day 10, change
STOPto 10.5 (making the assumption that the person was at risk in this interval for 12 hours). Make sure to also increase the
STARTtime of the next time interval for this person by the same amount to avoid having overlapping intervals.
- As mentioned previously, the event time distribution in this dataset is left-truncated – those who transitioned to heroin before the study started were not eligible for inclusion in the study. Fortunately, the
Surv(START, STOP, EVENT)syntax automatically handles the left-truncation when, as in this dataset, the start time for the first interval for each individual is the time from the time origin (e.g., initiation of pain pill use) to the start of the study. Having the start time of each individual’s first interval be 0 would lead to bias as it would assume that everyone was under observation since their time origin and that those in the study were sampled from those at risk since the time origin. In fact, participants were only observed after the study started and those who experienced the event before then were excluded.