A.7 Opioid


The Opioid dataset contains deidentified data from a study supported by the National Institute on Drug Abuse (NIDA), grant number R01DA023577. More details follow in the “Documentation” section.


The Opioid dataset contains longitudinal information for 362 individuals who at baseline were aged 18 to 23 years, had used non-prescribed pharmaceutical opioids (PO, “pain pills”), but were not dependent on POs and had never used heroin (Carlson et al. 2016). They were interviewed approximately every six months for three years (a total of seven waves, numbered 0, 1, …, 6). The dataset contains 1853 observations.

Each row contains the subject identifier (RANDID), the wave identifier (wave), and the time variables START and STOP, which define the time interval associated with that row, measured in years since initiation of PO use. Time-invariant variables in this dataset include age at initiation of PO use (age_at_init) and sex; these variables take the same value in every row for a given individual.
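To make the long, one-row-per-interval layout concrete, here is a minimal sketch using a small made-up data frame with the same column names. All values, including the wave numbering and the START/STOP times, are hypothetical, and `constant_within()` is an illustrative helper, not part of the dataset or the book. It checks that a time-invariant variable really takes a single value within each individual.

```r
# Hypothetical toy data in the same long layout as the Opioid dataset:
# one row per individual per interval. All values are made up.
toy <- data.frame(
  RANDID      = c(1, 1, 1, 2, 2),
  wave        = c(1, 2, 3, 1, 2),
  START       = c(1.0, 1.5, 2.0, 0.4, 0.9),
  STOP        = c(1.5, 2.0, 2.5, 0.9, 1.4),
  age_at_init = c(17, 17, 17, 19, 19),
  sex         = c("Female", "Female", "Female", "Male", "Male")
)

# Hypothetical helper: TRUE if `x` takes a single value within every level of `id`
constant_within <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v)) == 1))
}

constant_within(toy$age_at_init, toy$RANDID)  # TRUE: time-invariant
constant_within(toy$STOP, toy$RANDID)         # FALSE: changes from row to row
```

The same check can be run on the real data for age_at_init and sex after loading it.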

Time-varying variables can change between rows, and the value in any given row represents that variable’s value in the time interval (START, STOP], which runs from immediately after START up to and including STOP. The time-varying variables in the dataset are:

  • heroin use (heroin)
  • lifetime opioid dependence (dep_lifetime), based on DSM-IV criteria (Forman et al. 2004; Hudziak et al. 1993)
  • typically taking POs non-orally, e.g., snorting or injecting, in the past 6 months (po_nonoral)
  • ever using POs to self-medicate a health problem in the past 6 months (self_medicate)
  • lifetime psychiatric comorbidity, i.e., antisocial personality disorder, depression, generalized anxiety disorder, mania, or post-traumatic stress disorder (psych_lifetime)
  • lifetime cocaine use (coca_lifetime)
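The half-open convention (START, STOP] decides which row a boundary time belongs to. The sketch below, using made-up rows for one individual and a hypothetical helper `row_at()`, returns the row whose interval contains a given time: a time equal to a STOP value belongs to the row that ends there, not to the row that starts there.

```r
# Hypothetical rows for one individual (times in years since PO initiation)
rows <- data.frame(
  START      = c(1.0, 1.5, 2.0),
  STOP       = c(1.5, 2.0, 2.5),
  po_nonoral = c(0, 1, 1)
)

# Hypothetical helper: index of the row whose interval (START, STOP]
# contains time t, or NA if no row does. START is excluded, STOP is included.
row_at <- function(d, t) {
  i <- which(d$START < t & t <= d$STOP)
  if (length(i) == 0) NA_integer_ else i[1]
}

row_at(rows, 1.5)  # 1: 1.5 is the STOP of row 1, not part of row 2
row_at(rows, 1.6)  # 2
row_at(rows, 1.0)  # NA: the START of the first row is excluded
```

This is the same convention used by the counting-process form `Surv(start, stop, event)` in R's survival package, whose intervals are open on the left and closed on the right.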

Interviews asked individuals to report on their experiences and behavior since the last interview (about 6 months earlier), so the temporal ordering of any changes in the time-varying variables since the previous interview is not known. Therefore, all time-varying predictors other than heroin use, which is the event of interest, were lagged to ensure that they temporally preceded the outcome. Although there were up to seven interviews per individual, the lagged dataset has at most six rows per individual, because the baseline interview contributes only lagged predictor values, not a row of its own. The first row in the data for each individual contains their outcome value at their first follow-up interview and their baseline values of the other time-varying variables. Each subsequent row contains the outcome value at a given interview along with the values of the time-varying variables at the previous interview.
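The lagging step can be sketched in base R with `ave()`, which applies a function within groups. This is a minimal illustration under stated assumptions, not the procedure used to build the dataset: the raw one-row-per-interview data frame `raw`, its values, and the helper `lag_within()` are all hypothetical. The outcome (heroin) is left as is, a predictor is shifted down by one interview within each individual, and the wave-0 rows are then dropped.

```r
# Hypothetical raw interview data: one row per interview, waves 0-6,
# with values recorded at that interview (made-up numbers).
raw <- data.frame(
  RANDID     = rep(c(1, 2), times = c(7, 3)),
  wave       = c(0:6, 0:2),
  heroin     = c(0, 0, 0, 0, 0, 0, 1,   0, 0, 0),
  po_nonoral = c(0, 0, 1, 1, 1, 1, 1,   0, 1, 1)
)

# Hypothetical helper: lag a variable by one interview within each individual,
# so each row gets the value from that individual's previous interview.
lag_within <- function(x, id) {
  ave(x, id, FUN = function(v) c(NA, head(v, -1)))
}

raw <- raw[order(raw$RANDID, raw$wave), ]  # lagging assumes rows are in time order
raw$po_nonoral_lag <- lag_within(raw$po_nonoral, raw$RANDID)

# Keep the outcome unlagged and drop the baseline rows, which have no
# previous interview to supply lagged predictor values.
lagged <- raw[raw$wave > 0, c("RANDID", "wave", "heroin", "po_nonoral_lag")]
table(lagged$RANDID)  # 6 rows for individual 1, 2 rows for individual 2
```

In the result, the wave-1 row pairs the wave-1 outcome with the baseline value of po_nonoral, matching the structure described above.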

Teaching Dataset

Any analyses, interpretations, or conclusions reached herein are only for the purpose of illustrating regression methods and are credited to the author, not to NIDA. The author makes no claim or implication that any inferences derived from this teaching dataset are valid estimates.

Creating the Teaching Dataset

To create the teaching dataset, do the following.

  • Download opioid_rmph.rData from RMPH Resources.
  • Place this .rData file in your “Data” folder.
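Once the file is in place, it can be loaded with base R's `load()`, which restores the saved objects and returns their names. Because the name of the object stored in opioid_rmph.rData is not stated here, the sketch below uses a hypothetical helper, `load_single_object()`, that loads into a private environment and returns the object regardless of its name. It assumes the file contains exactly one object, and the path in the usage comment is also an assumption.

```r
# Hypothetical helper: load an .rData file into a private environment and
# return the single object it contains, without relying on the object's name.
load_single_object <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the names it restored
  if (length(obj_names) != 1) {
    stop("Expected exactly one object in ", path, ", found ", length(obj_names))
  }
  get(obj_names, envir = env)
}

# Usage (hypothetical path; adjust to wherever your "Data" folder lives):
# opioid <- load_single_object("Data/opioid_rmph.rData")
# dim(opioid)  # number of rows and columns
```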

Rows and columns

This file has the following numbers of rows and columns:

## [1] 1853   12


Carlson, Robert G., Ramzi W. Nahhas, Silvia S. Martins, and Raminta Daniulaityte. 2016. “Predictors of Transition to Heroin Use Among Initially Non-Opioid Dependent Illicit Pharmaceutical Opioid Users: A Natural History Study.” Drug and Alcohol Dependence 160: 127–34. https://doi.org/10.1016/j.drugalcdep.2015.12.026.
Forman, Robert F., Dace Svikis, Ivan D. Montoya, and Jack Blaine. 2004. “Selection of a Substance Use Disorder Diagnostic Instrument by the National Drug Abuse Treatment Clinical Trials Network.” Journal of Substance Abuse Treatment 27 (1): 1–8. https://doi.org/10.1016/j.jsat.2004.03.012.
Hudziak, James J., John E. Helzer, Martin W. Wetzel, Keith B. Kessel, Barbara McGee, Aleksandar Janca, and Thomas Przybeck. 1993. “The Use of the DSM-III-R Checklist for Initial Diagnostic Assessments.” Comprehensive Psychiatry 34 (6): 375–83. https://doi.org/10.1016/0010-440X(93)90061-8.