13.1 Identification setting

The aim is to identify the causal, or treatment, effect of a particular regressor or treatment on an outcome. For example, this could involve estimating the causal effect of a labor training program on individuals’ earnings or unemployment, or the price elasticity of demand. The starting point is the potential outcomes framework (Donald B. Rubin 1974), which defines a counterfactual scenario describing what would happen to the outcome variable if the treatment status or level were different; this framework is commonly referred to as the Rubin causal model (Holland 1986).

In this context, treatment status refers to a binary treatment case with two potential outcomes; for instance, what would my earnings be if I had participated in the training program? In contrast, treatment level applies to cases where there is a potential outcome for each level of the treatment; for example, what would demand be if the price had increased by 10%?

Following the potential outcomes notation in the binary treatment case, let \(D_i = 1\) and \(D_i = 0\) indicate the treatment status, corresponding to treated and control units, respectively. The potential outcomes \(Y_i(1)\) and \(Y_i(0)\) represent the outcome for unit \(i = 1, 2, \dots, N\) under treatment and control, respectively. For instance, \(Y_i(1)\) denotes the employment status an individual would have if he or she had participated in the training program, regardless of actual participation, whereas \(Y_i(0)\) denotes the employment status if the individual had not attended the program. The individual-level treatment effect is then defined as

\[\begin{align*} \tau_i = Y_i(1) - Y_i(0). \end{align*}\]
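A short simulation makes the notation concrete. The sketch below (all parameter values are purely illustrative) generates both potential outcomes for every unit, even though in practice only one of them is ever observed per unit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Illustrative data-generating process: we get to see both potential
# outcomes only because this is a simulation.
y0 = rng.normal(loc=10.0, scale=2.0, size=n)   # Y_i(0)
tau = rng.normal(loc=1.5, scale=0.5, size=n)   # heterogeneous effects tau_i
y1 = y0 + tau                                  # Y_i(1)

d = rng.integers(0, 2, size=n)                 # random treatment assignment
y_obs = np.where(d == 1, y1, y0)               # only Y_i(D_i) is observed

# The individual effects tau_i = Y_i(1) - Y_i(0) are never observed
# directly, but under randomization the difference in observed group
# means estimates their average.
ate_true = tau.mean()
ate_hat = y_obs[d == 1].mean() - y_obs[d == 0].mean()
print(ate_true, ate_hat)
```

Note that dropping either `y1` or `y0` for each unit, as the line defining `y_obs` does, is exactly the information loss discussed below.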

The potential outcomes framework can also be extended to situations where the treatment is continuous (Guido W. Imbens 2014). In this case, let \(Y_i(x_s)\) denote the outcome for unit \(i\) under the counterfactual scenario where the treatment variable \(X_s\) takes the value \(x_s\). The “treatment effect” of changing \(X_s\) from \(x_s\) to \(x_s'\), for example, the effect of increasing the price by 10% on demand, is given by

\[\begin{align*} \beta_{si} = Y_i(x_s') - Y_i(x_s). \end{align*}\]

When the change is infinitesimal, the causal effect can be expressed as a marginal effect:

\[\begin{align*} \beta_{si} = \frac{\partial Y_i(x_s)}{\partial x_s}. \end{align*}\]
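As a numerical sketch of the continuous-treatment case, consider a hypothetical constant-elasticity demand function \(\log Y_i(x) = a_i + b_i \log x\), where \(b_i\) is a unit-level price elasticity (all values here are illustrative assumptions, not part of the framework itself):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Hypothetical constant-elasticity demand: log Y_i(x) = a_i + b_i * log(x).
a = rng.normal(5.0, 0.1, n)
b = rng.normal(-1.2, 0.1, n)          # unit-level price elasticities

def y(x):
    return np.exp(a + b * np.log(x))  # potential outcome Y_i(x)

x0, x1 = 10.0, 11.0                   # a 10% price increase
effect = y(x1) - y(x0)                # discrete "treatment effect" beta_si

# For this functional form the marginal effect dY_i(x)/dx is b_i * Y_i(x) / x.
marginal = b * y(x0) / x0
print(effect)
print(marginal)
```

The discrete difference and the marginal effect coincide only in the limit of an infinitesimal price change; for a 10% change they differ noticeably.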

This setting is more complex than the binary treatment case because there is a potential outcome for each possible value of \(x_s\) (Gill and Robins 2001). Therefore, we begin by considering the binary treatment case. In either setting, the fundamental problem of causal inference (Holland 1986) remains: it is impossible to observe the same unit under different treatment statuses simultaneously. Consequently, we must learn about causal effects by comparing the outcomes of treated and untreated units.

Bayesian inference for causal effects (Donald B. Rubin 1978) is more direct than procedures based on Fisher’s p-value approach, which relies on the logic of stochastic proof by contradiction, or Neyman’s randomization-based inference, which is grounded in the idea of repeated sampling and the construction of confidence intervals for treatment effects (Donald B. Rubin 2004).

Bayesian inference for causal effects treats the potential outcomes as random variables and involves computing the posterior predictive distribution to evaluate treatments not received, conditional on the observed responses to treatments actually received. This approach yields the posterior distribution of the causal estimands as a function of both the observed outcomes and the unobserved potential outcomes, which are treated as missing data and handled through data augmentation methods (Tanner and Wong 1987).
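A minimal sketch of this data-augmentation idea appears below, under strong simplifying assumptions made only for illustration: randomized assignment, normal outcomes with known variance, potential outcomes imputed independently, and conjugate normal priors on the two means. Each Gibbs iteration imputes the missing potential outcomes from their posterior predictive distributions and then updates the model parameters given the completed data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: randomized binary treatment, normal outcomes.
n = 500
d = rng.integers(0, 2, size=n)
y_obs = np.where(d == 1, rng.normal(2.0, 1.0, n), rng.normal(0.0, 1.0, n))

sigma2 = 1.0            # outcome variance, assumed known for simplicity
tau2 = 100.0            # variance of the N(0, tau2) prior on each mean
mu = np.zeros(2)        # current draws of (mu0, mu1)
draws = []

for _ in range(2_000):
    # Data augmentation: impute each unit's missing potential outcome
    # from its posterior predictive, assuming independence across
    # potential outcomes (an assumption the data cannot check).
    y1 = np.where(d == 1, y_obs, rng.normal(mu[1], np.sqrt(sigma2), n))
    y0 = np.where(d == 0, y_obs, rng.normal(mu[0], np.sqrt(sigma2), n))
    # Conjugate normal update for each mean given the completed data.
    for k, y in enumerate((y0, y1)):
        prec = n / sigma2 + 1.0 / tau2
        mean = (y.sum() / sigma2) / prec
        mu[k] = rng.normal(mean, np.sqrt(1.0 / prec))
    draws.append(mu[1] - mu[0])

ate_draws = np.array(draws[500:])     # discard burn-in
print(ate_draws.mean())               # posterior mean of the ATE
```

Because the imputed outcomes are drawn given the current parameters, they add no information about the means; the chain's stationary distribution for \(\mu_1 - \mu_0\) is the posterior of the average treatment effect given the observed data.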

In addition, it is important to note that the likelihood function does not contain information about the correlation between potential outcomes due to the fundamental problem of causal inference. Consequently, most of the classical literature on treatment effects has focused on average treatment effects, which require only the marginal distributions of the potential outcomes \(Y(1)\) and \(Y(0)\) (Heckman, Lopes, and Piatek 2014). Nevertheless, it is also possible to estimate quantile treatment effects (Abadie, Angrist, and Imbens 2002; Victor Chernozhukov and Hansen 2005).

There are also Bayesian proposals for performing inference on the correlation between potential outcomes (Gary Koop and Poirier 1997; Heckman, Lopes, and Piatek 2014). These approaches allow recovery of the joint distribution of the potential outcomes, thereby enabling inference beyond average treatment effects. However, this requires learning from the prior rather than the data, or imposing additional structure on the causal model.

A Bayesian framework further makes it possible to estimate distributional treatment effects with uncertainty quantification, as formally defined by Aakvik, Heckman, and Vytlacil (2005). See also Heckman, Lopes, and Piatek (2014) for related proposals. These effects characterize the entire distribution of potential outcomes under treatment and control, rather than only their averages, thereby capturing treatment-effect heterogeneity. This is particularly relevant for policy, as it enables estimation of the proportion of the population that benefits from a social program or the probability that a treated individual experiences a positive effect (Heckman, Lopes, and Piatek 2014; Ramírez-Hassan and Guerra-Urzola 2021).
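To illustrate why such distributional quantities require more than the marginal distributions, suppose (as an assumption for this sketch only) that the potential outcomes are bivariate normal with correlation \(\rho\). The probability of a positive individual effect then depends directly on \(\rho\), which the likelihood alone cannot pin down, even when the two marginals are held fixed:

```python
from math import erf, sqrt

def p_positive_effect(mu0, mu1, s0, s1, rho):
    """P(Y(1) - Y(0) > 0) for bivariate-normal potential outcomes.

    rho is the cross-outcome correlation, which the likelihood alone
    does not identify; it must come from the prior or extra structure.
    """
    mean = mu1 - mu0
    sd = sqrt(s0**2 + s1**2 - 2 * rho * s0 * s1)
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(mean / (sd * sqrt(2))))

# Sensitivity of the distributional effect to the assumed rho,
# holding the (identified) marginals fixed.
for rho in (-0.5, 0.0, 0.5, 0.9):
    print(rho, round(p_positive_effect(0.0, 1.0, 1.0, 1.0, rho), 3))
```

Holding the average treatment effect at 1, the probability that an individual benefits varies substantially across assumed values of \(\rho\), which is precisely the sensitivity analysis suggested above.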

Furthermore, under a Bayesian framework, the impact of adding or relaxing identification assumptions can be assessed by examining how the distributional treatment effects change (Guido W. Imbens and Rubin 1997).

References

Aakvik, Arild, James J. Heckman, and Edward J. Vytlacil. 2005. “Estimating Treatment Effects for Discrete Outcomes When Responses to Treatment Vary: An Application to Norwegian Vocational Rehabilitation Programs.” Journal of Econometrics 125 (1-2): 15–51. https://doi.org/10.1016/j.jeconom.2004.04.003.
Abadie, Alberto, Joshua D. Angrist, and Guido W. Imbens. 2002. “Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings.” Econometrica 70 (1): 91–117. https://doi.org/10.1111/1468-0262.00271.
Chernozhukov, Victor, and Christian Hansen. 2005. “An IV Model of Quantile Treatment Effects.” Econometrica 73 (1): 245–61. https://doi.org/10.1111/j.1468-0262.2005.00570.x.
Gill, Richard D., and James M. Robins. 2001. “Causal Inference for Complex Longitudinal Data: The Continuous Case.” The Annals of Statistics, 1785–1811.
Heckman, James J., Hedibert F. Lopes, and Rémi Piatek. 2014. “Treatment Effects: A Bayesian Perspective.” Econometric Reviews 33 (1-4): 36–67. https://doi.org/10.1080/07474938.2013.807103.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60. https://doi.org/10.2307/2289064.
Imbens, Guido W. 2014. “Instrumental Variables: An Econometrician’s Perspective.” Statistical Science 29 (3): 323–58. https://doi.org/10.1214/14-STS480.
Imbens, Guido W., and Donald B. Rubin. 1997. “Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance.” The Annals of Statistics, 305–27.
Koop, Gary, and Dale J. Poirier. 1997. “Learning about the Across-Regime Correlation in Switching Regression Models.” Journal of Econometrics 78 (2): 217–27.
Ramírez-Hassan, A., and R. Guerra-Urzola. 2021. “Bayesian Treatment Effects Due to a Subsidized Health Program: The Case of Preventive Health Care Utilization in Medellín (Colombia).” Empirical Economics 60: 1477–1506.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688–701. https://doi.org/10.1037/h0037350.
———. 1978. “Bayesian Inference for Causal Effects: The Role of Randomization.” The Annals of Statistics 6 (1): 34–58. https://doi.org/10.1214/aos/1176344064.
———. 2004. “Teaching Statistical Inference for Causal Effects in Experiments and Observational Studies.” Journal of Educational and Behavioral Statistics 29 (3): 343–67. https://doi.org/10.3102/10769986029003343.
Tanner, M. A., and W. H. Wong. 1987. “The Calculation of Posterior Distributions by Data Augmentation.” Journal of the American Statistical Association 82 (398): 528–40.