13.9 General Bayes posteriors
It is well known that, under model misspecification, that is, when the assumed likelihood function does not coincide with the true data-generating likelihood, the posterior concentrates its mass near the points in the support of the prior that minimize the Kullback–Leibler divergence with respect to the true model (Kleijn and Vaart 2006). However, this updating process may yield credible sets that fail to attain the desired coverage, Bayes factors that can be misleading, and predictions that, even when approximately valid, require careful checking. In addition, the misspecified posterior distribution may exhibit suboptimal risk performance. In this setting, a modified posterior that is directly linked to the risk function of interest, known as the Gibbs posterior, can still perform very well (W. Jiang and Tanner 2008),
\[\begin{equation} \hat{\pi}(\boldsymbol{\theta}\mid\mathbf{y}) =\frac{\exp\left\{-Nw\,R_N(\boldsymbol{\theta},\mathbf{y})\right\}\pi(\boldsymbol{\theta})} {\int_{\boldsymbol{\Theta}}\exp\left\{-Nw\,R_N(\boldsymbol{\theta},\mathbf{y})\right\}\pi(\boldsymbol{\theta})\,d\boldsymbol{\theta}}, \tag{13.4} \end{equation}\]
where \(w>0\) is the learning rate (or the inverse temperature in simulated annealing), which balances the information in the data with that in the prior, \(N\) is the sample size, and
\[ R_N(\boldsymbol{\theta},\mathbf{y})=\frac{1}{N}\sum_{i=1}^N l(\boldsymbol{\theta},\mathbf{y}_i) \]
is the empirical risk associated with the loss function \(l(\boldsymbol{\theta},\mathbf{y}_i)\).
Note that Equation (13.4) reduces to the ordinary Bayesian posterior when the loss is chosen as the negative log-likelihood, \(l(\boldsymbol{\theta},\mathbf{y}_i)=-\log p(\mathbf{y}_i\mid\boldsymbol{\theta})\), under i.i.d. sampling and with \(w=1\). In this case, we are asserting knowledge of the data-generating process \(p(\mathbf{y}_i\mid \boldsymbol{\theta})\). More generally, however, \(R_N(\boldsymbol{\theta},\mathbf{y})\) can be based on any loss function that satisfies the required conditions for coherent belief updating: non-negativity, existence of a well-defined expectation, identifiability, and additivity across observations. Importantly, correctness of the parametric model is not required, since
\[ \frac{1}{N}\sum_{i=1}^N l(\boldsymbol{\theta},\mathbf{y}_i)\;\stackrel{p}{\longrightarrow}\;\int_{\mathcal{Y}} l(\boldsymbol{\theta},\mathbf{y})\,p_0(\mathbf{y})\,d\mathbf{y}, \]
by the law of large numbers as \(N\to\infty\), where \(p_0(\mathbf{y})\) denotes the true data-generating density.
Thus, the Gibbs posterior provides inference on the parameter values that minimize the chosen risk function, with minimal modeling assumptions. In contrast, the standard Bayesian posterior allows inference on essentially any feature of the data-generating distribution, but at the cost of strong model assumptions (Syring and Martin 2019).
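As a minimal illustration of Equation (13.4), the following sketch evaluates a Gibbs posterior for a population median on a grid of parameter values, using the check (absolute-error) loss as the empirical risk. The simulated gamma data, the vague normal prior, and the learning rate \(w=1\) are illustrative assumptions, not part of the applications discussed later in this section.
set.seed(10101)
# Skewed simulated data; the only modeling input is a loss function, not a likelihood
yg <- rgamma(500, shape = 2, rate = 1)
N <- length(yg); w <- 1                                    # learning rate (illustrative)
theta <- seq(0.5, 3, length.out = 1000)                    # grid for the median
RN <- sapply(theta, function(th) mean(abs(yg - th)))       # empirical risk R_N(theta, y)
logkern <- -N * w * RN + dnorm(theta, 0, 10, log = TRUE)   # vague prior (illustrative)
post <- exp(logkern - max(logkern))
post <- post / (sum(post) * diff(theta)[1])                # normalize on the grid
c(GibbsMode = theta[which.max(post)], SampleMedian = median(yg))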
Bissiri, Holmes, and Walker (2016) show that Equation (13.4) is a valid, coherent mechanism to update prior beliefs. In particular, there must exist a mapping \(h\) such that
\[ \pi(\boldsymbol{\theta}\mid \mathbf{y}) = h\!\big(l(\boldsymbol{\theta},\mathbf{y}),\pi(\boldsymbol{\theta})\big), \]
where \(h\) satisfies the coherence property
\[ h\!\left[l(\boldsymbol{\theta},y_2),\,h\!\big(l(\boldsymbol{\theta},y_1),\pi(\boldsymbol{\theta})\big)\right] = h\!\big(l(\boldsymbol{\theta},y_2)+l(\boldsymbol{\theta},y_1),\pi(\boldsymbol{\theta})\big). \]
This ensures that the updated posterior \(\pi(\boldsymbol{\theta}\mid y_1,y_2)\) is the same whether we update with \((y_1,y_2)\) jointly or sequentially.
Equation (13.4) is the solution to minimizing the loss function \(L(\nu;\pi,\mathbf{y})\) over the space of probability measures on \(\boldsymbol{\Theta}\),
\[ \hat{\pi}=\arg\min_{\nu} L(\nu;\pi,\mathbf{y}), \]
such that \(\hat{\pi}\) is the representation of beliefs about \(\boldsymbol{\theta}\) given prior beliefs (\(\pi\)) and data (\(\mathbf{y}\)). Given that the prior beliefs and the data are two independent pieces of information, it makes sense that the loss function is additive in these two arguments,
\[ L(\nu;\pi,\mathbf{y})=h_1(\nu,\mathbf{y})+w^{-1}h_2(\nu,\pi), \]
where \(h_1(\cdot)\) and \(h_2(\cdot)\) are loss functions in their arguments.
The coherence requirement implies that
\[ h_2(\nu,\pi)=d_{KL}(\nu,\pi)=\int \log\frac{\nu(d\boldsymbol{\theta})}{\pi(d\boldsymbol{\theta})} \nu(d\boldsymbol{\theta}), \]
that is, \(h_2(\nu,\pi)\) must be the Kullback-Leibler divergence (Bissiri, Holmes, and Walker 2016).
On the other hand,
\[ h_1(\nu,\mathbf{y})=\int l(\boldsymbol{\theta},\mathbf{y})\nu(d\boldsymbol{\theta}) \]
is the expected loss of the action \(\nu\) with respect to the data.
Therefore, the minimizer of
\[ L(\nu;\pi,\mathbf{y})=\int l(\boldsymbol{\theta},\mathbf{y})\nu(d\boldsymbol{\theta})+d_{KL}(\nu,\pi), \]
is given by Equation (13.4) (T. Zhang 2006; W. Jiang and Tanner 2008; Bissiri, Holmes, and Walker 2016).
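The variational argument behind this result is short. Absorbing the learning rate into the total loss, write \(\ell(\boldsymbol{\theta})=Nw\,R_N(\boldsymbol{\theta},\mathbf{y})\), and let \(\hat{\pi}(\boldsymbol{\theta}\mid\mathbf{y})\propto \exp\{-\ell(\boldsymbol{\theta})\}\pi(\boldsymbol{\theta})\) be the general Bayes posterior in Equation (13.4). Then
\[ \int \ell(\boldsymbol{\theta})\,\nu(d\boldsymbol{\theta})+d_{KL}(\nu,\pi) = d_{KL}(\nu,\hat{\pi})-\log\int_{\boldsymbol{\Theta}}\exp\{-\ell(\boldsymbol{\theta})\}\pi(\boldsymbol{\theta})\,d\boldsymbol{\theta}, \]
where the second term on the right-hand side does not depend on \(\nu\). Since \(d_{KL}(\nu,\hat{\pi})\geq 0\), with equality if and only if \(\nu=\hat{\pi}\), the minimizer is the general Bayes posterior.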
A very important parameter in the Gibbs posterior (W. Jiang and Tanner 2008), or general Bayes posterior (Bissiri, Holmes, and Walker 2016), is the learning rate, as it determines the asymptotic sampling properties of \(\hat{\pi}(\boldsymbol{\theta}\mid\mathbf{y})\) used to perform inference on \(\boldsymbol{\theta}\). For instance, Bissiri, Holmes, and Walker (2016) propose different strategies to set this parameter, and Syring and Martin (2019) propose a Monte Carlo algorithm that selects the learning rate so that the resulting credible region attains the nominal Frequentist coverage probability.
Victor Chernozhukov and Hong (2003) introduce the Laplace-type estimator (LTE) or quasi-posterior distribution, which can be interpreted as a special case of the general Bayes update of Bissiri, Holmes, and Walker (2016), taking as loss a scaled sample criterion based on the moment conditions (e.g., the GMM quadratic form).
In particular, given the moment conditions
\[
\mathbb{E}\!\big[\mathbf{g}_i(\mathbf{w}_i,\boldsymbol{\theta})\big]=\mathbf{0}_{d}
\quad \text{if and only if } \boldsymbol{\theta}=\boldsymbol{\theta}_0,
\]
where the expectation is taken with respect to the population distribution,
\(\mathbf{w}_{1:N}:=[\mathbf{w}_1 \ \mathbf{w}_2 \ \dots \ \mathbf{w}_N]\) is a random sample from \(\mathbf{W}\subset \mathbb{R}^{d_w}\),
\(\mathbf{g}:\mathbb{R}^{d_w}\times\boldsymbol{\Theta}\to\mathbb{R}^{d}\) is a vector of known functions, and
\(\boldsymbol{\theta}=[\theta_{1}\ \theta_{2}\ \dots\ \theta_{p}]^{\top}\in\boldsymbol{\Theta}\subset\mathbb{R}^{p}\) with \(d\geq p\), the risk function can be defined as
\[
R_N(\boldsymbol{\theta},\mathbf{w})=\tfrac{1}{2}\left(\underbrace{\frac{1}{N}\sum_{i=1}^N \mathbf{g}_i(\mathbf{w}_i,\boldsymbol{\theta})}_{\mathbf{g}_N(\boldsymbol{\theta})}\right)^{\top}\mathbf{W}_N\left(\underbrace{\frac{1}{N}\sum_{i=1}^N\mathbf{g}_i(\mathbf{w}_i,\boldsymbol{\theta})}_{\mathbf{g}_N(\boldsymbol{\theta})}\right)
\]
where \(\mathbf{W}_N\) is a positive semi-definite weighting matrix such that \[ \mathbf{W}_N \;\to\; \Bigg(\text{Var}\left[\sqrt{N}\left(\tfrac{1}{N}\sum_{i=1}^N \mathbf{g}_i(\mathbf{w}_i,\boldsymbol{\theta}_0)\right)\right]\Bigg)^{-1} \quad \text{as } N\rightarrow \infty. \]
Then, the quasi-posterior in Victor Chernozhukov and Hong (2003) is similar to (13.4) with \(w=1\).
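The following sketch illustrates how the quasi-posterior log-kernel of the LTE can be assembled from moment conditions in a simple linear instrumental variable design; the simulated data, the pilot estimator used to form \(\mathbf{W}_N\), and the vague prior are illustrative assumptions.
# Quasi-posterior (LTE) log-kernel for a linear IV model with moments g_i = z_i (y_i - x_i' theta)
set.seed(10101)
Nsim <- 1000
zsim <- cbind(1, rnorm(Nsim), rnorm(Nsim))                    # instruments (d = 3)
vsim <- rnorm(Nsim); xsim <- cbind(1, zsim[, 2] + vsim)       # endogenous regressor (p = 2)
ysim <- xsim %*% c(1, 0.5) + vsim + rnorm(Nsim)               # outcome
gN <- function(theta) colMeans(zsim * as.vector(ysim - xsim %*% theta))   # g_N(theta)
theta0 <- qr.solve(crossprod(zsim, xsim), crossprod(zsim, ysim))          # pilot estimate
WN <- solve(var(zsim * as.vector(ysim - xsim %*% theta0)))    # inverse variance of the moments
logkernel <- function(theta, w = 1) {
  g <- gN(theta)
  RN <- 0.5 * drop(t(g) %*% WN %*% g)                         # GMM quadratic form R_N(theta)
  -Nsim * w * RN + sum(dnorm(theta, 0, 100, log = TRUE))      # vague prior (illustrative)
}
logkernel(c(1, 0.5))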
The following algorithm shows the Metropolis–Hastings sampler used to perform inference based on the general Bayes posterior.
Algorithm: General Bayes Posterior — Metropolis–Hastings
Let \(R_{N}(\theta, y)\) denote the empirical risk and \(w > 0\) the learning rate.
Initialization:
Choose an initial value
\[ \theta^{(0)} \in \text{supp}\!\left\{ \widehat{\pi}(\theta \mid y) \right\}. \]
Metropolis–Hastings iterations:
For \(s = 1,2,\ldots,S\):
Proposal:
Draw a candidate
\[ \theta^{c} \sim q\!\left(\theta \mid \theta^{(s-1)}\right). \]
Acceptance probability:
\[ \alpha\!\left(\theta^{(s-1)}, \theta^{c}\right) = \min\!\left\{ 1,\, \frac{ q\!\left(\theta^{(s-1)} \mid \theta^{c}\right)\; \exp\!\left[-N w\, R_{N}(\theta^{c}, y)\right]\; \pi\!\left(\theta^{c}\right) }{ q\!\left(\theta^{c} \mid \theta^{(s-1)}\right)\; \exp\!\left[-N w\, R_{N}(\theta^{(s-1)}, y)\right]\; \pi\!\left(\theta^{(s-1)}\right) } \right\}. \]
Accept–reject step:
Draw \(U \sim \mathrm{Uniform}(0,1)\), and set \[ \theta^{(s)} = \begin{cases} \theta^{c}, & \text{if } U \le \alpha\!\left(\theta^{(s-1)}, \theta^{c}\right), \\[6pt] \theta^{(s-1)}, & \text{otherwise}. \end{cases} \]
End for.
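A compact, generic version of this sampler is sketched below; the function gb_mh, its symmetric Gaussian random-walk proposal (under which the proposal ratio cancels in the acceptance probability), and the Huber-loss usage example are illustrative assumptions. The IVQR application later in this section implements the same scheme for a specific risk function.
# Generic random-walk Metropolis-Hastings for a general Bayes posterior (minimal sketch)
# RN: function(theta) returning the empirical risk R_N(theta, y)
# logprior: function(theta) returning log pi(theta)
gb_mh <- function(RN, logprior, theta0, N, w = 1, S = 10000, scale = 0.1) {
  k <- length(theta0)
  draws <- matrix(NA, S, k)
  theta <- theta0
  logpost <- -N * w * RN(theta) + logprior(theta)        # log-kernel of Equation (13.4)
  for (s in 1:S) {
    cand <- theta + scale * rnorm(k)                     # symmetric proposal
    logpost_c <- -N * w * RN(cand) + logprior(cand)
    if (log(runif(1)) <= logpost_c - logpost) {          # accept-reject step
      theta <- cand; logpost <- logpost_c
    }
    draws[s, ] <- theta
  }
  draws
}
# Usage: Gibbs posterior for a location parameter under the Huber loss (illustrative)
yh <- rt(300, df = 3)
huber <- function(u, c = 1.345) ifelse(abs(u) <= c, 0.5 * u^2, c * abs(u) - 0.5 * c^2)
draws <- gb_mh(RN = function(th) mean(huber(yh - th)),
               logprior = function(th) dnorm(th, 0, 10, log = TRUE),
               theta0 = 0, N = length(yh))
quantile(draws[-(1:1000)], c(0.025, 0.5, 0.975))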
Under suitable regularity conditions, Victor Chernozhukov and Hong (2003) show that the LTE posterior mean is first-order equivalent to the efficient GMM estimator and that posterior quantiles yield confidence sets with asymptotically correct Frequentist coverage. Relatedly, U. K. Müller (2013) demonstrates that, under model misspecification, replacing the original likelihood with a curvature-adjusted (“sandwich”) log-likelihood can improve Frequentist risk, that is, lower the average loss over repeated samples when making inference on the parameters. Moreover, Chen, Christensen, and Tamer (2018) develop confidence sets based on quasi-posteriors that achieve exact asymptotic Frequentist coverage for identified sets of parameters in complex nonlinear structural models, regardless of whether the parameters are point identified. Finally, Andrews and Mikusheva (2022) study decision rules for weak GMM and provide support for quasi-Bayesian procedures built from the GMM quadratic form, in line with the spirit of the LTE in settings with weak identification.
Example: Instrumental variable quantile regression (IVQR)
Victor Chernozhukov and Hansen (2005) propose an instrumental variable model of quantile treatment effects to characterize the heterogeneous impact of treatments across different points of the outcome distribution. Their model requires conditions that restrict the evolution of ranks across treatment states. Using these conditions, they address the endogeneity problem and recover quantile treatment effects using instrumental variables for the entire population, not only for compliers.
Assume a binary treatment variable \(D_i\in\{0,1\}\), with potential outcomes \(\{Y_i(0),Y_i(1)\}\). The estimand of interest is the quantile treatment effect, which summarizes the differences in the quantiles of potential outcomes across treatment states,
\[ q(D_i=1,\mathbf{X}_i=\mathbf{x}_i,\tau)-q(D_i=0,\mathbf{X}_i=\mathbf{x}_i,\tau), \]
where \(q(D_i=d,\mathbf{X}_i=\mathbf{x}_i,\tau)\) denotes the \(\tau\)-th quantile treatment response function.
The potential outcomes are related to the quantile treatment response via
\[ Y_i(d)=q(D_i=d_i,\mathbf{X}_i=\mathbf{x}_i,U(d_i)), \]
where \(U(d_i)\sim U(0,1)\) is the rank variable. This variable captures unobserved heterogeneity that explains differences in outcomes given observed characteristics \(\mathbf{x}_i\) and treatment \(d_i\).
For instance, consider a retirement savings model, where the potential outcome is net financial assets under different retirement plan statuses \(d\), and \(q(D=d,\mathbf{X}=\mathbf{x},\tau)\) is the net financial asset function describing how an individual with retirement status \(d\) and “financial ability” \(\tau\) is rewarded in the financial market. Because the function depends on \(\tau\), treatment effects are heterogeneous.
The identification conditions for the IVQR model are stated in Victor Chernozhukov and Hansen (2005) as follows:
1. Potential outcomes:
\[
Y_i(d)=q(D_i=d_i,\mathbf{X}_i=\mathbf{x}_i,U(d_i)),
\]
where \(q(D_i=d_i,\mathbf{X}_i=\mathbf{x}_i,\tau)\) is strictly increasing in \(\tau\) and \(U(d_i)\sim U(0,1)\).
2. Independence: Conditional on \(\mathbf{X}_i=\mathbf{x}_i\), the rank variables \(\{U(d_i)\}\) are independent of the instruments \(\mathbf{Z}_i\).
3. Selection: The treatment assignment is given by \(D_i=\delta(\mathbf{Z}_i,\mathbf{X}_i,\mathbf{V}_i)\) for some unknown function \(\delta\) and unobserved heterogeneity \(\mathbf{V}_i\).
4. Rank invariance: Conditional on \(\mathbf{X}_i=\mathbf{x}_i,\mathbf{Z}_i=\mathbf{z}_i\),
- either \(\{U(d_i)\}\) coincide (\(U(d_i)=U\) for all \(d\)), or
- \(\{U(d_i)\}\) are identically distributed conditional on \(\mathbf{V}_i\).
5. Observables: The observed data consist of \(Y_i=q(D_i,\mathbf{X}_i,U(D_i))\), \(D_i\), \(\mathbf{X}_i\), and \(\mathbf{Z}_i\), \(i=1,2,\dots,N\).
The most important identification restriction is rank invariance, which implies that individuals with higher unobserved rank remain higher-ranked regardless of treatment status. This condition accommodates more general selection mechanisms than the monotonicity assumption used in the LATE framework, while being less restrictive than full independence assumptions between instruments (\(\mathbf{Z}\)) and unobserved variables in the selection equation (\(\mathbf{V}\)) in other common models (see Victor Chernozhukov and Hansen (2004); Victor Chernozhukov and Hansen (2005) for details).
The main testable implication of the identification restrictions is that
\[ P\!\left(Y_i \leq q(D_i,\mathbf{X}_i,\tau)\mid \mathbf{X}_i,\mathbf{Z}_i\right) = P\!\left(Y_i < q(D_i,\mathbf{X}_i,\tau)\mid \mathbf{X}_i,\mathbf{Z}_i\right) = \tau, \]
for all \(\tau\) almost surely, and \(U(D_i)\sim U(0,1)\) conditional on \((\mathbf{X}_i,\mathbf{Z}_i)\).
This conditional moment restriction implies the unconditional moment conditions that form the basis for estimation and inference in the IVQR model (Victor Chernozhukov and Hong 2003; Victor Chernozhukov and Hansen 2004),
\[ \mathbf{g}_N(\boldsymbol{\theta})=\frac{1}{N}\sum_{i=1}^N (\tau-\mathbf{1}(Y_i\leq \alpha_{\tau}D_i+\mathbf{X}_i^{\top}\boldsymbol{\beta}_{\tau})) \begin{bmatrix} \mathbf{X}_i\\ \mathbf{Z}_i \end{bmatrix}. \]
Thus, the empirical risk function is given by
\[ R_N(\boldsymbol{\theta},\mathbf{w})=\tfrac{1}{2}\mathbf{g}_N(\boldsymbol{\theta})^{\top}\mathbf{W}_N\mathbf{g}_N(\boldsymbol{\theta}), \]
where
\[ \mathbf{W}_N=\frac{1}{\tau(1-\tau)}\left(\frac{1}{N}\sum_{i=1}^N \mathbf{Z}_i\mathbf{Z}_i^{\top}\right)^{-1}. \]
The following code implements the previous algorithm to perform inference on the quantile treatment effect of participation in the 401(k) retirement program on net financial assets, using eligibility as an instrument (see the 401(k) treatment effects example for details of the data).
rm(list = ls()); set.seed(10101)
# Load data
df <- read.csv("https://raw.githubusercontent.com/BEsmarter-consultancy/BSTApp/refs/heads/master/DataApp/401k.csv",
sep = ",", header = TRUE, quote = "")
# Attach variables
attach(df)
y <- net_tfa/1000 # Outcome: net financial assets
x <- as.vector(p401) # Endogenous regressor: participation
w <- as.matrix(cbind(age, inc, fsize, educ, marr, twoearn, db, pira, hown)) # Exogenous regressors
z <- as.matrix(e401) # Instrument: eligibility (NO intercept here)
library(quantreg)
tau <- 0.5
QuanReg <- rq(y ~ p401 + age + inc + fsize + educ + marr + twoearn + db + pira + hown, tau = tau, data = df)
summary(QuanReg)## Warning in summary.rq(QuanReg): 158 non-positive fis
##
## Call: rq(formula = y ~ p401 + age + inc + fsize + educ + marr + twoearn +
## db + pira + hown, tau = tau, data = df)
##
## tau: [1] 0.5
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) -5.23934 0.35920 -14.58602 0.00000
## p401 6.83910 0.46322 14.76430 0.00000
## age 0.10206 0.00570 17.90513 0.00000
## inc 0.00019 0.00001 25.06906 0.00000
## fsize -0.25534 0.03211 -7.95300 0.00000
## educ -0.13458 0.02077 -6.48096 0.00000
## marr 0.14795 0.14019 1.05538 0.29128
## twoearn -3.45178 0.17043 -20.25355 0.00000
## db -0.64996 0.14442 -4.50054 0.00001
## pira 21.81051 0.88676 24.59576 0.00000
## hown 0.04201 0.10805 0.38877 0.69746
Reg <- MCMCpack::MCMCquantreg(y ~ x + w, data = df, tau = tau)
BayesExo <- summary(Reg)
LossFunct <- function(par, tau, y, z, x, w){
  # Log-kernel of the quasi-posterior: -N * R_N(theta), i.e., minus the scaled GMM risk
  n <- length(y)
  X <- cbind(1, x, w)                       # regressors: intercept, endogenous p401, controls
  Z <- cbind(1, z, w)                       # instruments: intercept, eligibility, controls
  Ind <- as.numeric(y <= X%*%par)           # indicator 1(y_i <= x_i' theta)
  gn <- colMeans((tau - Ind) * Z)           # sample moments g_N(theta)
  Wni <- lapply(1:n, function(i) {Z[i,] %*% t(Z[i,])})
  Wn <- 1 / (tau * (1 - tau)) * solve(Reduce("+", Wni)/n)   # weighting matrix W_N
  Ln <- - 0.5 * n * t(gn) %*% Wn %*% gn     # -N * R_N(theta)
  return(Ln)
}
par0 <- colMeans(Reg)
LossFunct(par = par0, tau = tau, y = y, z = z, x = x, w = w)## [,1]
## [1,] -2.469711
# ----- MH using Ln -----
k <- length(par0); b0 <- rep(0, k); B0 <- 1000*diag(k)       # Gaussian prior N(b0, B0)
S <- 10000; burnin <- 10000; thin <- 1; tot <- S + burnin
BETA <- matrix(NA, tot, k); accept <- logical(tot)
SIGMA <- diag(BayesExo[["statistics"]][,2]); tune <- 2.4 / sqrt(k)  # proposal scale from auxiliary posterior SDs
BETA[1,] <- par0
LL <- LossFunct(par = BETA[1,], tau = tau, y = y, z = z, x = x, w = w)
pb <- txtProgressBar(min=0, max=tot, style=3)
for(s in 2:tot){
  cand <- BETA[s-1,] + MASS::mvrnorm(1, rep(0, k), tune*SIGMA)   # random-walk proposal
  LLc <- LossFunct(par = cand, tau = tau, y = y, z = z, x = x, w = w)
  priorRat <- mvtnorm::dmvnorm(cand, b0, B0, log=TRUE) -
    mvtnorm::dmvnorm(BETA[s-1,], b0, B0, log=TRUE)
  loga <- (LLc - LL) + priorRat                                  # log acceptance probability
  if (is.finite(loga) && log(runif(1)) <= loga) {
    BETA[s,] <- cand; LL <- LLc; accept[s] <- TRUE
  } else {
    BETA[s,] <- BETA[s-1,]; accept[s] <- FALSE
  }
  if (s <= burnin && s %% 100 == 0) {                            # adapt proposal scale during burn-in
    acc <- mean(accept[(s-99):s])
    if (acc > 0.35) tune <- tune * 1.25
    if (acc < 0.15) tune <- tune / 1.25
  }
  setTxtProgressBar(pb, s)
}
##
## Acceptance rate: 0.20655
##
## Acceptance rate: 0.2675732
keep <- seq(burnin + 1, tot, by = thin)   # retained draws: drop burn-in (thin = 1)
post <- BETA[keep, , drop=FALSE]          # posterior draws in scaled space
colnames(post) <- c("Int", "p401", "age", "inc", "fsize", "educ", "marr", "twoearn", "db", "pira", "hown")
# Posterior summaries
post_mean <- colMeans(post)
post_ci <- apply(post, 2, quantile, c(0.025, 0.975))
round(post_mean, 3); round(post_ci, 3)## Int p401 age inc fsize educ marr twoearn db pira
## -5.562 6.719 0.113 0.000 -0.271 -0.156 0.244 -3.282 -0.794 21.733
## hown
## -0.049
## Int p401 age inc fsize educ marr twoearn db pira hown
## 2.5% -5.630 6.612 0.103 0 -0.310 -0.186 0.172 -3.338 -0.86 21.692 -0.111
## 97.5% -5.468 6.806 0.122 0 -0.227 -0.128 0.331 -3.235 -0.74 21.808 0.009
Setting \(\tau = 0.5\) (the median), the quantile treatment effect is USD 6,719, with a 95% credible interval of (USD 6,612, USD 6,806). For \(\tau = 0.9\), the treatment effect is USD 19,318, with a 95% credible interval of (USD 18,634, USD 20,085). Note that, in the instrumental variable example, the posterior mean of the LATE was USD 8,520, which is higher than the treatment effect at the median (\(\tau = 0.5\)). This suggests that the LATE, which averages effects across the outcome distribution, overstates the causal effect of 401(k) participation at the median because of the large impacts at higher quantiles.