13.10 Doubly robust Bayesian inferential framework (DRB)

Breunig, Liu, and Yu (2025) propose a doubly robust Bayesian (DRB) approach to perform inference on the average treatment effect (ATE) under unconfoundedness. They adjust the prior for the conditional mean outcome function using an estimator of the propensity score, the conditional probability of treatment assignment (Rosenbaum and Rubin 1983), and then re-center the posterior distribution of the treatment effect using the propensity score estimator together with a pilot estimator of the outcome regression. The authors show that, under unconfoundedness and overlap, the corrected posterior converges in distribution to a normal law centered at a semiparametrically efficient estimator, and that the resulting credible intervals achieve asymptotically exact frequentist coverage.

Let \(Y_i\in\{0,1\}\) be the outcome and \(D_i\in\{0,1\}\) the treatment indicator,¹ so that

\[ Y_i = D_i\,Y_i(1) + (1-D_i)\,Y_i(0), \]

where \(Y_i(1)\) and \(Y_i(0)\) are the potential outcomes.

Let \(\mathbf{X}_i\) denote pre-treatment covariates with distribution \(F_0\) and density \(f_0\), and define the propensity score

\[ \pi_0(\mathbf{x}) := P(D_i=1\mid \mathbf{X}_i=\mathbf{x}) \]

and the conditional mean outcome (success probability)

\[ m_0(d,\mathbf{x}) := P(Y_i=1\mid D_i=d,\mathbf{X}_i=\mathbf{x}). \]

For an i.i.d. sample \(W_i=(Y_i,D_i,\mathbf{X}_i^{\top})^{\top}\), the joint density can be written as

\[ p_{f,\pi,m}(y,d,\mathbf{x}) = f(\mathbf{x})\,[\pi(\mathbf{x})]^d\,[1-\pi(\mathbf{x})]^{1-d}\,[m(d,\mathbf{x})]^y\,[1-m(d,\mathbf{x})]^{1-y}. \]
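
As a sanity check on this factorization, the following sketch (our own illustration; all function names are placeholders, not part of any package) evaluates the likelihood contribution of a single observation for user-supplied \(f\), \(\pi\), and \(m\).

# Minimal sketch: likelihood contribution of one observation (y, d, x) under the
# factorization above; f_x, pi_x and m_dx are user-supplied functions for the
# covariate density, propensity score and outcome regression (illustrative names).
lik_contribution <- function(y, d, x, f_x, pi_x, m_dx) {
  fx <- f_x(x)            # f(x)
  px <- pi_x(x)           # pi(x)
  mx <- m_dx(d, x)        # m(d, x)
  fx * px^d * (1 - px)^(1 - d) * mx^y * (1 - mx)^(1 - y)
}
# e.g., with a standard normal covariate and logistic specifications:
lik_contribution(y = 1, d = 1, x = 0.2, f_x = dnorm,
                 pi_x = function(x) plogis(0.5 * x),
                 m_dx = function(d, x) plogis(-0.2 + 0.6 * d + 0.5 * x))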

The parameter of interest is the average treatment effect (ATE),

\[ \tau := \mathbb{E}_0\!\left[\,Y_i(1)-Y_i(0)\,\right], \]

where \(\mathbb{E}_0\) is the expectation with respect to the population distribution.

Breunig, Liu, and Yu (2025) impose support restrictions via link functions. For the outcome regression and the propensity score they use logistic links:

\[ m_\eta(d,\mathbf{x})=\frac{1}{1+\exp\{-\eta^m(d,\mathbf{x})\}}, \qquad \pi_\eta(\mathbf{x})=\frac{1}{1+\exp\{-\eta^\pi(\mathbf{x})\}}, \]

so that

\[ \eta^m(d,\mathbf{x})=\log\!\left(\frac{m_\eta(d,\mathbf{x})}{1-m_\eta(d,\mathbf{x})}\right), \qquad \eta^\pi(\mathbf{x})=\log\!\left(\frac{\pi_\eta(\mathbf{x})}{1-\pi_\eta(\mathbf{x})}\right). \]

For the covariate density, they use a log-density parametrization

\[ f_\eta(\mathbf{x})=\exp\{\eta^f(\mathbf{x})\}, \qquad \eta^f(\mathbf{x})=\log f_\eta(\mathbf{x}). \]

Given a prior on \((\eta^m,\eta^\pi,\eta^f)\), the induced ATE is

\[ \tau_\eta := \mathbb{E}_\eta\!\left[m_\eta(1,\mathbf{X})-m_\eta(0,\mathbf{X})\right]. \]

The basis for inference on \(\tau_\eta\) is the efficient influence function (EIF):

\[ \tilde{\tau}_\eta(W) = \big\{m_\eta(1,\mathbf{X})-m_\eta(0,\mathbf{X})\big\} + \gamma_\eta(D,\mathbf{X})\,\big\{Y-m_\eta(D,\mathbf{X})\big\} - \tau_\eta, \]

where the Riesz representer for the ATE functional is

\[ \gamma_\eta(D,\mathbf{X}) = \frac{D}{\pi_\eta(\mathbf{X})} - \frac{1-D}{1-\pi_\eta(\mathbf{X})}. \]

An influence function measures the first-order effect of an infinitesimal contamination of the underlying distribution on the estimand. The efficient influence function attains the semiparametric efficiency bound (its variance is the minimal asymptotic variance in the model). At the truth \(\eta_0\), the EIF is mean-zero: \(\mathbb{E}_0[\tilde{\tau}_{\eta_0}(W)]=0\). Moreover, it is doubly robust:

\[ \mathbb{E}_0\!\big[\tilde{\tau}_\eta(W)\big]=0 \quad\text{if either } m_\eta \text{ or } \pi_\eta \text{ is correctly specified.} \]
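
A quick way to see this double robustness numerically is to average the EIF over simulated data while deliberately misspecifying one of the two nuisance functions. The sketch below is our own illustration with a single covariate and simple logistic specifications (it is not part of Breunig, Liu, and Yu 2025); the first two sample means are close to zero up to Monte Carlo error, while the third is not.

# Numerical illustration of double robustness (single covariate, illustrative DGP)
set.seed(1)
n_sim   <- 1e5
x_sim   <- rnorm(n_sim)
pi_true <- function(x) plogis(0.5 * x)                        # true propensity score
m_true  <- function(d, x) plogis(-0.2 + 0.6 * d + 0.5 * x)    # true outcome regression
d_sim   <- rbinom(n_sim, 1, pi_true(x_sim))
y_sim   <- rbinom(n_sim, 1, m_true(d_sim, x_sim))
tau0    <- mean(m_true(1, x_sim) - m_true(0, x_sim))          # (approximate) true ATE
eif <- function(y, d, x, m, pi_fun, tau) {
  gamma <- d / pi_fun(x) - (1 - d) / (1 - pi_fun(x))          # Riesz representer
  m(1, x) - m(0, x) + gamma * (y - m(d, x)) - tau
}
m_bad  <- function(d, x) plogis(0.3 - 0.2 * d)                # misspecified m
pi_bad <- function(x) plogis(-0.4 + 1.5 * x)                  # misspecified pi
mean(eif(y_sim, d_sim, x_sim, m_bad, pi_true, tau0))   # ~ 0: propensity score correct
mean(eif(y_sim, d_sim, x_sim, m_true, pi_bad, tau0))   # ~ 0: outcome regression correct
mean(eif(y_sim, d_sim, x_sim, m_bad, pi_bad, tau0))    # generally away from 0: both wrong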

To conduct Bayesian inference, Breunig, Liu, and Yu (2025) place independent priors on the outcome regression and the covariate distribution (equivalently, on \(\eta^m\) and \(\eta^f\)). For the covariates they use a Dirichlet Process (DP) prior on \(F\) (hence on \(f\); see Chapter 11), which, under standard conditions, yields posterior draws that are well approximated by the Bayesian bootstrap for sample-supported functionals (Lo 1987; see Chapter 6).

For the outcome regression they place a Gaussian Process (GP) prior on \(\eta^m\) (see Chapter 12), with a correction term that incorporates a preliminary estimator of the Riesz representer. The propensity score is estimated in a separate first step and is used in the correction/recentering; thus, a prior for \(\eta^\pi\) is not required for their procedure.

The following algorithm outlines the doubly robust Bayesian procedure of Breunig, Liu, and Yu (2025). The pilot estimate of the Riesz representer, \(\hat\gamma_\eta\), is obtained by plug-in, using a propensity score \(\hat\pi(\mathbf{x})\) estimated via logistic Lasso.

The pilot estimate of the outcome regression \(m_0(d,\mathbf{x})=\Pr(Y=1\mid D=d,\mathbf{X}=\mathbf{x})\), denoted \(\widehat m(d,\mathbf{x})\), is taken as the posterior mean from a GP classification model fit with a Laplace approximation.

Concretely, place a GP prior on the latent logit \(\eta^m(d,\mathbf{x})\) with a squared exponential kernel, and map to success probabilities via the logistic link \(m(d,\mathbf{x})=\operatorname{logit}^{-1}\{\eta^m(d,\mathbf{x})\}\). Because the Bernoulli likelihood makes the required integrals over \(\eta^m\) analytically intractable, the posterior is approximated using Laplace’s method (Expectation Propagation is a common alternative). See Sections 3.3–3.5 of Rasmussen and Williams (2006) for details.
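
A minimal sketch of such a pilot fit is given below, using the gplite package that also appears in the implementation at the end of this section. The toy dataset and all object names are ours, chosen for illustration; averaging posterior draws of the success probabilities is our shortcut for approximating the posterior mean \(\widehat m(d,\mathbf{x})\).

# Sketch: pilot outcome regression via GP classification with a Laplace approximation
library(gplite)
set.seed(2)
# Tiny illustrative dataset: two covariates plus a treatment indicator.
n0 <- 200
X_train <- data.frame(X1 = rnorm(n0), X2 = runif(n0, -1, 1), D = rbinom(n0, 1, 0.5))
y_bin   <- rbinom(n0, 1, plogis(-0.2 + 0.6 * X_train$D + 0.5 * X_train$X1))
cf <- cf_sexp(vars = colnames(X_train), lscale = 0.3, magn = 1, normalize = TRUE)
gp <- gp_init(cfs = cf, lik = lik_bernoulli(), approx = approx_laplace())
gp <- gp_optim(gp, x = X_train, y = y_bin)     # optimize kernel hyperparameters
# Posterior draws of success probabilities at the training inputs with D set to 1
# (transform = TRUE maps the latent logit through the logistic link);
# averaging the draws approximates the pilot \hat m(1, x).
X_test <- X_train; X_test$D <- 1
pr_draws <- gp_draw(gp, xnew = X_test, draws = 2000, transform = TRUE)
m_hat_pilot <- rowMeans(pr_draws)              # one value per observation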

The adjusted GP prior specifies the latent logit as

\[ \eta^m(d,\mathbf{x}_i) \;=\; W^m(d,\mathbf{x}_i)\;+\;\lambda\,\widehat{\gamma}_\eta(d,\mathbf{x}_i), \qquad W^m \sim \mathrm{GP}\!\big(0, K\big),\ \ \lambda \sim N(0,\sigma_n^2), \]

where \(K\) is the base kernel. Following Breunig, Liu, and Yu (2025), take

\[ \sigma_n \;=\; \frac{\log n}{\sqrt{n}\,\Gamma_n}, \qquad \Gamma_n \;:=\; \frac{1}{n}\sum_{i=1}^n \big|\widehat{\gamma}_\eta(d_i,\mathbf{x}_i)\big|, \] where \(n\) is the effective sample size (after trimming).

Since \(W^m\) and \(\lambda\) are Gaussian, this is equivalent to a GP with an augmented kernel,

\[ \eta^m \sim \mathrm{GP}\!\big(0,\, K_c\big), \quad K_c\big((d,\mathbf{x}),(d',\mathbf{x}')\big) = K\big((d,\mathbf{x}),(d',\mathbf{x}')\big) + \sigma_n^2\,\widehat{\gamma}_\eta(d,\mathbf{x})\, \widehat{\gamma}_\eta(d',\mathbf{x}'), \]

which, for the vector of latent logits evaluated at the training inputs, corresponds to the covariance matrix \(\mathbf{K} + \sigma_n^2\,\widehat{\boldsymbol{\gamma}}\, \widehat{\boldsymbol{\gamma}}^{\top}\).
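
The rank-one structure of \(K_c\) is easy to construct directly. The sketch below (illustrative names throughout) builds the augmented covariance matrix from a base squared exponential kernel and the pilot Riesz representer, computing \(\sigma_n\) by the rule above; in the gplite-based implementation at the end of the section, the same term is added implicitly through a linear kernel on the extra input column \(\sigma_n\widehat{\gamma}_\eta\), up to an optimized magnitude hyperparameter.

# Sketch: augmented covariance matrix K_c = K + sigma_n^2 * gamma %*% t(gamma).
# `inputs` is an n x p matrix of training inputs (covariates and treatment) and
# `gamma_hat` the pilot Riesz representer at those inputs (illustrative names).
se_kernel <- function(A, lscale = 0.3, magn = 1) {
  D2 <- as.matrix(dist(A))^2                     # squared Euclidean distances
  magn^2 * exp(-D2 / (2 * lscale^2))             # squared exponential kernel
}
augmented_kernel <- function(inputs, gamma_hat, lscale = 0.3, magn = 1) {
  n_eff   <- nrow(inputs)
  sigma_n <- log(n_eff) / (sqrt(n_eff) * mean(abs(gamma_hat)))  # sigma_n rule above
  se_kernel(inputs, lscale, magn) + sigma_n^2 * tcrossprod(gamma_hat)  # rank-one term
}
# Toy inputs, for illustration only:
K_c <- augmented_kernel(inputs = cbind(x = rnorm(5), d = rbinom(5, 1, 0.5)),
                        gamma_hat = rnorm(5))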

Algorithm: Double Robust Bayesian (DR-Bayes)

  1. Estimate the nuisance components:

    • Estimate the propensity score \(\widehat{\pi}(\mathbf{x})\)
      using a logistic Lasso.

    • Trim the sample based on the estimated propensity score using either:

      • a standard trimming rule (e.g., keep units with scores in [0.1, 0.9]), or
      • the optimal Crump et al. (2009) trimming rule.

    • Compute the plug-in Riesz representer \(\widehat{\gamma}_{\eta}(d_i, x_i)\)
      on the trimmed sample.

  2. Compute the prior correction term:

    Let: \[ \Gamma_n = \frac{1}{n}\sum_{i=1}^{n} \left|\,\widehat{\gamma}_{\eta}(d_i, x_i)\,\right|, \qquad \sigma_n = \frac{\log n}{\sqrt{n}\,\Gamma_n}, \] where \(n\) is the effective sample size after trimming.

    Compute the prior adjustment: \[ \sigma_n \,\widehat{\gamma}_{\eta}. \]

  3. Define the GP prior and unadjusted mean model:

    • Compute the pilot (unadjusted) posterior mean \[ \widehat{m}(d,x) = \frac{1}{1 + \exp\{-\widehat{\eta}^{m}(d,x)\}}, \] where \(\widehat{\eta}^{m}(d,x)\) is the posterior mean of the latent logit under the uncorrected GP prior \(\eta^{m} \sim GP(0, K)\).

    • Set the adjusted Gaussian Process prior: \[ m_{\eta}^{c}(d,x) = \frac{1}{1 + \exp\{-\eta^{m}(d,x)\}}, \qquad \eta^{m} \sim GP(0, K_{c}), \] with covariance kernel \[ K_{c}\big((d,x),(d',x')\big) = K\big((d,x),(d',x')\big) + \sigma_{n}^{2}\, \widehat{\gamma}_{\eta}(d,x)\, \widehat{\gamma}_{\eta}(d',x'). \]

  4. Posterior sampling:
    For \(s = 1,\dots,S\):

    (a) Adjusted GP posterior draw:
    \[ m_{\eta}^{c,(s)}(d,x) \quad \text{drawn from the posterior under the adjusted GP prior}. \]

    (b) Bayesian bootstrap weights:
    Draw \[ M_{n,i}^{(s)} = \frac{e_{i}^{(s)}}{\sum_{j=1}^{n} e_{j}^{(s)}}, \qquad e_{i}^{(s)} \sim \mathrm{Exp}(1), \] for \(i = 1,\dots,n\).

    (c) Compute the DR-corrected posterior ATE:

    • Uncorrected ATE draw: \[ \tau_{\eta}^{(s)} = \sum_{i=1}^{n} M_{n,i}^{(s)} \left[ m_{\eta}^{c,(s)}(1, x_i) \;-\; m_{\eta}^{c,(s)}(0, x_i) \right]. \]

    • Bias-correction term: \[ \widehat{b}_{\eta}^{(s)} = \frac{1}{n} \sum_{i=1}^{n} \left\{ \widehat{\tau}\big[m_{\eta}^{c,(s)}\big](w_i) \;-\; \widehat{\tau}\big[\widehat{m}\big](w_i) \right\}, \] where, for any candidate outcome regression \(m\), \[ \widehat{\tau}[m](w) = m(1,x) - m(0,x) + \widehat{\gamma}_{\eta}(d,x)\, \big(y - m(d,x)\big). \]

    • Corrected ATE draw: \[ \widetilde{\tau}_{\eta}^{(s)} = \tau_{\eta}^{(s)} - \widehat{b}_{\eta}^{(s)}. \]

  5. End for.

This algorithm produces posterior draws of the ATE.
The \(100(1-\alpha)\%\) credible interval is given by the empirical quantiles of the corrected draws \(\widetilde{\tau}_{\eta}^{(s)}\) at levels \(\alpha/2\) and \(1-\alpha/2\).

It is worth noting that Breunig, Liu, and Yu (2025) do not perform sample splitting, whereas Chernozhukov et al. (2018) recommend it in the context of Double/Debiased Machine Learning for treatment and structural parameters. While sample splitting (or cross-fitting) helps reduce overfitting, its role in Chernozhukov et al. (2018) is more fundamental: it ensures that the nuisance functions are estimated on data independent of the observations used for inference, which is critical for their theoretical guarantees of \(\sqrt{N}\)-consistency and valid asymptotic inference. Nevertheless, Breunig, Liu, and Yu (2025) report that they did not find evidence of overfitting in their setting and verified that their results are robust to the use of sample splitting.
Finally, observe that the doubly robust approach does not propagate the uncertainty from estimating the propensity score, which enters only as a plug-in. This omission is justified because the EIF is Neyman orthogonal: estimation error in the propensity score enters only at second order and is therefore asymptotically negligible for uncertainty quantification.

Example: Doubly robust Bayesian inference

Let us simulate the process
\[ \eta^{\pi}_i = -0.3 + 0.8X_{i1} - 0.5X_{i2} + 0.7X_{i3}, \quad \eta^{m}_i = -0.2 + 0.6D_i + 0.5X_{i1} - 0.4X_{i2} + 0.3X_{i3}, \] with \(D_i\sim \mathrm{Bernoulli}\big(\operatorname{logit}^{-1}(\eta^{\pi}_i)\big)\) and \(Y_i\sim \mathrm{Bernoulli}\big(\operatorname{logit}^{-1}(\eta^{m}_i)\big)\), where \(X_{1}\sim N(0,1)\), \(X_{2}\sim \mathrm{Bernoulli}(0.5)\), and \(X_{3}\sim U(-1,1)\), and the sample size is 2,000.

The population ATE from this simulation is approximately \(0.138\). The posterior mean and 95% credible interval of the ATE using the DRB are \(0.155\) and \((0.107,\,0.206)\), respectively.

The following code shows how to implement the DRB inferential framework in this setting.

######### Doubly Robust Bayesian: Simulation #########
rm(list = ls()); set.seed(123); library(glmnet)
## Loaded glmnet 4.1-8
## simulate covariates
n  <- 2000
X1 <- rnorm(n)                 # continuous
X2 <- rbinom(n, 1, 0.5)        # binary
X3 <- runif(n, -1, 1)          # continuous
Z <- cbind(X1, X2, X3)
# True propensity and outcome
logit <- function(t) 1/(1+exp(-t))  # inverse-logit (expit): maps a linear index to a probability
true_pi  <- function(X){ logit(-0.3 + 0.8*X[,1] -0.5*X[,2] + 0.7*X[,3]) }
true_eta <- function(d,X){ -0.2 + 0.6*d + 0.5*X[,1] -0.4*X[,2] + 0.3*X[,3] }
true_p   <- function(d,X){ logit(true_eta(d,X)) }
ATE  <- mean(true_p(1,Z) - true_p(0,Z)) # Population ATE
# Data
pi <- true_pi(Z); D  <- rbinom(n,1,pi)
P  <- true_p(D,Z); Y  <- rbinom(n,1,P)
dat <- data.frame(Y, D, X1, X2 = factor(X2), X3)
######### Doubly Robust Bayesian: Implementation #########
p <- dim(Z)[2]; covars <- c("X1","X2","X3")
Z.df <- as.data.frame(Z) # covariates as a dataframe
# Propensity score model: lasso
cvfit <- cv.glmnet(Z, D, family = "binomial", alpha=1, nfolds = 10) # lasso,min
ps.coef <- coef(cvfit, s = "lambda.min")[1:(p + 1)]  # intercept and slopes (not used below)
ps_hat <- predict(cvfit, newx = Z, s = "lambda.min",type = "response")
ps_hat <- pmin(pmax(ps_hat, 1e-6), 1 - 1e-6)
# Riesz representer
riesz_fun <- function(d, ps) d/ps - (1 - d)/(1 - ps)
# GP model
make_gp <- function(X_train, y_bin, use_correction = FALSE, gamma_scaled = NULL,
approx_method = gplite::approx_laplace(),
init_lscale = 0.3) {
    stopifnot(length(y_bin) == nrow(X_train))
    # Initializing the covariance function
    cf_se <- gplite::cf_sexp(vars = colnames(X_train), lscale = init_lscale, magn = 1, normalize = TRUE)
    cfs <- list(cf_se)
    if (use_correction) {
        stopifnot(!is.null(gamma_scaled))
        # Add linear kernel on the last column (gamma_scaled)
        cf_lin <- gplite::cf_lin(vars = "gamma", magn = 1, normalize = FALSE)
        cfs <- list(cf_se, cf_lin)
    }
    # Initializes a GP model
    gp <- gplite::gp_init(cfs = cfs, lik = gplite::lik_bernoulli(), approx = approx_method)
    # Optimizing hyperparameters
    gp <- gplite::gp_optim(gp, x = X_train, y = y_bin, maxiter = 1000, restarts = 3, tol = 1e-05)
    return(gp)
}
posterior_draws_prob <- function(gp, X_test, ndraws = 5000) {
    out <- gplite::gp_draw(gp, xnew = X_test, draws = ndraws, transform = TRUE, target = FALSE, jitter = 1e-6)
    return(t(out))
}
# Main function
run_trim <- function(t_trim = 0.10, N_post = 5000, h_r = 1/2, cor_size = 1) {
    # Trim by estimated PS
    keep <- which(ps_hat >= t_trim & ps_hat <= (1 - t_trim))
    n_eff <- length(keep)
    if (n_eff < 50) stop("Too few observations after trimming.")
    Yk  <- Y[keep]
    Dk  <- D[keep]
    Zk  <- Z[keep, , drop = FALSE]
    psk <- ps_hat[keep]
    # Riesz representer and sigma_n choice
    gamma_k <- riesz_fun(Dk, psk)
    sigma_n <- cor_size * (log(n_eff) / ( (n_eff)^(h_r) * mean(abs(gamma_k)) ))
    gamma_scaled <- sigma_n * gamma_k
    # Training inputs for GP: [Z, D] and, if corrected, an extra column 'gamma'
    X_train_nc <- data.frame(Zk)
    X_train_nc$D <- Dk
    colnames(X_train_nc) <- c(covars, "D")
    X_train_c <- X_train_nc
    X_train_c$gamma <- gamma_scaled
    # Test inputs: for each i, (Z_i, d=0) and (Z_i, d=1)
    X_t0 <- X_train_nc; X_t0$D <- 0
    X_t1 <- X_train_nc; X_t1$D <- 1
    X_t0c <- X_t0; X_t1c <- X_t1
    X_t0c$gamma <- sigma_n * riesz_fun(0, psk)   
    X_t1c$gamma <- sigma_n * riesz_fun(1, psk) 
    # GP WITHOUT prior correction #
    gp_nc <- make_gp(X_train = X_train_nc, y_bin = Yk, use_correction = FALSE)
    # joint draws for [m(Z,0); m(Z,1)]
    M_nc_0 <- posterior_draws_prob(gp_nc, X_t0, ndraws = N_post)  # N_post x n_eff
    M_nc_1 <- posterior_draws_prob(gp_nc, X_t1, ndraws = N_post)
    # observed arm probabilities
    M_nc_obs <- ifelse(matrix(Dk, nrow = N_post, ncol = n_eff, byrow = TRUE) == 1, M_nc_1, M_nc_0)
    # GP WITH prior correction #
    gp_c <- make_gp(X_train = X_train_c, y_bin = Yk, use_correction = TRUE, gamma_scaled = gamma_scaled)
    M_c_0 <- posterior_draws_prob(gp_c, X_t0c, ndraws = N_post)
    M_c_1 <- posterior_draws_prob(gp_c, X_t1c, ndraws = N_post)
    M_c_obs <- ifelse(matrix(Dk, nrow = N_post, ncol = n_eff, byrow = TRUE) == 1, M_c_1, M_c_0)
    # Bayesian bootstrap weights #
    W <- matrix(rexp(N_post * n_eff, rate = 1), nrow = N_post)
    W <- W / rowSums(W)
    # ATEs
    # (1) Unadjusted Bayes
    Ate_nc_draws <- rowSums((M_nc_1 - M_nc_0) * W)
    # (2) Prior-adjusted Bayes
    Ate_c_draws  <- rowSums((M_c_1 - M_c_0) * W)
    # (3) DR Bayes (posterior recentering)
    # Pre-centering using UNcorrected GP posterior means
    mu_nc_diff <- colMeans(M_nc_1 - M_nc_0)                
    mu_nc_obs  <- colMeans(M_nc_obs)                       
    ATE_dr_pre <- mean(mu_nc_diff + gamma_k * (Yk - mu_nc_obs))
    # Recenter draws using corrected GP draws
    DR_rec_1 <- (matrix(gamma_k, nrow = N_post, ncol = n_eff, byrow = TRUE)) * (matrix(Yk, nrow = N_post, ncol = n_eff, byrow = TRUE) - M_c_obs)
    # DR-corrected draws, tilde_tau^(s) = tau^(s) - b_hat^(s), assembled term by term:
    #   (i)   rowSums((M_c_1 - M_c_0) * W): tau_eta^(s), the Bayesian-bootstrap ATE from the corrected GP draws
    #   (ii)  + ATE_dr_pre: (1/n) sum_i { m_hat(1,x_i) - m_hat(0,x_i) + gamma_i (y_i - m_hat(d_i,x_i)) },
    #         the pilot (uncorrected posterior mean) part of -b_hat^(s)
    #   (iii) - rowSums(DR_rec_1)/n_eff: (1/n) sum_i gamma_i (y_i - m^(s)(d_i,x_i)) with corrected GP draws
    #   (iv)  - rowSums(M_c_1 - M_c_0)/n_eff: (1/n) sum_i { m^(s)(1,x_i) - m^(s)(0,x_i) } with corrected GP draws
    Ate_drb_draws <- rowSums((M_c_1 - M_c_0) * W) + ATE_dr_pre - rowSums(DR_rec_1) / n_eff - rowSums(M_c_1 - M_c_0) / n_eff
    
    qfun <- function(x) {
        qs <- quantile(x, c(0.025, 0.975))
        c(mean   = mean(x),     median = median(x), lo = qs[1], hi = qs[2], len = diff(qs)
        )
    }   
    vals <- list(
    Bayes      = qfun(Ate_nc_draws),
    `PA Bayes` = qfun(Ate_c_draws),
    `DR Bayes` = qfun(Ate_drb_draws)
    )
    out_df <- as.data.frame(do.call(rbind, vals), check.names = FALSE)
    list(t_trim = t_trim, results = out_df)
}
N_post <- 5000
h_r    <- 1/2
cor_size <- 1
trim <- 0.05
res_list <- run_trim(t_trim = trim, N_post = N_post, h_r = h_r, cor_size = cor_size)
## Optimizing parameters
## p1: log cf_sexp.lscale
## p2: log cf_sexp.magn
## 
##       p1       p2     Energy Iteration
##    -1.20     0.00    1331.25         0 
##      ...      ...        ...       ...   (intermediate iterations omitted)
##     2.04     0.73    1275.38        50 
## Assessing convergence...
## Optimization converged.
## Optimizing parameters
## p1: log cf_sexp.lscale
## p2: log cf_sexp.magn
## p3: log cf_lin.magn
## 
##       p1       p2       p3     Energy Iteration
##    -1.20     0.00     0.00    1333.25         0 
##      ...      ...      ...        ...       ...   (intermediate iterations omitted)
##     2.40     1.07    -2.81    1276.62        63 
## Assessing convergence...
## Optimization converged.
print(res_list$results)
##               mean    median   lo.2.5%  hi.97.5%  len.97.5%
## Bayes    0.1549282 0.1546898 0.1088353 0.2004221 0.09158676
## PA Bayes 0.1558401 0.1557979 0.1083011 0.2045678 0.09626670
## DR Bayes 0.1559572 0.1558254 0.1075024 0.2058501 0.09834763

References

Breunig, Christoph, Ruixuan Liu, and Zhengfei Yu. 2025. “Double Robust Bayesian Inference on Average Treatment Effects.” Econometrica 93 (2): 539–68. https://doi.org/10.3982/ECTA21442.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21: C1–68. https://doi.org/10.1111/ectj.12097.
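Crump, Richard K., V. Joseph Hotz, Guido W. Imbens, and Oscar A. Mitnik. 2009. “Dealing with Limited Overlap in Estimation of Average Treatment Effects.” Biometrika 96 (1): 187–99.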
Lo, Albert Y. 1987. “A Large Sample Study of the Bayesian Bootstrap.” The Annals of Statistics 15 (1): 360–75. https://doi.org/10.1214/aos/1176350271.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55. https://doi.org/10.1093/biomet/70.1.41.

  1. Breunig, Liu, and Yu (2025) extend their framework to continuous, count, and multinomial outcomes.↩︎