## 6.21 Log-binomial regression to estimate a risk ratio or prevalence ratio

Logistic regression is a special case of a family of models known as **generalized linear models**. Each member of this family has an assumed distribution for the outcome and a **link function** that connects the mean outcome to a linear combination of predictors \(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K\) (the **linear predictor**). In logistic regression, the outcome is assumed to have a binomial distribution and the link function is the logit function \(\ln(p/(1-p))\). Linear regression is also a special case, with a normal distribution and an identity link function (the mean is assumed to be equal to the linear predictor).

Another special case of a generalized linear model is the **log-binomial regression** model which, like logistic regression, assumes a binomial distribution for a binary outcome but, unlike logistic regression, uses a log link function as shown in Equation (6.2).

\[\begin{equation} \ln{p} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K \tag{6.2} \end{equation}\]

With logistic regression, the left-hand side is the log of the odds, whereas in log-binomial regression it is the log of the probability. Exponentiating a regression coefficient in logistic regression results in an odds ratio. Similarly, exponentiating a regression coefficient in log-binomial regression results in a risk ratio (RR) or prevalence ratio (PR). The model described by Equation (6.2) can be used to estimate an RR from incidence data or a PR from prevalence data. Thus, for a predictor \(X_k\), the RR or PR is \(e^{\beta_k}\).
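To see why exponentiating gives a ratio of probabilities, compare two individuals who differ by one unit in \(X_k\) and have the same values of all other predictors. Subtracting Equation (6.2) for one from Equation (6.2) for the other cancels every term except \(\beta_k\) (the notation \(p(X_k = x)\) is introduced here only for illustration):

\[\ln{p(X_k = x + 1)} - \ln{p(X_k = x)} = \beta_k \quad \Longrightarrow \quad \frac{p(X_k = x + 1)}{p(X_k = x)} = e^{\beta_k}\]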

A disadvantage of log-binomial regression is that the left-hand side \((\ln{p})\) is constrained to be non-positive (since \(p \le 1\)) while the right-hand side can be anything from \(-\infty\) to \(\infty\). This leads to convergence issues at times (Williamson, Eliasziw, and Fick 2013). One method for fitting a log-binomial model is to use `glm()` with `family = binomial(link="log")`. Alternatively, use the `logbin()` function in the `logbin` package (Donoghoe and Marschner 2018), which may converge even in cases where `glm()` fails.
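As a minimal sketch of the `glm()` approach, the code below fits a log-binomial model to a small made-up dataset. The data frame `toy`, its variables, and the object name `fit.glm` are hypothetical and are not part of the NSDUH example. Supplying `start` values (here the log of the overall prevalence for the intercept and 0 for the slope) is an optional choice that sometimes helps with convergence, and the interval shown is a Wald interval from `confint.default()`.

```r
# Hypothetical toy data (NOT the NSDUH data):
# 40 of 100 in group A and 60 of 100 in group B have the outcome
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 100)),
  y     = c(rep(c(1, 0), times = c(40, 60)),
            rep(c(1, 0), times = c(60, 40)))
)

# Log-binomial model: binomial distribution with a log link
fit.glm <- glm(y ~ group,
               family = binomial(link = "log"),
               data   = toy,
               start  = c(log(mean(toy$y)), 0))

# Prevalence ratio (B vs. A) and Wald 95% CI
PR <- exp(coef(fit.glm))["groupB"]
CI <- exp(confint.default(fit.glm))["groupB", ]
round(c(PR = unname(PR), CI), 3)
```

With a single binary predictor the fitted PR should match the ratio of the observed proportions, here 0.60/0.40 = 1.5.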

**Example 6.2 (continued):** Logistic regression estimated an OR comparing lifetime marijuana use between males and females of 1.44. Use log-binomial regression to compute the corresponding prevalence ratio.

```
library(logbin)

# Fit the log-binomial model
fit.ex6.2.logbin <- logbin(mj_lifetime ~ demog_sex,
                           data = nsduh,
                           method = "em")
# Summary of model
round(summary(fit.ex6.2.logbin)$coef, 4)
```

```
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -0.7629     0.0463 -16.479   0.0000
## demog_sexMale   0.1794     0.0620   2.894   0.0038
```

```
# PR, and 95% CI for PR (dropping the intercept row)
PR.CI <- cbind("PR" = exp(coef(fit.ex6.2.logbin)),
               exp(confint(fit.ex6.2.logbin)))[-1,]
round(PR.CI, 3)
```

```
##     PR  2.5 % 97.5 %
##  1.197  1.060  1.351
```

Although not needed for this example, if the predictor were categorical with more than two levels, you could obtain a Type III multiple df test as usual.

**Conclusion**: Males are 1.20 times as likely as females to have ever used marijuana (PR = 1.20; 95% CI = 1.06, 1.35; p = .004).

In the interpretation, we used the phrase “times as likely” rather than “times the odds” because log-binomial regression models the log of the probability, not the log-odds. We could also say that the prevalence of marijuana use is 20% greater among males. If this were incidence data, we could say that males have 20% greater risk. To compute an adjusted RR or PR, simply add the confounding variables to the model formula.
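The difference between the OR of 1.44 and the PR of 1.20 can be checked by hand from the log-binomial coefficients reported above. The following sketch simply back-transforms those two rounded coefficients. The object names (`b0`, `b1`, `p.female`, `p.male`) are made up for illustration, and small discrepancies from the reported values are due to rounding.

```r
# Coefficients copied (rounded) from the log-binomial output above
b0 <- -0.7629   # (Intercept): log prevalence among females
b1 <-  0.1794   # demog_sexMale: log prevalence ratio, males vs. females

# Back-transform to prevalences with exp()
p.female <- exp(b0)
p.male   <- exp(b0 + b1)

# Prevalence ratio (equals exp(b1)) and the implied odds ratio
PR <- p.male / p.female
OR <- (p.male / (1 - p.male)) / (p.female / (1 - p.female))

round(c(p.female = p.female, p.male = p.male, PR = PR, OR = OR), 3)
```

The implied OR is about 1.44, in line with the logistic regression result. Because the outcome is common in both groups (prevalences near 0.5), the OR is noticeably farther from 1 than the PR.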

**NOTES:**

- If you use `predict()` or `gmodels::estimable()` to estimate a probability from a log-binomial model, use `exp()` rather than `ilogit()` when transforming the prediction to the probability scale.
- `logbin()` does not allow interaction terms using the `:` notation. If `glm()` with `family = binomial(link = "log")` converges, then that is the simplest way to include an interaction since it does allow the `:` notation. To include an interaction with `logbin()`, you must create variables corresponding to the interaction terms outside the model and then include those variables in the model (see Section 9.6.4.2 for an example, from a different context, of how to do this).
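Both notes can be illustrated with a small sketch. The grouped data frame `toy`, its variables, and the hand-made interaction variable `x1_x2` are all hypothetical. The models are fit with `glm()` so that the `:` version and the hand-made version can be compared directly, and a variable created the same way could then be included in a `logbin()` model formula.

```r
# Hypothetical grouped toy data: events out of 100 in each cell
# defined by two binary predictors
toy <- data.frame(
  x1     = c(0, 1, 0, 1),
  x2     = c(0, 0, 1, 1),
  events = c(20, 30, 40, 72),
  n      = 100
)

# Interaction variable created outside the model
toy$x1_x2 <- toy$x1 * toy$x2

# Same log-binomial model written two ways
fit.colon  <- glm(cbind(events, n - events) ~ x1 + x2 + x1:x2,
                  family = binomial(link = "log"), data = toy)
fit.manual <- glm(cbind(events, n - events) ~ x1 + x2 + x1_x2,
                  family = binomial(link = "log"), data = toy)

# The two parameterizations give the same coefficients
cbind(colon  = unname(coef(fit.colon)),
      manual = unname(coef(fit.manual)))

# Link-scale predictions are log-probabilities: back-transform with exp()
p.hat <- exp(predict(fit.manual, newdata = toy, type = "link"))
round(p.hat, 3)
```

Because this model has one parameter per cell, the back-transformed predictions should reproduce the observed proportions (0.20, 0.30, 0.40, 0.72).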

### References

Donoghoe and Marschner. 2018. *Journal of Statistical Software* 86 (9): 1–22. https://doi.org/10.18637/jss.v086.i09.

Williamson, Eliasziw, and Fick. 2013. *Emerging Themes in Epidemiology* 10 (1): 14. https://doi.org/10.1186/1742-7622-10-14.