7.5 Negative Binomial Regression

When modeling count data, Poisson regression assumes that the mean and variance are equal:

\[ \text{Var}(Y_i) = E(Y_i) = \mu_i \]

However, in many real-world datasets the variance exceeds the mean, a phenomenon known as overdispersion. When overdispersion is present, the Poisson model underestimates the variance, leading to:

  • Underestimated standard errors and inflated test statistics (overly small p-values).

  • Overconfident predictions.

  • Poor model fit.

7.5.1 Negative Binomial Distribution

To address overdispersion, Negative Binomial regression introduces a dispersion parameter \(\theta\) that allows the variance to exceed the mean: \[ \text{Var}(Y_i) = \mu_i + \theta \mu_i^2 \] where:

  • \(\mu_i = \exp(\mathbf{x}_i' \boldsymbol{\beta})\) is the expected count.

  • \(\theta\) is the dispersion parameter.

  • When \(\theta \to 0\), the NB model reduces to the Poisson model.

Alternatively, the variance is sometimes parameterized (as in the MASS package) as:

\[ \text{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta} \]

in which case the NB model reduces to the Poisson model as \(\theta \to \infty\).
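Because R's rnbinom() uses this second parameterization (its size argument plays the role of \(\theta\)), we can verify the variance formula numerically. A minimal sketch; the variable names are illustrative:

# Numerical check: rnbinom() draws with size = theta satisfy
# Var(Y) = mu + mu^2 / theta
mu_chk    <- 4
theta_chk <- 2
y_chk     <- rnbinom(1e6, size = theta_chk, mu = mu_chk)
mean(y_chk)                    # should be close to 4
var(y_chk)                     # should be close to 4 + 4^2 / 2 = 12
mu_chk + mu_chk^2 / theta_chk  # theoretical NB2 variance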

Thus, Negative Binomial regression is a generalization of Poisson regression that accounts for overdispersion.
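One standard way to see why this generalization works is the Poisson-Gamma mixture representation: if the Poisson rate itself varies across observations according to a Gamma distribution (shape \(\theta\), rate \(\theta / \mu_i\), hence mean \(\mu_i\) and variance \(\mu_i^2 / \theta\)), the marginal count distribution is Negative Binomial with exactly the MASS variance above:

\[ Y_i \mid \lambda_i \sim \text{Poisson}(\lambda_i), \qquad \lambda_i \sim \text{Gamma}\left(\theta, \frac{\theta}{\mu_i}\right) \]

\[ \Rightarrow \quad \text{Var}(Y_i) = E[\text{Var}(Y_i \mid \lambda_i)] + \text{Var}(E[Y_i \mid \lambda_i]) = \mu_i + \frac{\mu_i^2}{\theta} \]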

7.5.2 Application: Negative Binomial Regression

We apply Negative Binomial regression to the bioChemists dataset to model the number of research articles (Num_Article) as a function of several predictors.

7.5.2.1 Fitting the Negative Binomial Model

# Load necessary package
library(MASS)

# Fit Negative Binomial model
NegBinom_Mod <- MASS::glm.nb(Num_Article ~ ., data = bioChemists)

# Model summary
summary(NegBinom_Mod)
#> 
#> Call:
#> MASS::glm.nb(formula = Num_Article ~ ., data = bioChemists, init.theta = 2.264387695, 
#>     link = log)
#> 
#> Coefficients:
#>                  Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)      0.256144   0.137348   1.865 0.062191 .  
#> SexWomen        -0.216418   0.072636  -2.979 0.002887 ** 
#> MarriedMarried   0.150489   0.082097   1.833 0.066791 .  
#> Num_Kid5        -0.176415   0.052813  -3.340 0.000837 ***
#> PhD_Quality      0.015271   0.035873   0.426 0.670326    
#> Num_MentArticle  0.029082   0.003214   9.048  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for Negative Binomial(2.2644) family taken to be 1)
#> 
#>     Null deviance: 1109.0  on 914  degrees of freedom
#> Residual deviance: 1004.3  on 909  degrees of freedom
#> AIC: 3135.9
#> 
#> Number of Fisher Scoring iterations: 1
#> 
#> 
#>               Theta:  2.264 
#>           Std. Err.:  0.271 
#> 
#>  2 x log-likelihood:  -3121.917

Interpretation:

  • The coefficients are on the log scale.

  • The dispersion parameter \(\theta\) (also called the size parameter in some contexts) is estimated as 2.264 with a standard error of 0.271. See Over-Dispersion for more detail.

  • Under the MASS parameterization \(\text{Var}(Y_i) = \mu_i + \mu_i^2 / \theta\), a finite, moderate \(\theta\) such as this one implies substantial extra-Poisson variance. The likelihood ratio test below formalizes this comparison and confirms the choice of the Negative Binomial model over Poisson regression.

# Extract dispersion parameter estimate
NegBinom_Mod$theta
#> [1] 2.264388

  • Under this parameterization, a small \(\theta\) signals strong overdispersion, while a very large \(\theta\) indicates the Poisson model is reasonable, since the extra variance term \(\mu_i^2 / \theta\) vanishes as \(\theta \to \infty\).

7.5.2.2 Simulation of Overdispersion and the Role of Theta

We create two synthetic datasets: one generated from a Poisson model with no extra dispersion, and one from a Negative Binomial model with clear overdispersion. We fit both a Poisson GLM (glm) and a Negative Binomial GLM (MASS::glm.nb) to each dataset, then compare the estimated theta to the ground truth and interpret the results.

Parameterization used by MASS: NB2 with mean \(\mu\) and variance \(\text{Var}(Y) = \mu + \frac{\mu^2}{\theta}\). Larger \(\theta\) implies a variance closer to Poisson; as \(\theta \to \infty\), the Negative Binomial approaches the Poisson.

set.seed(20251027)
suppressPackageStartupMessages({
  library(MASS)     # glm.nb
  library(knitr)    # kable for clean tables
})

We use the same covariates and coefficients for both datasets. Dataset A is generated from a Poisson, dataset B from an NB2 with true theta = 2.

n  <- 5000
x1 <- rnorm(n, 0, 1)
x2 <- rbinom(n, 1, 0.4)
x3 <- runif(n, -1, 1)

b0 <- 0.2; b1 <- 0.5; b2 <- -0.7; b3 <- 0.9
eta <- b0 + b1 * x1 + b2 * x2 + b3 * x3
mu  <- exp(eta)

# A: Poisson truth, no extra dispersion
y_pois <- rpois(n, lambda = mu)
dat_pois <- data.frame(y = y_pois, x1, x2, x3)

# B: NB truth with strong overdispersion
theta_true <- 2.0
y_nb <- rnbinom(n, size = theta_true, mu = mu)
dat_nb <- data.frame(y = y_nb, x1, x2, x3)

For each dataset we fit:

  1. Poisson GLM, glm(..., family = poisson)
  2. Negative Binomial GLM with unknown theta, glm.nb(...)

We then extract key diagnostics.

# Dataset A, Poisson truth
fitA_pois <- glm(y ~ x1 + x2 + x3, family = poisson(link = "log"), data = dat_pois)
fitA_nb   <- glm.nb(y ~ x1 + x2 + x3, data = dat_pois)

# Dataset B, NB truth
fitB_pois <- glm(y ~ x1 + x2 + x3, family = poisson(link = "log"), data = dat_nb)
fitB_nb   <- glm.nb(y ~ x1 + x2 + x3, data = dat_nb)

Dispersion at the outcome level

Variance to mean ratio near 1 suggests Poisson, ratios much greater than 1 suggest overdispersion.

vmr_A <- var(dat_pois$y) / mean(dat_pois$y)
vmr_B <- var(dat_nb$y)   / mean(dat_nb$y)

disp_tab <- data.frame(
  dataset = c("Poisson truth", "NB truth"),
  n = c(n, n),
  mean_y = c(mean(dat_pois$y), mean(dat_nb$y)),
  var_y  = c(var(dat_pois$y),  var(dat_nb$y)),
  var_to_mean_ratio = c(vmr_A, vmr_B)
)

kable(disp_tab, digits = 3)
dataset         n     mean_y  var_y  var_to_mean_ratio
Poisson truth   5000  1.231   2.426  1.970
NB truth        5000  1.259   3.960  3.146

Interpretation: note that even under Poisson truth the marginal variance to mean ratio exceeds 1 here (1.970), because \(\mu\) varies with the covariates: marginally, \(\text{Var}(Y) = E(\mu) + \text{Var}(\mu)\). Poisson regression only assumes equidispersion conditional on the covariates. The useful signal is the comparison: the NB dataset's ratio (3.146) is clearly larger, reflecting overdispersion beyond what variation in the mean can explain.
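Because the raw ratio mixes these two sources of variation, a better diagnostic conditions on the covariates. A short sketch using the Poisson fits above; the expectations in the comments are what the theory predicts, not printed output:

# Pearson dispersion: sum of squared Pearson residuals over residual df.
# Values near 1 are consistent with the Poisson variance assumption.
pearson_disp <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / fit$df.residual
}
pearson_disp(fitA_pois)  # expected near 1 under Poisson truth
pearson_disp(fitB_pois)  # expected well above 1 under NB truth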

For each dataset we report:

  • theta estimate from glm.nb with its standard error and a Wald 95 percent interval,
  • log likelihood and AIC for Poisson and NB,
  • a likelihood ratio test comparing Poisson to NB on 1 degree of freedom, since NB adds one parameter (theta).
z <- qnorm(0.975)

theta_A  <- unname(fitA_nb$theta)
se_A     <- unname(fitA_nb$SE.theta)
ciA_low  <- max(theta_A - z * se_A, .Machine$double.eps)
ciA_high <- theta_A + z * se_A

theta_B  <- unname(fitB_nb$theta)
se_B     <- unname(fitB_nb$SE.theta)
ciB_low  <- max(theta_B - z * se_B, .Machine$double.eps)
ciB_high <- theta_B + z * se_B

ll_A_p <- as.numeric(logLik(fitA_pois)); ll_A_nb <- as.numeric(logLik(fitA_nb))
ll_B_p <- as.numeric(logLik(fitB_pois)); ll_B_nb <- as.numeric(logLik(fitB_nb))

aic_A_p <- AIC(fitA_pois); aic_A_nb <- AIC(fitA_nb)
aic_B_p <- AIC(fitB_pois); aic_B_nb <- AIC(fitB_nb)

# LR tests
lr_A  <- 2 * (ll_A_nb - ll_A_p)
p_A   <- pchisq(lr_A, df = 1, lower.tail = FALSE)

lr_B  <- 2 * (ll_B_nb - ll_B_p)
p_B   <- pchisq(lr_B, df = 1, lower.tail = FALSE)

fit_tab <- data.frame(
  dataset = c("Poisson truth", "NB truth"),
  theta_true = c(Inf, theta_true),
  theta_hat  = c(theta_A, theta_B),
  theta_SE   = c(se_A, se_B),
  theta_CI95_lower = c(ciA_low, ciB_low),
  theta_CI95_upper = c(ciA_high, ciB_high),
  logLik_Pois = c(ll_A_p, ll_B_p),
  logLik_NB   = c(ll_A_nb, ll_B_nb),
  AIC_Pois    = c(aic_A_p, aic_B_p),
  AIC_NB      = c(aic_A_nb, aic_B_nb),
  LR_stat_Pois_vs_NB = c(lr_A, lr_B),
  LR_df = c(1L, 1L),
  LR_p_value = c(p_A, p_B)
)

kable(fit_tab, digits = 3)
dataset         theta_true  theta_hat  theta_SE  theta_CI95_lower  theta_CI95_upper
Poisson truth   Inf         259.67     828.332   0.000             1883.172
NB truth        2           2.11       0.130     1.855             2.365

dataset         logLik_Pois  logLik_NB  AIC_Pois  AIC_NB    LR_stat_Pois_vs_NB  LR_df  LR_p_value
Poisson truth   -6339.383    -6339.333  12686.77  12688.67  0.100               1      0.751
NB truth        -7196.580    -6827.266  14401.16  13664.53  738.629             1      0.000

How to read theta: in this NB2 parameterization, \(\text{Var}(Y) = \mu + \mu^2 / \theta\). When \(\theta\) is very large, the extra term \(\mu^2 / \theta\) is small, so the variance is close to \(\mu\), the Poisson variance. When \(\theta\) is small, the extra term is large, implying strong overdispersion relative to Poisson.
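For instance, at \(\mu = 2\):

# Extra-Poisson variance mu^2 / theta at mu = 2
mu_ex <- 2
mu_ex + mu_ex^2 / 0.5  # theta = 0.5: variance 10, strong overdispersion
mu_ex + mu_ex^2 / 50   # theta = 50: variance 2.08, nearly Poisson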

Interpretation:

  • Poisson truth: conditional on the covariates there is no extra dispersion. The NB fit reports a very large theta with an extremely wide interval, indicating little evidence of overdispersion. The likelihood ratio test does not reject Poisson, and AIC is essentially tied (here it slightly favors Poisson).
  • NB truth, theta equals 2: there is clear overdispersion beyond what the covariates explain. The NB fit estimates theta near 2 (here 2.11, with a 95 percent interval of 1.855 to 2.365 that covers the true value), the likelihood ratio test strongly favors NB, and AIC is much lower for NB.
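As one further sanity check, not shown in the tables above, we can verify that the NB fit on dataset B also recovers the regression coefficients themselves:

# Coefficient recovery under NB truth: true betas vs. glm.nb estimates
cbind(true = c(b0, b1, b2, b3), NB_estimate = coef(fitB_nb))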

7.5.2.3 Model Comparison: Poisson vs. Negative Binomial

7.5.2.3.1 Checking Overdispersion in Poisson Model

Before using NB regression, we confirm overdispersion by computing:

\[ \hat{\phi} = \frac{\text{deviance}}{\text{degrees of freedom}} \]

# Poisson model for comparison (assuming the same specification as the
# Poisson regression fit earlier in the chapter)
Poisson_Mod <- glm(Num_Article ~ ., data = bioChemists, family = poisson)

# Overdispersion check for Poisson model
Poisson_Mod$deviance / Poisson_Mod$df.residual
#> [1] 1.797988

  • If \(\hat{\phi}\) is substantially greater than 1, overdispersion is present.

  • A large value suggests that Poisson regression underestimates the variance; here \(\hat{\phi} \approx 1.8\), well above 1.
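For a formal test rather than a rule of thumb, the dispersiontest() function in the AER package (assumed installed; it is not used elsewhere in this section) performs a score test of the null of equidispersion:

# Score test of H0: Var(Y) = mu against H1: Var(Y) = mu + alpha * mu^2
# (trafo = 2 matches the NB2 variance form)
library(AER)
dispersiontest(Poisson_Mod, trafo = 2)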

7.5.2.3.2 Likelihood Ratio Test: Poisson vs. Negative Binomial

We compare the Poisson and Negative Binomial models using a likelihood ratio test, where: \[ G^2 = 2 \times ( \log L_{NB} - \log L_{Poisson}) \] with \(\text{df} = 1\)

# Likelihood ratio test between Poisson and Negative Binomial
pchisq(2 * (logLik(NegBinom_Mod) - logLik(Poisson_Mod)),
       df = 1,
       lower.tail = FALSE)
#> 'log Lik.' 4.391728e-41 (df=7)
  • Small p-value (< 0.05) → Negative Binomial model is significantly better.

  • Large p-value (> 0.05) → Poisson model is adequate.

Since overdispersion is confirmed, the Negative Binomial model is preferred.
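One cosmetic note on the test output above: the (df=7) label is an attribute carried along from logLik(), not the degrees of freedom of the test. Wrapping the statistic in as.numeric() strips it:

# Same likelihood ratio test with the logLik attributes dropped
G2 <- as.numeric(2 * (logLik(NegBinom_Mod) - logLik(Poisson_Mod)))
pchisq(G2, df = 1, lower.tail = FALSE)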

7.5.2.4 Predictions and Rate Ratios

In Negative Binomial regression, exponentiating the coefficients gives rate ratios:

# Convert coefficients to rate ratios
data.frame(Rate_Ratio = exp(coef(NegBinom_Mod)))
#>                 Rate_Ratio
#> (Intercept)      1.2919388
#> SexWomen         0.8053982
#> MarriedMarried   1.1624030
#> Num_Kid5         0.8382698
#> PhD_Quality      1.0153884
#> Num_MentArticle  1.0295094

A rate ratio of:

  • > 1 \(\to\) Increases expected article count.

  • < 1 \(\to\) Decreases expected article count.

  • = 1 \(\to\) No effect.

For example:

  • If PhD_Quality had an exponentiated coefficient of 1.5, each one-unit increase in PhD program quality would be associated with 50% more expected articles.

  • Since SexWomen has an exponentiated coefficient of about 0.81, women are expected to publish roughly 20% fewer articles than men, all else equal.
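To round out this subsection, here is a short sketch of predictions and interval estimates from the fitted model; it uses only the observed data, and confint() profiles the likelihood, so it can take a moment:

# Expected article counts on the response scale for the first few scientists
head(predict(NegBinom_Mod, type = "response"))

# Rate ratios with profile-likelihood 95% confidence intervals
exp(cbind(RateRatio = coef(NegBinom_Mod), confint(NegBinom_Mod)))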

7.5.2.5 Alternative Approach: Zero-Inflated Models

If a dataset has excess zeros (many individuals publish no articles), Zero-Inflated Negative Binomial (ZINB) models may be required.

\[ \text{P}(Y_i = 0) = p + (1 - p) f(Y_i = 0 | \mu, \theta) \]

where:

  • \(p\) is the probability of always being a zero (e.g., inactive researchers).

  • \(f(y \mid \mu, \theta)\) is the Negative Binomial probability mass function for the count process.
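To make the mixture concrete, the zero probability can be computed directly for illustrative (hypothetical) parameter values:

# P(Y = 0) under the ZINB mixture for hypothetical parameter values
p_struct <- 0.2                                    # structural zero probability p
mu_z     <- 1.5                                    # NB mean
theta_z  <- 2                                      # NB dispersion
nb_zero  <- dnbinom(0, size = theta_z, mu = mu_z)  # NB P(0) = (theta / (theta + mu))^theta
p_struct + (1 - p_struct) * nb_zero                # total zero probability, about 0.46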

7.5.3 Fitting a Zero-Inflated Negative Binomial Model

# Load package for zero-inflated models
library(pscl)

# Fit ZINB model
ZINB_Mod <- zeroinfl(Num_Article ~ ., data = bioChemists, dist = "negbin")

# Model summary
summary(ZINB_Mod)
#> 
#> Call:
#> zeroinfl(formula = Num_Article ~ ., data = bioChemists, dist = "negbin")
#> 
#> Pearson residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.2942 -0.7601 -0.2909  0.4448  6.4155 
#> 
#> Count model coefficients (negbin with log link):
#>                   Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)      0.4167465  0.1435966   2.902  0.00371 ** 
#> SexWomen        -0.1955068  0.0755926  -2.586  0.00970 ** 
#> MarriedMarried   0.0975826  0.0844520   1.155  0.24789    
#> Num_Kid5        -0.1517325  0.0542061  -2.799  0.00512 ** 
#> PhD_Quality     -0.0007001  0.0362697  -0.019  0.98460    
#> Num_MentArticle  0.0247862  0.0034927   7.097 1.28e-12 ***
#> Log(theta)       0.9763565  0.1354695   7.207 5.71e-13 ***
#> 
#> Zero-inflation model coefficients (binomial with logit link):
#>                 Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)     -0.19169    1.32282  -0.145  0.88478   
#> SexWomen         0.63593    0.84892   0.749  0.45379   
#> MarriedMarried  -1.49947    0.93867  -1.597  0.11017   
#> Num_Kid5         0.62843    0.44278   1.419  0.15582   
#> PhD_Quality     -0.03771    0.30801  -0.122  0.90254   
#> Num_MentArticle -0.88229    0.31623  -2.790  0.00527 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> Theta = 2.6548 
#> Number of iterations in BFGS optimization: 43 
#> Log-likelihood: -1550 on 13 Df

This model accounts for:

  • Structural zero inflation.

  • Overdispersion.

ZINB is often preferred when many observations are zero. However, since ZINB does not fall under the GLM framework, we will discuss it further in Nonlinear and Generalized Linear Mixed Models.
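To compare the plain Negative Binomial fit with the ZINB fit on the same data, one commonly used (if imperfect) tool is the Vuong test for non-nested models provided by pscl; treat its verdict as one piece of evidence rather than a definitive answer:

# Vuong test comparing ZINB against the plain NB fit;
# positive statistics favor the first model (ZINB)
vuong(ZINB_Mod, NegBinom_Mod)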

Why is ZINB Not a GLM?

  • Unlike GLMs, which assume a single response distribution from the exponential family, ZINB is a mixture model with two components:

    • Count model – A negative binomial regression for the main count process.

    • Inflation model – A logistic regression for excess zeros.

Because ZINB combines two distinct processes rather than using a single exponential family distribution, it does not fit within the standard GLM framework.

ZINB is part of finite mixture models and is sometimes considered within generalized linear mixed models (GLMMs) or semi-parametric models.