2.4 The likelihood principle
The likelihood principle states that, in making inferences or decisions about the state of nature, all the relevant experimental information is contained in the likelihood function. The Bayesian framework follows this principle, i.e., inference is conditional on the observed data.
We follow J. Berger (1993), who in turn followed D. V. Lindley and Phillips (1976), to illustrate the likelihood principle. We are given a coin and are interested in the probability, \(\theta\), of it landing heads when flipped. We wish to test \(H_0: \theta = 1/2\) versus \(H_1: \theta > 1/2\). An experiment is conducted by flipping the coin (independently) in a series of trials, with the result being the observation of 9 heads and 3 tails.
This is not yet enough information to specify \(p(y|\theta)\), since the sampling scheme (in particular, the stopping rule) has not been specified. Two possibilities arise:
The experiment consisted of a predetermined 12 flips, so that \(Y\), the number of heads, follows a \(B(12, \theta)\) distribution. In this case,
\[ p_1(y|\theta) = \binom{12}{y} \theta^y (1 - \theta)^{12 - y}, \qquad p_1(9|\theta) = 220 \times \theta^9 (1 - \theta)^3. \]
The experiment consisted of flipping the coin until 3 tails were observed (\(r = 3\)). In this case, \(Y\), the number of heads (failures) before obtaining 3 tails, follows a \(NB(3, 1 - \theta)\) distribution. Here,
\[ p_2(y|\theta) = \binom{y + r - 1}{r - 1} \theta^y (1 - \theta)^r, \qquad p_2(9|\theta) = 55 \times \theta^9 (1 - \theta)^3. \]
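That the two likelihoods are proportional for every value of \(\theta\), not just at \(\theta = 1/2\), can be checked numerically. A minimal sketch in R (the grid of \(\theta\) values is illustrative):

```r
# Check that the two sampling models yield proportional likelihoods
# at the observed data (9 heads, 3 tails)
theta <- seq(0.05, 0.95, by = 0.05)

# Binomial(12, theta) likelihood at y = 9 heads
lik_binom <- dbinom(9, size = 12, prob = theta)

# Negative Binomial: 9 heads (failures) before the 3rd tail (success prob 1 - theta)
lik_negbin <- dnbinom(9, size = 3, prob = 1 - theta)

# The ratio is constant in theta (220 / 55 = 4): identical kernels
ratio <- lik_binom / lik_negbin
```

The constant ratio confirms that both models carry the same kernel \(\theta^9 (1-\theta)^3\); only the normalizing constants (220 versus 55) differ.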
Using a frequentist approach, the significance level of the observed \(y = 9\) under the Binomial model against \(\theta = 1/2\) would be:
\[ \alpha_1=P_{1/2}(Y\geq 9)=p_1(9|1/2)+p_1(10|1/2)+p_1(11|1/2)+p_1(12|1/2)=0.073. \]
```r
# Binomial test: one-sided significance level for observing 9 or more heads in 12 trials

# Parameters
successes <- 9   # Number of observed successes (heads)
n_trials <- 12   # Total number of trials
p_null <- 0.5    # Null hypothesis success probability

# Calculate one-sided significance level
significance_level <- sum(dbinom(successes:n_trials, size = n_trials, prob = p_null))

# Output result with context
message(sprintf("One-sided significance level (P(X ≥ %d | H0: p = %.1f)): %.4f",
                successes, p_null, significance_level))
```

```
## One-sided significance level (P(X ≥ 9 | H0: p = 0.5)): 0.0730
```
For the Negative Binomial model, the significance level would be:
\[ \alpha_2=P_{1/2}(Y\geq 9)=p_2(9|1/2)+p_2(10|1/2)+\ldots=0.0327. \]
```r
# Negative Binomial test: one-sided significance level for observing 9 or more
# heads (failures) before the 3rd tail (success)

# Parameters
target_successes <- 3  # Number of target successes (tails) at which the experiment stops
num_failures <- 9      # Number of observed failures (heads)
p_success <- 0.5       # Null hypothesis probability of success (tails)

# Compute the one-sided significance level: P(Y ≥ 9 heads before the 3rd tail)
significance_level <- 1 - pnbinom(q = num_failures - 1,
                                  size = target_successes,
                                  prob = p_success)

# Print result
message(sprintf("P(at least %d heads before the 3rd tail): %.4f",
                num_failures, significance_level))
```

```
## P(at least 9 heads before the 3rd tail): 0.0327
```
At a 5% significance level we therefore reach different conclusions depending on the sampling model (\(\alpha_1 = 0.073 > 0.05\), whereas \(\alpha_2 = 0.0327 < 0.05\)), even though the data are the same. A Bayesian approach, in contrast, yields the same outcome under both models because the likelihood kernels are identical, \(\theta^9 (1 - \theta)^3\).
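To make the last point concrete, consider the posterior under both sampling models. A minimal sketch, assuming (as an illustration) a uniform \(Beta(1, 1)\) prior on \(\theta\), under which the common kernel \(\theta^9 (1-\theta)^3\) gives a \(Beta(10, 4)\) posterior either way:

```r
# Assuming a uniform Beta(1, 1) prior on theta: because both likelihoods
# share the kernel theta^9 * (1 - theta)^3, the posterior is Beta(10, 4)
# under either sampling model
a <- 1 + 9  # prior shape + number of heads
b <- 1 + 3  # prior shape + number of tails

# Posterior probability of H1: theta > 1/2 (identical under both models)
post_prob_H1 <- 1 - pbeta(0.5, a, b)
message(sprintf("P(theta > 1/2 | y) = %.4f", post_prob_H1))  # approximately 0.954
```

The stopping rule drops out of the posterior entirely: only the kernel of the likelihood, and hence only the observed data, matters.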