Beta Binomial - Sufficiently Minimal

Bayesian Inference with Binary Data

The Beta-Binomial model is a classic Bayesian framework for working with dichotomous data. If we observe a Bernoulli process over a total of $n$ trials, where each trial has “success” probability $θ$, then the total number of successes $X$ in the $n$ trials is a binomial random variable taking values $x = 0, 1, 2, \dots, n$.

$p(X = x \mid θ) = \binom{n}{x} θ^{x} (1 - θ)^{n - x}$
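As a quick sanity check on the formula, the sketch below writes the binomial pmf out by hand in base R and compares it with the built-in `dbinom()`. The helper `binom_pmf` is a hypothetical name used only for illustration, and $n = 12$, $θ = 0.75$ are arbitrary example values.

```r
# Binomial pmf written out from p(X = x | theta) = choose(n, x) theta^x (1 - theta)^(n - x).
# binom_pmf is a hypothetical helper; n and theta are example values.
binom_pmf <- function(x, n, theta) {
  choose(n, x) * theta^x * (1 - theta)^(n - x)
}

n <- 12
theta <- 0.75
x <- 0:n

# Agrees with base R's dbinom() up to floating-point tolerance
all.equal(binom_pmf(x, n, theta), dbinom(x, size = n, prob = theta))
# Probabilities over x = 0, ..., n sum to 1
sum(binom_pmf(x, n, theta))
```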

The binomial probability mass function, viewed as a function of $θ$ for the observed $x$, is the likelihood of this model. Now, if there is uncertainty about the parameter $θ$, we can treat $θ$ as a random variable that follows a $\text{Beta}(α, β)$ distribution on $0 \leq θ \leq 1$.

$p(θ) = \frac{Γ(α + β)}{Γ(α) Γ(β)} θ^{α - 1} (1 - θ)^{β - 1}$
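The sketch below codes the beta density directly from its formula, using the normalizing constant $Γ(α + β) / (Γ(α) Γ(β))$, and checks it against base R's `dbeta()`; it also confirms numerically that the density integrates to 1. `beta_pdf` is a hypothetical helper, and $α = 9$, $β = 3$ are example values.

```r
# Beta(alpha, beta) density coded from its formula; beta_pdf is a hypothetical helper.
beta_pdf <- function(theta, alpha, beta) {
  gamma(alpha + beta) / (gamma(alpha) * gamma(beta)) *
    theta^(alpha - 1) * (1 - theta)^(beta - 1)
}

theta <- seq(0.01, 0.99, by = 0.01)

# Matches base R's dbeta() on a grid of theta values
all.equal(beta_pdf(theta, alpha = 9, beta = 3), dbeta(theta, shape1 = 9, shape2 = 3))

# Integrates to 1 over [0, 1]; extra arguments are passed through to beta_pdf
integrate(beta_pdf, lower = 0, upper = 1, alpha = 9, beta = 3)$value
```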

The beta distribution is the prior for this model. Given that we observed $X = x$ successes over $n$ trials of the Bernoulli process with success probability $θ$, we can derive the posterior distribution of $θ$, that is, the distribution of $θ$ that combines the prior information with the observed data $X = x$. By Bayes' theorem, the posterior is proportional to the likelihood times the prior (the constant $\binom{n}{x}$ and the beta normalizing constant drop out):

$p (θ | X = x) \propto θ^{x} (1 - θ)^{n - x} θ^{α - 1} (1 - θ)^{β - 1}$

$= θ^{x + α - 1} (1 - θ)^{n - x + β - 1}$

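This expression is the kernel of another beta distribution, with parameters $α^{*} = x + α$ and $β^{*} = n - x + β$, so $θ \mid X = x \sim \text{Beta}(x + α, n - x + β)$. Because the posterior stays in the beta family, the beta prior is said to be conjugate to the binomial likelihood.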
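Conjugacy can also be checked numerically. The base-R sketch below normalizes prior × likelihood by numerical integration and compares the result with the $\text{Beta}(x + α, n - x + β)$ density. The values $α = 2$, $β = 5$, $n = 20$, $x = 7$ are arbitrary assumptions chosen for illustration.

```r
# Example values (assumptions for illustration only)
alpha <- 2; beta <- 5   # prior Beta(alpha, beta)
n <- 20; x <- 7         # observed x successes in n trials

# Prior density times binomial likelihood, as a function of theta
unnormalized <- function(theta) {
  dbeta(theta, alpha, beta) * dbinom(x, size = n, prob = theta)
}

# Normalizing constant: the marginal probability P(X = x)
z <- integrate(unnormalized, lower = 0, upper = 1)$value

theta <- seq(0.05, 0.95, by = 0.05)
posterior_numeric <- unnormalized(theta) / z
posterior_conjugate <- dbeta(theta, x + alpha, n - x + beta)

# The two agree: the posterior is Beta(x + alpha, n - x + beta)
all.equal(posterior_numeric, posterior_conjugate, tolerance = 1e-6)
```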

To find the prior predictive (marginal) distribution of $X$, $f_{X}(x)$, we first need the joint distribution of $X$ and $θ$, $f_{X, θ}(x, θ)$.

$f_{X, θ}(x, θ) = f_{θ}(θ) f_{X \mid θ}(x \mid θ) = \frac{Γ(α + β)}{Γ(α) Γ(β)} \binom{n}{x} θ^{α + x - 1} (1 - θ)^{β + n - x - 1}$

The prior predictive distribution is $f_{X}(x) = \int_{θ} f_{X, θ}(x, θ) \, dθ$.

$f_{X}(x) = \int_{θ} \frac{Γ(α + β)}{Γ(α) Γ(β)} \binom{n}{x} θ^{α + x - 1} (1 - θ)^{β + n - x - 1} \, dθ = \frac{Γ(α + β)}{Γ(α) Γ(β)} \binom{n}{x} \frac{Γ(α + x) Γ(β + n - x)}{Γ(α + β + n)}$

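$= \binom{n}{x} \frac{B(α + x, β + n - x)}{B(α, β)}$

Here $B(a, b) = \frac{Γ(a) Γ(b)}{Γ(a + b)}$ is the beta function, and the integral was evaluated by recognizing the integrand as an unnormalized $\text{Beta}(α + x, β + n - x)$ density, which integrates to $B(α + x, β + n - x)$.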

The resulting expression is the probability mass function of a beta-binomial random variable with parameters $n, α, β$, written $X \sim \text{BetaBinom}(n, α, β)$. This is the distribution of $X$ averaged over all possible values of $θ$, weighted by the prior.
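The following base-R sketch implements the beta-binomial pmf via `lbeta()` (working on the log scale so the gamma functions do not overflow) and then illustrates the "averaging over $θ$" interpretation by simulation: draw $θ$ from the prior, then $X$ from a binomial given that $θ$. `dbb_sketch` is a hypothetical helper, not a betaR function, and the parameter values and number of draws are assumptions.

```r
# Beta-binomial pmf: choose(n, x) * B(a + x, b + n - x) / B(a, b).
# dbb_sketch is a hypothetical helper (not part of betaR); lbeta() keeps the
# computation on the log scale so large parameters do not overflow.
dbb_sketch <- function(x, n, a, b) {
  choose(n, x) * exp(lbeta(a + x, b + n - x) - lbeta(a, b))
}

n <- 12; a <- 9; b <- 3   # example parameters
exact <- dbb_sketch(0:n, n, a, b)
sum(exact)  # 1 up to floating-point error

# "Averaging over theta" by simulation: theta ~ Beta(a, b), then X | theta ~ Binomial(n, theta)
set.seed(1)
draws <- 1e5
theta <- rbeta(draws, a, b)
x_sim <- rbinom(draws, size = n, prob = theta)
empirical <- tabulate(x_sim + 1, nbins = n + 1) / draws  # relative frequencies of x = 0..n

round(cbind(x = 0:n, empirical, exact), 3)
max(abs(empirical - exact))  # small, and shrinks as draws increases
```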

The last distribution of interest in this model is the posterior predictive distribution, or the predictive distribution of a new count $X^{*}$ of successes in another $n$ trials, given the observation $X = x$.

$f_{X^{*} \mid X = x}(x^{*}) = \int_{θ} f(x^{*} \mid θ, x) f(θ \mid x) \, dθ$

Since $X^{*}$ and $X$ are conditionally independent given $θ$, $f(x^{*} \mid θ, x) = f(x^{*} \mid θ)$, so

$= \int_{θ} f (x^{*} | θ) f (θ | x) d θ$

Following a derivation similar to that of the prior predictive, with the posterior $\text{Beta}(x + α, n - x + β)$ in place of the prior, $X^{*} \mid X = x \sim \text{BetaBinom}(n, x + α, n - x + β)$.
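To see that the posterior predictive really is the beta-binomial pmf with updated parameters, the sketch below evaluates $\int f(x^{*} \mid θ) f(θ \mid x) \, dθ$ numerically for each $x^{*}$ and compares it with the closed form. The helper `dbb_sketch` and the values $α = 2$, $β = 2$, $n = 10$, $x = 7$ are illustrative assumptions.

```r
# dbb_sketch is a hypothetical helper: beta-binomial pmf via log-beta functions
dbb_sketch <- function(x, n, a, b) {
  choose(n, x) * exp(lbeta(a + x, b + n - x) - lbeta(a, b))
}

# Example values (assumptions): prior Beta(2, 2), x = 7 successes in n = 10 trials
a0 <- 2; b0 <- 2; n <- 10; x_obs <- 7
a_post <- x_obs + a0       # alpha* = x + alpha
b_post <- n - x_obs + b0   # beta*  = n - x + beta

# Numerically integrate f(x* | theta) f(theta | x) over theta for each x*
numeric_pred <- sapply(0:n, function(x_new) {
  integrand <- function(theta) dbinom(x_new, size = n, prob = theta) * dbeta(theta, a_post, b_post)
  integrate(integrand, lower = 0, upper = 1)$value
})

closed_form <- dbb_sketch(0:n, n, a_post, b_post)
all.equal(numeric_pred, closed_form, tolerance = 1e-6)
```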

Useful R Functions

The betaR package includes some useful functions for working with the beta binomial model. The package can be found at https://github.com/carter-allen/betaR, and installed using the following command.

devtools::install_github("carter-allen/betaR")

Example

library(betaR)

Consider the situation in which we are modeling a team’s number of wins over an $n = 12$ game season. The probability $θ$ that the team wins any given game is probably not a fixed value; it varies with the quality of the opposing team, home-court advantage, and countless other factors. If the team is relatively good, though, we might expect their probability of winning any given game to be around $θ = 0.75$, but not fixed at $θ = 0.75$.

We can represent this prior belief with a beta distribution for $θ$ with parameters $α = 9$ and $β = 3$, whose mean is $9 / (9 + 3) = 0.75$. These parameters can be thought of as the number of prior wins and prior losses, respectively, and would be a reasonable choice if the team went 9-3 last season. The prior distribution of $θ$ is shown below.

dbeta_plot(alpha = 9, beta = 3)
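For readers without betaR, the same prior can be summarized with base R alone. The sketch below computes the prior mean $α / (α + β)$, the mode $(α - 1) / (α + β - 2)$, and a central 95% interval from `qbeta()`; it does not reproduce `dbeta_plot()` itself.

```r
# Prior from the example: 9 "prior wins" and 3 "prior losses"
alpha <- 9; beta <- 3

prior_mean <- alpha / (alpha + beta)            # 9 / 12 = 0.75
prior_mode <- (alpha - 1) / (alpha + beta - 2)  # 8 / 10 = 0.8
prior_interval <- qbeta(c(0.025, 0.975), shape1 = alpha, shape2 = beta)  # central 95% interval

c(mean = prior_mean, mode = prior_mode)
prior_interval
```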

Now, if we want to make a prediction about how the team will do this season, we can form a prior predictive distribution of $X$, the number of games we expect the team to win this season given their 9-3 record last season. If we model the season's win total as $\text{Binomial}(n = 12, p = θ)$ (each game an independent Bernoulli trial with success probability $θ$) and keep our $\text{Beta}(α = 9, β = 3)$ prior for $θ$, the prior predictive distribution of $X$ is $\text{BetaBinom}(n = 12, α = 9, β = 3)$. This distribution is shown below. Note the similarity in shape to the prior.

dbb_plot(n = 12, alpha = 9, beta = 3)
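The prior predictive can also be tabulated directly. The base-R sketch below (with `dbb_sketch` as a hypothetical helper) computes $P(X = x)$ for $x = 0, \dots, 12$, the expected number of wins, and the variance. The variance, $nαβ(α + β + n) / ((α + β)^2 (α + β + 1)) \approx 4.15$, is noticeably larger than the $12 \times 0.75 \times 0.25 = 2.25$ we would get with $θ$ fixed at 0.75; that extra spread reflects our uncertainty about $θ$.

```r
# dbb_sketch is a hypothetical helper: beta-binomial pmf via log-beta functions
dbb_sketch <- function(x, n, a, b) {
  choose(n, x) * exp(lbeta(a + x, b + n - x) - lbeta(a, b))
}

n <- 12; alpha <- 9; beta <- 3
wins <- 0:n
prior_pred <- dbb_sketch(wins, n, alpha, beta)

expected_wins <- sum(wins * prior_pred)                 # n * alpha / (alpha + beta) = 9
var_wins <- sum((wins - expected_wins)^2 * prior_pred)  # about 4.15
binom_var <- n * 0.75 * 0.25                            # 2.25 if theta were fixed at 0.75

round(data.frame(wins, prob = prior_pred), 3)
c(expected = expected_wins, variance = var_wins, fixed_theta_variance = binom_var)
```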

Let’s say that the season we are predicting ends up being much less successful than we expected, with the team going 4-8 over 12 games. Given this information, we can update our knowledge about $θ$, the probability that this team wins any given game: $θ \mid X = x \sim \text{Beta}(x + α, n - x + β) = \text{Beta}(4 + 9, 12 - 4 + 3) = \text{Beta}(13, 11)$, which has the following distribution.

dbeta_plot(alpha = 4 + 9, beta = 12 - 4 + 3)

Our belief about the probability that this team wins any given game has become less optimistic given their disappointing season: the mean of $θ$ drops from $9/12 = 0.75$ under the prior to $13/24 \approx 0.54$ under the posterior.
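The update itself is just arithmetic on the parameters. The sketch below (base R, values from this example) computes $α^{*}$, $β^{*}$, the posterior mean, and a central 95% interval. It also checks that the posterior mean is a weighted average of the prior mean and the observed win rate $x/n$, with weights $(α + β)/(α + β + n)$ and $n/(α + β + n)$, both $1/2$ here.

```r
alpha <- 9; beta <- 3   # prior: 9 pseudo-wins, 3 pseudo-losses
n <- 12; x <- 4         # observed season: 4 wins in 12 games

alpha_post <- x + alpha      # 13
beta_post <- n - x + beta    # 11
post_mean <- alpha_post / (alpha_post + beta_post)  # 13 / 24, about 0.542

# Posterior mean as a weighted average of the prior mean and the observed rate x / n
w <- (alpha + beta) / (alpha + beta + n)  # weight on the prior, 1/2 here
weighted <- w * alpha / (alpha + beta) + (1 - w) * x / n

qbeta(c(0.025, 0.975), alpha_post, beta_post)  # central 95% posterior interval
```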

Finally, let’s combine all of this information from the past two seasons to get the posterior predictive distribution of wins in a new 12-game season. From earlier, $X^{*} \mid X = x \sim \text{BetaBinom}(n, x + α, n - x + β) = \text{BetaBinom}(12, 4 + 9, 12 - 4 + 3) = \text{BetaBinom}(12, 13, 11)$.

dbb_plot(n = 12, alpha = 4+9, beta = 12-4+3)
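Finally, the posterior predictive can be summarized numerically. This base-R sketch tabulates $\text{BetaBinom}(12, 13, 11)$, checks that the expected number of wins is $12 \times 13/24 = 6.5$, and computes the predictive probability of a winning record (7 or more wins). `dbb_sketch` is again a hypothetical helper, and the "winning record" threshold is an illustrative choice.

```r
# dbb_sketch is a hypothetical helper: beta-binomial pmf via log-beta functions
dbb_sketch <- function(x, n, a, b) {
  choose(n, x) * exp(lbeta(a + x, b + n - x) - lbeta(a, b))
}

n <- 12; alpha_post <- 13; beta_post <- 11   # posterior after the 4-8 season
wins <- 0:n
post_pred <- dbb_sketch(wins, n, alpha_post, beta_post)

round(data.frame(wins, prob = post_pred), 3)
sum(wins * post_pred)        # expected wins: 12 * 13 / 24 = 6.5
sum(post_pred[wins >= 7])    # predictive probability of a winning record
```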

Shiny App

An interactive Shiny app for visualizing the distributions discussed here under different parameter values can be found at https://carter-allen.shinyapps.io/BetaBinomial/.