# Bayesian Normal Model

### Inference for unknown mean and known variance

For a sampling model $$Y|\mu, \sigma^2 \stackrel{iid}{\sim} N(\mu,\sigma^2)$$ we wish to make inference on $$\mu$$ assuming $$\sigma^2$$ is known. The first step is to derive an expression for $$\pi(\mu|Y=y,\sigma^2)$$, which by Bayes' rule is proportional to $$\pi(\mu|\sigma^2)\pi(y|\mu,\sigma^2)$$. For a single observation $$y$$, the following model and prior structure will be adopted:

$Y|\mu,\sigma^2 \sim N(\mu,\sigma^2)$

$\mu \sim N(\mu_0,\sigma_0^2)$

Now we can derive an expression for $$\pi(\mu|Y=y,\sigma^2) \propto \pi(\mu|\sigma^2)\pi(y|\mu,\sigma^2)$$.

$\pi(\mu|\sigma^2)\pi(y|\mu,\sigma^2) \propto e^{-\frac{1}{2\sigma_0^2}(\mu-\mu_0)^2} e^{-\frac{1}{2\sigma^2} (y-\mu)^2}$

$= e^{-\frac{1}{2}\left(\frac{1}{\sigma_0^2} (\mu^2-2\mu \mu_0 + \mu_0^2)+\frac{1}{\sigma^2}(y^2-2\mu y+\mu^2)\right)}$

Now let's work with the bracketed expression in the exponent, $$\frac{1}{\sigma_0^2} (\mu^2-2\mu \mu_0 + \mu_0^2)+\frac{1}{\sigma^2}(y^2-2\mu y+\mu^2)$$, by itself and collect powers of $$\mu$$.

$\frac{1}{\sigma_0^2} (\mu^2-2\mu \mu_0 + \mu_0^2)+\frac{1}{\sigma^2}(y^2-2\mu y+\mu^2)= \frac{\mu^2}{\sigma^2_0}-\frac{2\mu \mu_0}{\sigma^2_0} + \frac{\mu_0^2}{\sigma_0^2} + \frac{y^2}{\sigma^2} - \frac{2\mu y}{\sigma^2} + \frac{\mu^2}{\sigma^2}$

$= \mu^2(\frac{1}{\sigma^2_0} + \frac{1}{\sigma^2}) - 2\mu(\frac{\mu_0}{\sigma_0^2} + \frac{y}{\sigma^2}) + \frac{\mu_0^2}{\sigma_0^2} + \frac{y^2}{\sigma^2}$

To complete the square in $$\mu$$, let $$a = \frac{1}{\sigma^2_0} + \frac{1}{\sigma^2}$$, $$b = \frac{\mu_0}{\sigma_0^2} + \frac{y}{\sigma^2}$$, and $$c = \frac{\mu_0^2}{\sigma_0^2} + \frac{y^2}{\sigma^2}$$. Since $$c$$ does not involve $$\mu$$, the factor $$e^{-c/2}$$ is a constant and drops out under proportionality. Now we have

$\pi(\mu | \sigma^2,y) \propto e^{-\frac{1}{2}(a\mu^2-2b\mu)} = e^{-\frac{1}{2}a(\mu^2 - 2b\mu/a+b^2/a^2)+\frac{b^2}{2a}}$

Dropping the constant factor $$e^{\frac{b^2}{2a}}$$ gives

$\propto e^{-\frac{1}{2}a(\mu - b/a)^2} = e^{-\frac{1}{2}\left(\frac{\mu-b/a}{1/\sqrt{a}}\right)^2}$
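A quick numerical sanity check of the completing-the-square step: with illustrative values for $$\mu_0$$, $$\sigma_0^2$$, $$\sigma^2$$ and $$y$$ (assumed for this sketch, not taken from the text), the ratio of the kernel before and after completing the square should be the same constant $$e^{b^2/2a}$$ at every value of $$\mu$$, which is why it can be dropped.

```r
# Illustrative values (assumed for this check, not from the text)
mu_0 <- 1.5; sigsq_0 <- 0.25   # prior mean and variance of mu
sigsq <- 0.5; y <- 2.3         # known sampling variance and one observation

a <- 1 / sigsq_0 + 1 / sigsq
b <- mu_0 / sigsq_0 + y / sigsq

mu <- seq(-3, 6, length.out = 50)
lhs <- exp(-0.5 * (a * mu^2 - 2 * b * mu))   # kernel before completing the square
rhs <- exp(-0.5 * a * (mu - b / a)^2)        # completed-square kernel
ratio <- lhs / rhs

# The ratio does not depend on mu: it equals exp(b^2 / (2a)) everywhere
range(ratio)
exp(b^2 / (2 * a))
```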

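This is recognizable as the kernel of a $$N(b/a,1/a) = N\left(\mu_n = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{y}{\sigma^2}}{\frac{1}{\sigma^2_0} + \frac{1}{\sigma^2}}, \sigma^2_n = \frac{1}{\frac{1}{\sigma^2_0} + \frac{1}{\sigma^2}}\right)$$ distribution. In other words, the posterior precision $$1/\sigma^2_n$$ is the sum of the prior precision $$1/\sigma^2_0$$ and the data precision $$1/\sigma^2$$, and the posterior mean is the precision-weighted average of $$\mu_0$$ and $$y$$.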
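The closed-form result can also be checked by brute force. The sketch below (the helper name `post_normal_known_var` and the numbers are hypothetical, chosen only for illustration) computes $$b/a$$ and $$1/a$$, then normalizes prior times likelihood on a fine grid and compares it with the $$N(b/a, 1/a)$$ density.

```r
# Posterior of mu for one observation y with known sigma^2, under the prior
# mu ~ N(mu_0, sigsq_0). Hypothetical helper, not part of normR.
post_normal_known_var <- function(y, sigsq, mu_0, sigsq_0) {
  a <- 1 / sigsq_0 + 1 / sigsq        # posterior precision
  b <- mu_0 / sigsq_0 + y / sigsq     # precision-weighted sum of means
  list(mean = b / a, var = 1 / a)
}

post <- post_normal_known_var(y = 2.3, sigsq = 0.5, mu_0 = 1.5, sigsq_0 = 0.25)

# Brute force: evaluate prior x likelihood on a fine grid and normalize it
grid <- seq(-5, 8, length.out = 20001)
h <- grid[2] - grid[1]
unnorm <- dnorm(grid, mean = 1.5, sd = sqrt(0.25)) * dnorm(2.3, mean = grid, sd = sqrt(0.5))
grid_post <- unnorm / sum(unnorm * h)

# Largest pointwise gap between the grid posterior and the closed-form density
max(abs(grid_post - dnorm(grid, post$mean, sqrt(post$var))))
```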

### Inference for unknown mean and unknown variance

In an actual data analysis, if we are trying to estimate the mean of a normal population, we will usually need to estimate its variance as well. The following parameterization is used by Hoff in A First Course in Bayesian Statistical Methods.

$1/\sigma^2 \sim gamma(shape = \frac{\nu_0}{2},rate = \frac{\nu_0 \sigma^2_0}{2})$

$\mu|\sigma^2 \sim N(\mu_0,\frac{\sigma^2}{\kappa_0})$

$Y_1,...,Y_n|\mu,\sigma^2 \stackrel{iid}{\sim} N(\mu,\sigma^2)$
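In R, Hoff's shape/rate parameterization maps directly onto `rgamma(shape =, rate =)`. As a small illustration using the prior values from the Monte Carlo example below ($$\nu_0 = 1$$, $$\sigma^2_0 = 0.010$$), the prior mean of the precision $$1/\sigma^2$$ is shape/rate $$= 1/\sigma^2_0$$, so $$\sigma^2_0$$ acts as a prior guess for the variance on the precision scale.

```r
# Prior hyperparameters (same values as the Monte Carlo example in this post)
nu_0 <- 1
sigsq_0 <- 0.010

set.seed(1)
# Draws of the precision 1/sigma^2 from gamma(shape = nu_0/2, rate = nu_0*sigsq_0/2);
# 1 / prec_draws would give the corresponding inverse-gamma draws of sigma^2
prec_draws <- rgamma(1e5, shape = nu_0 / 2, rate = nu_0 * sigsq_0 / 2)

# Prior mean of the precision: Monte Carlo estimate vs exact shape / rate = 1 / sigsq_0
c(monte_carlo = mean(prec_draws), exact = (nu_0 / 2) / (nu_0 * sigsq_0 / 2))
```

With $$\nu_0 = 1$$ the prior on $$\sigma^2$$ itself has no finite mean, which is one reason to summarize it on the precision scale or with quantiles.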

The joint posterior distribution of $$(\mu,\sigma^2)$$ after observing the data $$y_1,...,y_n$$ factors into the following product.

$\pi(\mu,\sigma^2|y_1,...,y_n) = \pi(\mu|\sigma^2,y_1,...,y_n)\pi(\sigma^2|y_1,...,y_n)$

The first term, $$\pi(\mu|\sigma^2,y_1,...,y_n)$$, is the conditional posterior distribution of $$\mu$$. Since it is conditional on $$\sigma^2$$ (i.e. $$\sigma^2$$ is “known”), it can be derived with a calculation identical to the one given in the previous section. Generalizing to a sample of size $$n$$ instead of size 1 (the data enter through $$\bar{y}$$, whose sampling variance is $$\sigma^2/n$$) and substituting $$\sigma^2/\kappa_0$$ for $$\sigma^2_0$$, we get that

$\mu|y_1,...,y_n,\sigma^2 \sim N(\mu_n = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_n},\frac{\sigma^2}{\kappa_n})$

where $$\kappa_n = \kappa_0 + n$$. Hoff’s parameterization lends itself to a nice interpretation of $$\mu_0$$ as the mean of $$\kappa_0$$ prior observations, and of $$\mu_n$$ as the mean of the $$\kappa_0$$ prior observations and the $$n$$ new observations taken together. Likewise, the conditional posterior variance $$\sigma^2/\kappa_n$$ is the sampling variance of a mean of $$\kappa_n$$ observations.
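A minimal sketch of the conditional update, using the data from the Monte Carlo example below and an assumed fixed $$\sigma^2 = 0.015$$ (`cond_post_mu` is a hypothetical helper, not part of normR). It also checks that plugging $$\sigma^2_0 = \sigma^2/\kappa_0$$, $$y = \bar{y}$$ and sampling variance $$\sigma^2/n$$ into the single-observation formulas gives the same answer.

```r
# Conditional posterior of mu given sigma^2 under mu | sigma^2 ~ N(mu_0, sigma^2 / kappa_0).
# Hypothetical helper for illustration.
cond_post_mu <- function(ys, sigsq, mu_0, kappa_0) {
  n <- length(ys)
  kappa_n <- kappa_0 + n
  list(mean = (kappa_0 * mu_0 + n * mean(ys)) / kappa_n,
       var  = sigsq / kappa_n)
}

ys <- c(1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08)
mu_0 <- 1.9; kappa_0 <- 1
sigsq <- 0.015   # sigma^2 treated as known (assumed value for this check)
res <- cond_post_mu(ys, sigsq, mu_0, kappa_0)

# Same result via the single-observation formulas: prior variance sigma^2 / kappa_0,
# "observation" y_bar with sampling variance sigma^2 / n
n <- length(ys)
a <- 1 / (sigsq / kappa_0) + 1 / (sigsq / n)
b <- mu_0 / (sigsq / kappa_0) + mean(ys) / (sigsq / n)
c(kappa_form = res$mean, precision_form = b / a)
```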

To get the marginal posterior distribution of $$\sigma^2$$ after observing $$y_1,...,y_n$$, we need the likelihood of the data given $$\sigma^2$$ alone, which is obtained by integrating over all possible values of $$\mu$$:

$\pi(\sigma^2|y_1,...,y_n) \propto \pi(\sigma^2) \pi(y_1,...,y_n|\sigma^2)$

$= \pi(\sigma^2) \int_\mu \pi(y_1,...,y_n|\mu,\sigma^2)\pi(\mu|\sigma^2) d\mu$

$= \pi(\sigma^2) \int_\mu \left(\frac{1}{2\pi \sigma^2}\right)^{n/2} e^{-\frac{\sum_{i=1}^n(y_i-\mu)^2}{2\sigma^2}} \left(\frac{1}{2\pi \sigma^2/\kappa_0}\right)^{1/2} e^{-\frac{(\mu-\mu_0)^2}{2\sigma^2/\kappa_0}}d\mu$
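Carrying out this integral numerically is a useful check on the algebra. The sketch below fixes an assumed value $$\sigma^2 = 0.015$$, integrates $$\mu$$ out with R's `integrate()`, and compares the result with the closed form obtained by completing the square, $$(2\pi\sigma^2)^{-n/2}\sqrt{\kappa_0/\kappa_n}\, e^{-[(n-1)s^2 + \frac{\kappa_0 n}{\kappa_n}(\bar{y}-\mu_0)^2]/2\sigma^2}$$.

```r
ys <- c(1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08)
mu_0 <- 1.9; kappa_0 <- 1
sigsq <- 0.015                       # any fixed value of sigma^2 (assumed)

n <- length(ys); y_bar <- mean(ys); kappa_n <- kappa_0 + n

# p(y_1..y_n | mu, sigma^2) * p(mu | sigma^2), vectorized over mu for integrate()
integrand <- function(mu) {
  ss_mu <- colSums(outer(ys, mu, "-")^2)   # sum_i (y_i - mu)^2 for each mu
  (2 * pi * sigsq)^(-n / 2) * exp(-ss_mu / (2 * sigsq)) *
    dnorm(mu, mean = mu_0, sd = sqrt(sigsq / kappa_0))
}

# Integrate over +/- 10 conditional posterior sds around mu_n, where the mass lives
mu_n <- (kappa_0 * mu_0 + n * y_bar) / kappa_n
half_width <- 10 * sqrt(sigsq / kappa_n)
numeric_val <- integrate(integrand, mu_n - half_width, mu_n + half_width,
                         rel.tol = 1e-10)$value

# Closed form after completing the square in mu
closed_form <- (2 * pi * sigsq)^(-n / 2) * sqrt(kappa_0 / kappa_n) *
  exp(-((n - 1) * var(ys) + kappa_0 * n / kappa_n * (y_bar - mu_0)^2) / (2 * sigsq))

c(numeric = numeric_val, closed_form = closed_form)
```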

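Completing the square in $$\mu$$ (using $$\sum_{i=1}^n(y_i-\mu)^2 = (n-1)s^2 + n(\bar{y}-\mu)^2$$) and evaluating the resulting normal integral gives $$\pi(y_1,...,y_n|\sigma^2) \propto (\sigma^2)^{-n/2} e^{-\frac{(n-1)s^2 + \frac{\kappa_0 n}{\kappa_n}(\bar{y}-\mu_0)^2}{2\sigma^2}}$$. Multiplying by the inverse-gamma prior density $$\pi(\sigma^2)$$, we get that $$\sigma^2|y_1,...,y_n \sim IG(\nu_n/2,\nu_n \sigma^2_n/2)$$ or, equivalently, $$1/\sigma^2|y_1,...,y_n \sim Gamma(shape = \nu_n/2, rate = \nu_n\sigma^2_n/2)$$, where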

$\nu_n = \nu_0 + n$

$\sigma^2_n = \frac{1}{\nu_n}[\nu_0 \sigma^2_0 + (n-1)s^2+\frac{\kappa_0n}{\kappa_n}(\bar{y}-\mu_0)^2]$

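This gives another nice interpretation: $$\nu_0$$ acts as a prior sample size with corresponding prior sample variance $$\sigma^2_0$$. Here $$s^2$$ is the sample variance, so $$(n-1)s^2 = \sum_{i=1}^n (y_i-\bar{y})^2$$ is the sample sum of squares, and $$\nu_n\sigma^2_n$$ pools the prior sum of squares $$\nu_0\sigma^2_0$$, the sample sum of squares, and a term measuring the disagreement between $$\bar{y}$$ and $$\mu_0$$.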
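The posterior hyperparameters only take a few lines of base R. This sketch (the function name `post_hyper` is hypothetical, not a normR function) applies the formulas above to the data and prior from the Monte Carlo example below; the results should agree with the `mu_n` and `sigsq_n` values printed there (1.814 and 0.015324).

```r
# Posterior hyperparameters of Hoff's normal model, from the formulas above.
# Hypothetical helper for illustration.
post_hyper <- function(ys, mu_0, kappa_0, sigsq_0, nu_0) {
  n <- length(ys); y_bar <- mean(ys); s2 <- var(ys)
  kappa_n <- kappa_0 + n
  nu_n <- nu_0 + n
  mu_n <- (kappa_0 * mu_0 + n * y_bar) / kappa_n
  sigsq_n <- (nu_0 * sigsq_0 + (n - 1) * s2 +
              kappa_0 * n / kappa_n * (y_bar - mu_0)^2) / nu_n
  list(kappa_n = kappa_n, nu_n = nu_n, mu_n = mu_n, sigsq_n = sigsq_n)
}

ys <- c(1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08)
hyper <- post_hyper(ys, mu_0 = 1.9, kappa_0 = 1, sigsq_0 = 0.010, nu_0 = 1)
str(hyper)
```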

### normR

```r
# devtools::install_github("carter-allen/normR")
library(normR)
```

### Monte Carlo Sampling

Example from Hoff pg. 76.

```r
# parameters for the normal prior
mu_0 <- 1.9
kappa_0 <- 1

# parameters for the gamma prior
sigsq_0 <- 0.010
nu_0 <- 1

# some data
ys <- c(1.64,1.70,1.72,1.74,1.82,1.82,1.82,1.90,2.08)
n <- length(ys)
y_bar <- mean(ys)
ssq <- var(ys)
```

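```r
# posterior parameters
kappa_n <- kappa_0 + n
nu_n <- nu_0 + n
# normR:: makes explicit that mu_n() and sigsq_n() are package functions,
# since their results are stored in variables with the same names
mu_n <- normR::mu_n(kappa_0,mu_0,n,y_bar)
sigsq_n <- normR::sigsq_n(nu_0,sigsq_0,n,ssq,kappa_0,y_bar,mu_0)
mu_n
## [1] 1.814
sigsq_n
## [1] 0.015324
```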
```r
# posterior draws of the variance sigma^2
sigsq_posterior_draw <- draw_sigsq_posterior(10000,nu_n,sigsq_n)
# posterior draws of the mean mu, one per sigma^2 draw
mu_posterior_draw <- draw_mu_posterior(mu_n,sigsq_posterior_draw,kappa_n)
quantile(mu_posterior_draw,c(0.025,0.975))
##     2.5%    97.5%
## 1.727323 1.899154
```
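For readers without normR, the same two-step scheme can be sketched in base R: draw $$1/\sigma^2$$ from its gamma posterior and invert, then draw $$\mu$$ from $$N(\mu_n, \sigma^2/\kappa_n)$$ for each $$\sigma^2$$ draw. We assume, without having checked the package source, that this is what `draw_sigsq_posterior()` and `draw_mu_posterior()` do. As a check, the Monte Carlo interval is compared with the exact one from the marginal posterior of $$\mu$$, which is a $$t$$ distribution with $$\nu_n$$ degrees of freedom, location $$\mu_n$$ and scale $$\sqrt{\sigma^2_n/\kappa_n}$$.

```r
ys <- c(1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08)
mu_0 <- 1.9; kappa_0 <- 1; sigsq_0 <- 0.010; nu_0 <- 1
n <- length(ys); y_bar <- mean(ys)
kappa_n <- kappa_0 + n; nu_n <- nu_0 + n
mu_n <- (kappa_0 * mu_0 + n * y_bar) / kappa_n
sigsq_n <- (nu_0 * sigsq_0 + (n - 1) * var(ys) +
            kappa_0 * n / kappa_n * (y_bar - mu_0)^2) / nu_n

set.seed(1)
S <- 10000
# Step 1: sigma^2 | y, by drawing the precision from its gamma posterior and inverting
sigsq_draws <- 1 / rgamma(S, shape = nu_n / 2, rate = nu_n * sigsq_n / 2)
# Step 2: mu | sigma^2, y, one normal draw per sigma^2 draw (rnorm is vectorized over sd)
mu_draws <- rnorm(S, mean = mu_n, sd = sqrt(sigsq_draws / kappa_n))

mc_ci <- quantile(mu_draws, c(0.025, 0.975))
# Exact 95% interval from the t marginal posterior of mu
exact_ci <- mu_n + qt(c(0.025, 0.975), df = nu_n) * sqrt(sigsq_n / kappa_n)
rbind(monte_carlo = mc_ci, exact = exact_ci)
```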
```r
# plot posterior draws
normR::plot_monte_sim(sigsq_posterior_draw,mu_posterior_draw)
```

```r
# plot posterior density estimate for sigma^2 (the draws are of the variance)
normR::plot_posterior_density_estimate(sigsq_posterior_draw,
  plot_title = "Posterior density estimate of sigma^2")

# posterior density estimate of mu
normR::plot_posterior_density_estimate(mu_posterior_draw,
  plot_title = "Posterior density estimate of mu")
```
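The density plot of $$\sigma^2$$ can be complemented with numeric summaries. Because $$1/\sigma^2|y_1,...,y_n$$ is gamma, exact quantiles of $$\sigma^2$$ are available as reciprocals of `qgamma()` quantiles: the $$p$$ quantile of $$\sigma^2$$ is the reciprocal of the $$1-p$$ quantile of the precision. The sketch below uses the posterior values printed above ($$\nu_n = 10$$, $$\sigma^2_n = 0.015324$$) and fresh base-R draws, so it does not depend on normR.

```r
nu_n <- 10; sigsq_n <- 0.015324    # posterior values printed earlier in this post

set.seed(2)
# Inverse-gamma draws of sigma^2 via reciprocal gamma draws of the precision
sigsq_draws <- 1 / rgamma(10000, shape = nu_n / 2, rate = nu_n * sigsq_n / 2)

probs <- c(0.025, 0.5, 0.975)
mc_q <- quantile(sigsq_draws, probs)
# Exact quantiles: p-quantile of sigma^2 = 1 / (1 - p)-quantile of the precision
exact_q <- 1 / qgamma(1 - probs, shape = nu_n / 2, rate = nu_n * sigsq_n / 2)
rbind(monte_carlo = mc_q, exact = exact_q)
```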

### Shiny App

https://carter-allen.shinyapps.io/NormalModel/