Bayesian Predictive Posterior for Normal Distribution using SIR prior

Bayesian Predictive Posterior for a Normal Distribution using SIR Prior

Hello all,

My Bayesian statistics class had me compute the predictive posterior distribution of a normal distribution. You can do this by hand if you use the standard improper reference (SIR) prior. This was painfull to do on my own, and I thought it would be helpful to include this on my blog, in case anyone wants to compute this, and wants to see how I did it.

To put this into perspective, suppose that you sample $n$ times from some normally distributed population. $$ y \sim N(\mu, 1/\tau) $$ $\mu$ is the mean, and $\tau$ is the precision or inverse of the variance $1/\sigma^2$. We don't know what $\mu$ or $\tau$ are, but we can estimate what they could be. The posterior expressed as another distribution called the Posterior. The posterior is a balance between your bias and your data. If you don't trust your data or have few data, your prior distribution dominates. You are also able to place as little weight on your prior if you have no clue what $\mu$ and $\tau$ could be. In the case of a Normal distribution, you set the prior to be the SIR $$ p(\mu, \tau) = 1/\tau $$ What we will see is that when we do this, our posteriors for $\mu$ and $\tau$ resemble Normal and Gamma distributions respectively. $$ P(\tau|y) = gamma(\frac{n-1}{2},\frac{s^2 (n-1)}{2}) $$ In English, the probability knowing precision given sample set y. $$ P(\mu|\tau, y) = N(\bar{y}, \frac{1}{n\tau}) $$ In English, the probability of the mean provided that we know precision and sample set y.

Where $s$ is the sample standard deviation $s = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$.

After we did our experiment, we may want to know what future values may be. Baysean Statistics allows us to estimate the posterior for this as well. In stats language, we are interested in $$ p(y_{n+1}|y) $$ In English, this is the probability of observing a new sample $y_{n+1}$ given sample set y. Intuitively you may expect $y_{n+1}$ to simply be the mean $\mu$. However, the spread of this value is much more than $\tau$. By its nature, an unobserved value has more possibilities. Anything can happen in a Normal Distribution, but our prediction distribution is more likely for its value to be further from $\mu$. We will see that $p(y_{n+1}|y)$ is the t- distribution. Often textbooks or lecturers just leave it at that, but I wanted to prove that this is the case when using the SIR Prior.

Something That we should note before continuing is that we are making a critical assumption here: each sample of y is independent of the rest. For example, if you catch and release a fish, it's not going to go telling its friends to avoid worms on hooks. You catching a fish does not influence catching other fish.

Somthing else worth mentioning is that hereon, I will be using $f(y_{n+1}|y)$ to denote a continuous probability density function. Typically, statistics denotes discrete distributions as $p(y_{n+1}|y)$.

Prediction integral of a normal distribution using an SIR prior.

Deriving this function can be done with some helper hints from statistics. We will need five hints!

Helper hint 1: Posterior of the normal distribution with SIR prior

Suppose we have gathered $n$ data points from a normally distributed variable $y_i$ $i=1,2,...n$. Where $$y_i \sim N(\mu, 1/\tau) = \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(y_i - \mu)^2}$$. Also each sample is independent. Notice we are using precision $\tau$ in substitution of the variance $\sigma^2$ with the relation $\tau = 1/\sigma^2$. We don't know what $\mu$ and $\tau$ is but we can claim that the distribution of these unknowns depends on the inverse of $\tau$. This is a special type of prior guess called the Standard Improper Prior $$f(\mu, \tau) = \frac{1}{\tau}I_{(0, \infty)}(\tau)$$ The $I_{(0, \infty)}(\tau)$ term is just a step function that sets the probability for negative precision to zero. Next define the liklihood function of $y$ (shorthand for the data set $y_1, y_2, ..., y_n$) as $$ f(y|\mu, \tau) = \Pi_{i=1}^n N(\mu, 1/\tau) = \frac{\tau^{n/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}\sum_{i=1}^n(y_i - \mu)^2} $$ We 'simplify' the exponential portion with the knoledge that $$ \sum_{i=1}^n(y_i - \mu)^2 = (n-1)s^2 + n(\bar{y} - \mu )^2 $$ Where $\bar{y} = \frac{1}{2}\sigma_{i=1}^n(y_i)$ and $(n-1)s = \sigma_{i=1}^n(y_i^2 - \bar{y})^2$. Our liklihood function now becomes $$ f(y|\mu, \tau) = \frac{\tau^{n/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(n-1)s^2}e^{\frac{-\tau}{2}n(\bar{y} - \mu)^2} $$ The estimation for the posterior $f(\mu, \tau|y)$ uses Bayes theorem $$ f(\mu, \tau|y) = \frac{f(y|\mu, \tau)f(\mu, \tau)}{f(y)} = \frac{f(y|\mu, \tau)f(\mu, \tau)}{\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y|\mu, \tau)f(\mu, \tau) d\mu d\tau} $$ The denominator is tricky, but we know it is finite. Statisticians aren't really keen on finding the bottom term.. Instead they use the fact that the denominator is a normalizing factor that makes $\int_{-\infty}^{\infty}f(\mu, \tau|y) = 1$. With this we can just worry about the numerator by saying the postior is porportional to the product of the liklihood and prior. $$ f(\mu, \tau|y) \propto f(y|\mu, \tau)f(\mu, \tau) = \frac{\tau^{n/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(n-1)s^2}e^{\frac{-\tau}{2}n(\bar{y} - \mu)^2}\frac{1}{\tau}I_{(0, \infty)}(\tau) \\ =\frac{\tau^{n/2-1}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(n-1)s^2}e^{\frac{-\tau}{2}n(\bar{y} - \mu)^2}I_{(0, \infty)}(\tau) $$

Helper hint 2: the marginal likelihood

We can begin setting up the equation for $f(y_{n+1}|y)$ by multiplying the liklihood of $y_{n+1}$ (just another normal distribution) with the posterior from hint 1. The conditional probability lets us write this out as $$ f(y_{n+1}|\mu,\tau)\cdot f(\mu, \tau|y) = f(y_{n+1},\mu,\tau|y) $$ using another helper (its shorter than the last one!)

$$ p(x) = \int p(x|\theta) \pi(\theta)d\theta $$ for liklikhood function $p(x|\theta)$ and prior function $\pi(\theta)$. In our case, $x=y_{n+1}$ and $\theta = \mu, \tau$ such that $$ f(y_{n+1}|y) = \int \int f(y_{n+1}|\mu,\tau)\cdot f(\mu, \tau|y) d\tau d\mu $$ Our integral is now $$ f(y_{n+1}|y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(y_{n+1} - \mu)^2} \cdot \frac{\tau^{n/2-1}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(n-1)s^2}e^{\frac{-\tau}{2}n(\bar{y} - \mu)^2}I_{(0, \infty)}(\tau) d\tau d\mu $$ I know this looks scary, but we are doing statistics, which relies on 'explaining the integral' away with some definitions. The majority of the work we will do in the next section will involve rearranging terms untill they look nice enough to explain away.

Solving the big bad integral

First thing we can do is rewrite the tau integral such that $I_{(0, \infty)}(\tau)=1$ always. $$ f(y_{n+1}|y) = \int_{-\infty}^{\infty} \int_{0}^{\infty} \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(y_{n+1} - \mu)^2} \cdot \frac{\tau^{n/2-1}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(n-1)s^2}e^{\frac{-\tau}{2}n(\mu - \bar{y})^2} d\tau d\mu $$ Next lets enclose the integral with respect to $\mu$ around two exponentials and two $\tau^{1/2}$ terms. $$ f(y_{n+1}|y) = \int_{0}^{\infty} \tau^{(n-1)/2-1}e^{\frac{-\tau}{2}(n-1)s^2} \ \int_{-\infty}^{\infty} \ \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(y_{n+1} - \mu)^2} \cdot \frac{\tau^{1/2}}{\sqrt{2\pi}} e^{\frac{-\tau}{2}n(\bar{y} - \mu)^2} d\mu d\tau $$ Here is what I did with the $\tau$ terms $$ \frac{\tau^{n/2-1}}{\sqrt{2\pi}}=\frac{\tau^{(n-1+1)/2-1}}{\sqrt{2\pi}}= \tau^{(n-1)/2-1}\frac{\tau^{1/2}}{\sqrt{2\pi}} $$

We will also rely on the fact that $(\bar{y} - \mu)^2 \equiv (\mu - \bar{y})^2$ provided $\mu, \bar{y} \in \mathbb{R}$

Helper hint 3: Predictive probability

Lets take a good look at the integral $$ \int_{-\infty}^{\infty} \ \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(y_{n+1} - \mu)^2} \cdot \frac{\tau^{1/2}}{\sqrt{2\pi}} e^{\frac{-\tau}{2}n(\mu - \bar{y})^2} d\mu $$ Stats has a predictive probability density function $$ p(x_{new}|x) = \int p(x_{new}|\mu) \pi(\mu|x) d\mu $$ Our integral looks like this, where in our case $x_{new} = y_{n+1}$, $x = y$, $p(x_{new}|\mu) = \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{\tau}{2}(y_i - \mu)^2}$, and $\pi(\mu|x) = \frac{\tau^{1/2}}{\sqrt{2\pi}} e^{\frac{\tau}{2}n(\mu - \bar{y})^2}$. Another dirty trick is that the integral of the product of two gaussians is another gaussian. $$ \int gaussian(\mu|\tau,y)\cdot gaussian(y_{n+1}|\mu,\tau) d\mu = gaussian(y_{n+1}|\tau, y) $$ At this point, I must remind you that $y_{n+1}$ is independent of $y$ (fish don't talk to each other), so that our conditional probability $gaussian(y_{n+1}|\tau, y) = gaussian(y_{n+1}|\tau)$. This helper hint makes our integral with respect to $\mu$

$$ \int_{-\infty}^{\infty} \ \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-\tau}{2}(y_{n+1} - \mu)^2} \cdot \frac{\tau^{1/2}}{\sqrt{2\pi}} e^{\frac{-\tau}{2}n(\mu - \bar{y})^2} d\mu = \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-n\tau}{2}(y_{n+1} - \bar{y})^2} $$

Back to our big integral

Knowing what we know know, our predictive density function is now $$ f(y_{n+1}|y) = \int_{0}^{\infty} \tau^{(n-1)/2-1}e^{\frac{-\tau}{2}(n-1)s^2} \ \frac{\tau^{1/2}}{\sqrt{2\pi}}e^{\frac{-n\tau}{2}(y_{n+1} - \bar{y})^2} d\tau $$ Lets bring the exponents together $$ f(y_{n+1}|y) = \int_{0}^{\infty} \frac{\tau^{n/2 - 1}}{\sqrt{2\pi}} \ e^{\frac{-\tau}{2}[(n-1)s^2+n(y_{n+1} - \bar{y})^2]} d\tau $$

Helpful hint 4. Gamma to the rescue!

Our predictive posterior looks like a gamma distribution...The gamma distribution function of random variable x has a pdf $$ gamma(a, b) = \frac{b^a}{\Gamma(a)}x^{a-1}e^{-bx} $$ Need I remind, the integral of gamma with respect to x = 1, we can use the expression $$ \int \frac{b^a}{\Gamma(a)}x^{a-1}e^{-bx}dx = 1 $$ To show $$ \int x^{a-1}e^{-bx}dx = \frac{\Gamma(a)}{b^a} \propto \frac{1}{b^a} $$

In our case, let $x=\tau$, $a=n/2$, and $b = (n-1)s^2+n(y_{n+1} - \bar{y})^2$ such that $$ f(y_{n+1}|y) \propto \frac{1}{[(n-1)s^2+n(y_{n+1} - \bar{y})^2]^{n/2}} $$

Helpful hint 5. Because every little t, is gonna be alright.

Rearranging the denominator terms

$$ \frac{1}{[(n-1)s^2+n(y_{n+1} - \bar{y})^2]^{n/2}} = \frac{1}{[1+\frac{n}{(n-1)s^2}(y_{n+1} - \bar{y})^2]^{n/2}} $$ $$ =\frac{1}{[1+\frac{1}{s^2(1+1/n)}(y_{n+1} - \bar{y})^2]^{n/2}} $$ $$ =\frac{1}{[1+(\frac{y_{n+1} - \bar{y}}{s\sqrt{1+1/n}})^2]^{n/2}} $$ let $t = \frac{y_{n+1} - \bar{y}}{s\sqrt{1+1/n}}$. We can see that this is a t-distribution $$ \frac{1}{[1+t^2]^{\frac{(n-1) + 1}{2}}} $$

$$ f(y_{n+1}|y) \sim t(n-1, \bar{y}, s\sqrt{1+1/n}) $$

Moving onwards

The $f(y_{n+1}|y)$ only applies if we are using the SIR prior. If you know a little bit about your Prior distribution and want to use a more informative prior, then the posterior distribution does not exist in closed form. To get meaningful estimates of $\mu$ and $\tau$, you artificially create a posterior distribution by using something called Monte Carlo Markov Chains. I'm learning about these right now and may have more to say about them in another blog. Thanks for reading!