In this post, we will quickly go through the math behind Bessel’s correction.




Bessel’s correction

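First, let’s assume we have $n$ independent observations $x_1, x_2, \dots, x_n$ from a population with mean $\mu$ and variance $\sigma^2$. The definition of population variance is:

$$\sigma^2 = E\left[(x_i - \mu)^2\right] \tag{1}$$

Given the observations, we can estimate $\sigma^2$ with the textbook sample variance, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \tag{2}$$

Bessel’s correction is the use of $n-1$ instead of $n$ in the denominator of the sample variance. It’s counterintuitive that $s^2$ is actually an unbiased estimator of $\sigma^2$:

$$E\left[s^2\right] = \sigma^2 \tag{3}$$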
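A quick simulation makes this claim concrete. The sketch below is an illustration, not part of the proof: it assumes normally distributed observations with $\mu = 10$ and $\sigma = 2$ (so $\sigma^2 = 4$) and a sample size of $n = 5$; the names `MU`, `SIGMA`, `N`, and `average_estimates` are hypothetical. It uses Python’s standard `statistics` module, where `statistics.variance` divides by $n-1$ and `statistics.pvariance` (called without a known mean) divides by $n$.

```python
import random
import statistics

MU, SIGMA, N = 10.0, 2.0, 5  # assumed population parameters and sample size

def average_estimates(n, trials, seed=0):
    """Average the n - 1 and n versions of the sample variance over many
    independent samples of size n drawn from a normal population."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    corrected_total = 0.0
    uncorrected_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
        corrected_total += statistics.variance(xs)     # divides by n - 1
        uncorrected_total += statistics.pvariance(xs)  # divides by n
    return corrected_total / trials, uncorrected_total / trials

corrected, uncorrected = average_estimates(N, trials=20_000)
print(f"true variance:          {SIGMA ** 2:.3f}")
print(f"average with n - 1:     {corrected:.3f}")
print(f"average with n:         {uncorrected:.3f}")
print(f"(n - 1) / n * sigma^2:  {(N - 1) / N * SIGMA ** 2:.3f}")
```

With 20,000 trials the first average should land close to 4 and the second close to 3.2; the exact digits depend on the seed.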

Some useful identities

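To prove (3), we first need a few useful identities, namely $E[x_i]$, $E[x_i^2]$, $E[\bar{x}]$, $\mathrm{Var}(\bar{x})$, and $E[\bar{x}^2]$. By the population definition, we have:

$$E[x_i] = \mu \tag{4}$$

$$\mathrm{Var}(x_i) = E\left[(x_i - \mu)^2\right] = \sigma^2 \tag{5}$$

$$E\left[x_i^2\right] = \mathrm{Var}(x_i) + \left(E[x_i]\right)^2 = \sigma^2 + \mu^2 \tag{6}$$

For the sample mean $\bar{x}$, we have expected value:

$$E[\bar{x}] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n} \cdot n\mu = \mu \tag{7}$$

Similarly, for the variance of the sample mean, using the independence of the observations:

$$\mathrm{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(x_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n} \tag{8}$$

Given (7) and (8), we have:

$$E\left[\bar{x}^2\right] = \mathrm{Var}(\bar{x}) + \left(E[\bar{x}]\right)^2 = \frac{\sigma^2}{n} + \mu^2 \tag{9}$$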


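The identities (4) through (9) can be checked exactly on a small discrete population. The sketch below assumes a hypothetical population, one roll of a fair six-sided die ($\mu = 7/2$, $\sigma^2 = 35/12$), and a sample size of $n = 3$. It enumerates all $6^3$ equally likely ordered samples and uses `fractions.Fraction`, so every expectation is exact rather than simulated; the helper names `expect` and `xbar` are illustrative only.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # hypothetical population: one roll of a fair die
N = 3                # sample size

# Exact population mean and variance: mu = 7/2, sigma^2 = 35/12.
mu = Fraction(sum(FACES), len(FACES))
sigma2 = sum((x - mu) ** 2 for x in FACES) / len(FACES)

# All 6**N ordered samples of N independent rolls are equally likely.
samples = list(product(FACES, repeat=N))
p = Fraction(1, len(samples))

def expect(f):
    """Exact expectation of f(sample) over every equally likely sample."""
    return sum(p * f(s) for s in samples)

def xbar(s):
    """Sample mean of one sample, kept as an exact fraction."""
    return Fraction(sum(s), len(s))

# The observations are identically distributed, so x_1 stands in for any x_i.
e_x = expect(lambda s: s[0])
var_x = expect(lambda s: (s[0] - mu) ** 2)
e_x_sq = expect(lambda s: s[0] ** 2)
e_xbar = expect(xbar)
var_xbar = expect(lambda s: (xbar(s) - e_xbar) ** 2)
e_xbar_sq = expect(lambda s: xbar(s) ** 2)

checks = {
    "(4) E[x_i] = mu": e_x == mu,
    "(5) Var(x_i) = sigma^2": var_x == sigma2,
    "(6) E[x_i^2] = sigma^2 + mu^2": e_x_sq == sigma2 + mu ** 2,
    "(7) E[xbar] = mu": e_xbar == mu,
    "(8) Var(xbar) = sigma^2 / n": var_xbar == sigma2 / N,
    "(9) E[xbar^2] = sigma^2 / n + mu^2": e_xbar_sq == sigma2 / N + mu ** 2,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")  # each check prints True
```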
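Given the above identities, proving (3) is straightforward. Let’s ignore the denominator for now. Using $\sum_{i=1}^{n} x_i = n\bar{x}$, together with (6) and (9):

$$\begin{aligned}
E\left[\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\right]
&= E\left[\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2\right] \\
&= E\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] \\
&= \sum_{i=1}^{n} E\left[x_i^2\right] - nE\left[\bar{x}^2\right] \\
&= n\left(\sigma^2 + \mu^2\right) - n\left(\frac{\sigma^2}{n} + \mu^2\right) \\
&= (n-1)\sigma^2
\end{aligned} \tag{10}$$

Given (10), it’s not hard to see that:

$$E\left[s^2\right] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2 \tag{11}$$

which is exactly (3). Dividing by $n$ instead would give an expected value of $\frac{n-1}{n}\sigma^2$, which systematically underestimates $\sigma^2$.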
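Equations (10) and (11) can be checked exactly by enumerating every possible sample from a small discrete population. This sketch assumes a hypothetical, deliberately skewed population in which $0$, $1$, and $5$ are equally likely ($\mu = 2$, $\sigma^2 = 14/3$), with $n = 4$; the names `VALUES`, `N`, and `sum_sq_dev` are illustrative only. It also computes the uncorrected average (dividing by $n$) to show the bias.

```python
from fractions import Fraction
from itertools import product

VALUES = (0, 1, 5)  # hypothetical population, each value equally likely
N = 4               # sample size

mu = Fraction(sum(VALUES), len(VALUES))
sigma2 = sum((v - mu) ** 2 for v in VALUES) / len(VALUES)

samples = list(product(VALUES, repeat=N))  # all 3**4 = 81 equally likely samples
p = Fraction(1, len(samples))

def sum_sq_dev(s):
    """Sum of squared deviations from the sample mean (the numerator of s^2)."""
    xbar = Fraction(sum(s), len(s))
    return sum((x - xbar) ** 2 for x in s)

e_ss = sum(p * sum_sq_dev(s) for s in samples)  # (10): equals (n - 1) * sigma^2
e_s2_bessel = e_ss / (N - 1)                    # (11): equals sigma^2
e_s2_naive = e_ss / N                           # biased: (n - 1) / n * sigma^2

print(sigma2, e_ss, e_s2_bessel, e_s2_naive)  # 14/3 14 14/3 7/2
```

Swapping in other values for `VALUES` or `N` should leave the relationships intact, since (10) and (11) rely only on independent, identically distributed draws.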