This note will give a brief derivation of Stirling's approximation. This bound is often useful for factorials.

This proof (more like an interesting justification). is taken from page 2 of (MacKay 2003).

Let's start with a Poisson distribution with mean

If is large and , this distribution is approximated by a Gaussian distribution (it is often referred as a discrete Gaussian see Poisson processes). Assuming we have

Which finishes the derivation of the approximation.

Approximation of the binomial

A quick derivation with the Stirling's approximation gives a nice approximation for log of the binomials

We observe there is an approximation similar to the formula of entropy, by grouping the . It's like the binary cross entropy formula

with , + another asymptotically constant approximation term (or if you look at the logits..

References

[1] MacKay “Information Theory, Inference and Learning Algorithms” Cambridge University Press 2003