This note will give a brief derivation of Stirling's approximation. This bound is often useful for factorials.
This proof (more like an interesting justification). is taken from page 2 of (MacKay 2003).
Let's start with a Poisson distribution with mean
If is large and , this distribution is approximated by a Gaussian distribution (it is often referred as a discrete Gaussian see Poisson processes). Assuming we have
Which finishes the derivation of the approximation.
Approximation of the binomial
A quick derivation with the Stirling's approximation gives a nice approximation for log of the binomials
We observe there is an approximation similar to the formula of entropy, by grouping the . It's like the binary cross entropy formula
with , + another asymptotically constant approximation term (or if you look at the logits..
References
[1] MacKay “Information Theory, Inference and Learning Algorithms” Cambridge University Press 2003