This note will give a brief derivation of Stirling’s approximation. This bound is often useful for factorials.
$$ x! \approx x^{x}e^{-x}\sqrt{ 2\pi x } \iff \ln x! \approx x\ln x - x + \frac{1}{2} \ln(2\pi x) $$This proof (more like an interesting justification). is taken from page 2 of (MacKay 2003).
Let’s start with a Poisson distribution with mean $\lambda$
$$ P(r \mid \lambda) = \frac{e^{-\lambda}\lambda^{r}}{r!} $$If $\lambda$ is large and $r \approx \lambda$, this distribution is approximated by a Gaussian distribution (it is often referred as a discrete Gaussian see Poisson processes). Assuming $r \approx \lambda$ we have
$$ e^{-\lambda} \frac{\lambda^{\lambda}}{\lambda!} \approx \frac{1}{\sqrt{ 2\pi \lambda }} \implies \lambda! \approx \lambda^{\lambda}e^{-\lambda}\sqrt{ 2\pi \lambda } $$Which finishes the derivation of the approximation.
Approximation of the binomial
A quick derivation with the Stirling’s approximation gives a nice approximation for log of the binomials
$$ \ln \binom{N}{r} \equiv \ln \frac{N!}{(N - r)! r!} \approx \ln \frac{N^{N} \sqrt{ 2\pi N }}{(N - r)^{N - r} \sqrt{ 2\pi (N - r) } r^{r} \sqrt{ 2\pi r }} = $$ $$ = N\ln N + \frac{1}{2}\ln(2\pi N) - (N - r)\ln(N - r) - r\ln r - \frac{1}{2}\ln(4\pi^{2}(N - r)r) = $$ $$ = (N - r) \ln\left( \frac{N}{N - r} \right) + r \ln \left( \frac{N}{r} \right) + \frac{1}{2}\ln\left( 2\pi N \frac{N-r}{N} \frac{r}{N} \right) $$We observe there is an approximation similar to the formula of entropy, by grouping the $N$. It’s like the binary cross entropy formula
$$ H_{2}(x) = x \ln\left( \frac{1}{x} \right) + (1- x)\ln\left( \frac{1}{1 - x} \right) $$with $x = \frac{r}{N}$, + another asymptotically constant approximation term (or $\sqrt{ n }$ if you look at the logits..
References
[1] MacKay “Information Theory, Inference and Learning Algorithms” Cambridge University Press 2003