Beta and Dirichlet Distributions
The beta distribution The beta distribution is a powerful tool for modeling probabilities and proportions between 0 and 1. Here’s a structured intuition to grasp its essence: Core Concept The beta distribution, defined on $[0, 1]$, is parameterized by two shape parameters: α (alpha) and β (beta). These parameters dictate the distribution’s shape, allowing it to flexibly represent beliefs about probabilities, rates, or proportions. Key Intuitions a. “Pseudo-Counts” Interpretation α acts like “successes” and β like “failures” in a hypothetical experiment. Example: If you use Beta(5, 3), it’s as if you’ve observed 5 successes and 3 failures before seeing actual data. After observing x real successes and y real failures, the posterior becomes Beta(α+x, β+y). This makes beta the conjugate prior for the binomial distribution (bernoulli process). b. Shape Flexibility Uniform distribution: When α = β = 1, all values in [0, 1] are equally likely. Bell-shaped: When α, β > 1, the distribution peaks at mode = (α-1)/(α+β-2). Symmetric if α = β (e.g., Beta(5, 5) is centered at 0.5). U-shaped: When α, β < 1, density spikes at 0 and 1 (useful for modeling polarization, meaning we believe the model to only produce values at 0 or 1, not in the middle.). Skewed: If α > β, skewed toward 1; if β > α, skewed toward 0. c. Moments Mean: $α/(α+β)$ – your “expected” probability of success. Variance: $αβ / [(α+β)²(α+β+1)]$ – decreases as α and β grow (more confidence). $$ \text{Mode} = \frac{\alpha - 1}{\alpha + \beta - 2} $$The mathematical model $$ \text{Beta} (x \mid a, b) = \frac{1}{B(a, b)} \cdot x^{a -1 }(1 - x)^{b - 1} $$ Where $B(a, b) = \Gamma(a) \Gamma(b) / \Gamma( + b)$ And $\Gamma(t) = \int_{0}^{\infty}e^{-x}x^{t - 1} \, dx$ ...