Central Limit Theorem and Law of Large Numbers

Bounds

Markov Bound

$$ P(X \geq y) \leq \frac{E[X]}{y} $$$$ yP(X \geq y) = y\int _{x =y}^{+\infty} f(x) \, dx \leq \int _{x=y}^{+\infty} x f(x) \, d \leq \int _{-\infty}^{+\infty}xf(x) \, d = E[X] $$

Il che finisce la dimostrazione.

Chebychev Bound

$$ P(\lvert x - E[X] \rvert \geq y) \leq \frac{\sigma^{2}}{y^{2}} $$

E in pratica dice che all’infinito viene tutto compattata sul valore atteso La dimostrazione è abbastanza semplice, si sostituisce $(x - E[X])^{2}$ su $X$ di Markov e $\varepsilon^{2}$ a $y$ e poi si dovrebbe già avere il risultato

Chernoff Bound

$$ P(Z \geq t) \leq \inf_{s > 0} e^{-st} M_{Z}(s) = \inf_{s > 0} e^{-st} E[e^{sZ}] $$

Moments of random variable

$$ M_{X}(\lambda) = E[\exp(\lambda X)] $$$$ e^{tX} = 1 + tX + \frac{t^{2}X^{2}}{2!} + \frac{t^{3}X^{3}}{3!} + \dots $$

Quindi per esempio se volessimo il primo momento, prendiamo la derivata rispetto a $t$e settiamo $t=0$, perché la cosa molto bella è che i coefficienti si cancellano tutti, e l’unico termine che rimane senza $t$ è il momento cercato, per questo motivo estraiamo easy i momenti.

Dimostrazione Chernoff’s Bound

$$ P(Z \geq t) \leq \inf_{s > 0} e^{-st} M_{Z}(s) = \inf_{s > 0} e^{-st} E[e^{sZ}] $$

Guardandolo dall’altro in basso non ho idea del perché valga.

$$ P(Z \geq t) = P(e^{sZ} \geq e^{st}) \leq \frac{E[e^{sZ}]}{e^{st}} $$

Dove $s$ è qualunque $s > 0$ perché per quello la funzione resta crescente, e quindi la dimostrazione vale ancora. La cosa interessante di questo bound è che la probabilità che succeda scende in modo esponenziale.

Hoeffding’s Inequality

$$ P(\lvert S_{n} - \mathbf{E}[S_{n}] \rvert \geq t) \leq 2e^{-2t^{2}/\sum(b_{i} - a_{i})^{2} } $$

Questo ci dice quanto velocemente la media converge nel valore atteso che ci aspettiamo per la legge dei grandi numeri.

Simpler form

$$ P(\lvert S_{n} - \mathbf{E}[S_{n}] \rvert \geq t) \leq 2e^{-2Nt^{2}/C^{2} } $$

Ci permette di avere un bound su accuratezza in funzione del numero di samples che andiamo a prendere.

La dimostrazione di questo mi sembra abbastanza tecnica, c’è bisogno di guardare https://web.eecs.umich.edu/~cscott/past_courses/eecs598w14/notes/03_hoeffding.pdf Oppure https://cs229.stanford.edu/extra-notes/hoeffding.pdf.

Non ho bene capito l’utilità se non nel caso Bernoulliano in cui sembra si semplifichi abbastanza questo.

This inequality has been proven quite useful. See Provably Approximately Correct Learning, Tabular Reinforcement Learning.

Hoeffding’s Lemma

$$ \mathbb{E}[e^{sX}] \leq \exp \left( {\frac{s^{2}(b - a)^{2}}{8}} \right) $$

Proof of Hoeffding’s Inequality

TODO.

Law of Large numbers

Weak Law

La dimostrazione di questo è molto semplice, basta avere Chebicheff

Questa è l’intuizione di quanto presente nell WLLN Central Limit Theorem and Law of Large Numbers-20240127132749421

Abbiamo mean square convergence.

Abbiamo che vale:

$$ \mathbb{P}\left( \left( \frac{S_{n} - n\bar{X}}{n} \right)^{2} > y \right) \leq \frac{\sigma^{2}_{X}}{ny} $$

E poi settando $y = \varepsilon^{2}$ si può avere il risultato. Nella forma corretta. Vedere capitolo 1.5 in questo.

$$ \lim_{ n \to \infty } E \left[ \left( \frac{S_{n}}{n} - \bar{X} \right)^{2} \right] = 0 $$

In questo senso possiamo dire che la successione $S_{n}$ arriverà sempre alla media.

Ricordiamo che $S_{n} = X_{1} + X_{2} + \dots + X_{n}$. Dove tutte le variabili $X_{i}$ sono IID con media $\bar{X}$ e varianza $\sigma^{2}$.

Weak law without finite variance

$$ \lim_{ n \to \infty } \mathbb{P}\left( \mid \frac{S_{n}}{n} - E[X]\mid > \varepsilon \right) = 0 $$

Teorema 1.5.3 nelle note. Questa è la convergenza in probabilità della serie di variabili aleatorie. Con strumenti più avanzati è possibile dimostrare anche la convergenza con probabilità 1 anche in casi di varianza infinita. Questa è l’unica differenza con la versione Strong della legge dei grandi numeri.

Convergence types

We need a lot of care when defining the notions of convergence for probability distributions because the same ideas that applied to the real numbers do not apply here anymore. For example, in calculus if we have a sequence $x_{1},\dots,x_{n}\dots$ such that $\forall i \in \mathbb{N}, x_{i} = x$ for a certain $x$ it’s clear that $\lim_{ n \to \infty }x_{n} = x$, by that definition. The same does not apply if we take a sequence of normally distributed random values $X_{1}, \dots, X_{n}\dots$, this does not converge to the distribution $X \sim\mathcal{N}(0, 1)$ in the classical sense, because we can’t directly compare them. For instance $P(X_{n} = X) = 0$ because both are continuous random variables, and the probability of it is always zero for the integral has zero length. This is why it’s important to distinguish various types of convergence in probability.

Convergence in distribution

$$ \lim_{ n \to \infty } F_{Z_{n}}(z) = F_{Z}(z) $$$$ Z_{n} = \frac{S_{n} -n\bar{X}}{\sigma \sqrt{ n }} $$

Converge alla normale, 0, 1 gaussiana. Un altro esempio è la weak law of large numbers, in cui $\frac{S_{n}}{n}$ converge a $\bar{X}$.

Convergence in probability

$$ \lim_{ n \to \infty } P(\mid Z_{n} - Z\mid > \varepsilon) = 0 $$

So we care about the value of the single probability. Vale anche qui l’esempio della WLLN.

Convergence in mean square

$$ \lim_{ n \to \infty } E[(Z_{n} - Z)^{2}] = 0 $$

La nota è che Mean Square -> Convergence probability -> Convergence in distribution. This is usually useful just to prove the convergence in probability.

Convergence almost everywhere

(Il prof. lo chiama with probability 1 e secondo lui serve sapere measure theory per poter comprendere la definizione originale).

Definiamo una sequenza $Z_{1}, Z_{2}, \dots$ e $\Omega$ il suo spazio campionatorio e sia $Z$ una altra variabile aleatoria, allora la sequenza converge con probabilità 1 se vale

$$ \mathbb{P}(\{\omega \in \Omega : \lim_{ n \to \infty }Z_{n}(\omega) = Z(\omega) \}) = 1 $$

Ossia, per definizione di variabile aleatoria $Z_{n}(\omega)$ è un valore reale, queste sequenze di numeri reali a volte convergono, se convergono vogliamo che il valore sia esattamente $Z(\omega)$. Quello che vogliamo dire con questo è che la probabilità degli elementi dello spazio campionatorio che creano sequenze che convergono è uguale a 1.

Central Limit Theorem

By Lindeberg-Lévy. It is possible to estimate the distribution of the average of the random variable.

The Bernoulli Case

The theorem

$$ \lim_{ n \to \infty } \left[ \mathbb{P}\left( \frac{S_{n} - n\bar{X}}{\sqrt{ n }\sigma} \leq y \right)\right] = \int_{-\infty}^{y} \frac{1}{\sqrt{ 2\pi }} \exp\left( -\frac{x^{2}}{2} \right) \, dx $$$$ Z_{i} = \frac{S_{i} - n\bar{X}}{\sqrt{i} \sigma} $$

Convergerà alla gaussiana normale. È un motivo per cui è una distribuzione molto importante.

$$ \sqrt{ n } (\bar{X} - \mu) \overset{d}\to \mathcal{N}(0, \sigma^{2}) $$

La cosa importante da ricordare è che per distribuzioni con varianza finita avremo una distribuzione normale con varianza zero e concentrata sulla media, mentre per distribuzioni con varianza infinita avremo una distribuzione anche lì con heavy-tail (che è il motivo per cui ha varianza infinita), il che significa che avremo una altra distribuzione con varianza infinita.

Bounds#

Markov Bound#

Chebychev Bound#

Chernoff Bound#

Moments of random variable#

Dimostrazione Chernoff’s Bound#

Hoeffding’s Inequality#

Simpler form#

Hoeffding’s Lemma#

Proof of Hoeffding’s Inequality#

Law of Large numbers#

Weak Law#

Weak law without finite variance#

Convergence types#

Convergence in distribution#

Convergence in probability#

Convergence in mean square#

Convergence almost everywhere#

Central Limit Theorem#

The Bernoulli Case#

The theorem#