This note briefly states and proves the Cauchy-Schwarz inequality, one of the most famous inequalities in geometry and analysis.

Theorem Statement

Given $2n$ real numbers $x_{1}, \dots, x_{n}$ and $y_{1}, \dots, y_{n}$ (you can also see these as two $n$-dimensional vectors), we have that

$$ \left( \sum_{i = 1}^{n} x_{i}y_{i} \right) ^{2} \leq \left( \sum_{i= 1}^{n} x^{2}_{i} \right) \left( \sum_{i = 1}^{n} y^{2}_{i} \right) $$

In vector form we can rewrite this as

$$ \lvert \langle u, v \rangle \rvert ^{2} \leq \langle u, u \rangle \cdot \langle v, v \rangle $$

with $u = \left( x_{1}, \dots, x_{n} \right)$ and $v = \left( y_{1}, \dots, y_{n} \right)$, where $\langle \cdot, \cdot \rangle$ denotes the inner product. Equality holds if and only if $u$ and $v$ are linearly dependent (this is easy to prove from the vector point of view).
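As a quick sanity check, here is a minimal numerical sketch of the statement (assuming NumPy is available; the dimension, the seed, and the factor used for the dependent case are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
x = rng.standard_normal(10)      # u = (x_1, ..., x_n)
y = rng.standard_normal(10)      # v = (y_1, ..., y_n)

lhs = np.dot(x, y) ** 2                # (sum_i x_i y_i)^2
rhs = np.dot(x, x) * np.dot(y, y)      # (sum_i x_i^2)(sum_i y_i^2)
print(lhs <= rhs)                      # True: the Cauchy-Schwarz inequality

# Equality when the vectors are linearly dependent, e.g. y = 3x
y_dep = 3.0 * x
print(np.isclose(np.dot(x, y_dep) ** 2,
                 np.dot(x, x) * np.dot(y_dep, y_dep)))  # True
```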

There are many possible proofs, but the easiest one just expands the sums directly:

Proof: direct computation

This is the most brute-force proof possible: just expand everything and compare terms!

Expanding both sides, the inequality we want to prove becomes

$$ \sum_{i = 1}^{n}(x^{2}_{i}y^{2}_{i}) + \sum_{i \neq j}^{n} (x_{i}y_{i}x_{j}y_{j}) \leq \sum_{i = 1}^{n}(x^{2}_{i}y^{2}_{i}) + \sum_{i \neq j}^{n} (x_{i}^{2}y^{2}_{j}) $$

So we just need to compare the remaining cross terms:

$$ \sum_{i \neq j}^{n} (x_{i}y_{i}x_{j}y_{j}) = 2\sum_{i < j}^{n} (x_{i}y_{i}x_{j}y_{j}) \leq^{?} \sum_{i \neq j}^{n} (x_{i}^{2}y^{2}_{j}) $$

Now, for each pair $i < j$, we notice that

$$ \left( x_{i}y_{j} - x_{j}y_{i} \right) ^{2} \geq 0 \implies x^{2}_{i}y^{2}_{j} + x^{2}_{j}y^{2}_{i} - 2x_{i}y_{i}x_{j}y_{j} \geq 0 \implies x^{2}_{i}y^{2}_{j} + x^{2}_{j}y^{2}_{i} \geq 2x_{i}y_{i}x_{j}y_{j} $$

Summing this over all pairs $(i, j)$ with $i < j$ gives exactly the inequality above. This also explains the equality case: if $x_{i} = \lambda y_{i}$ for some $\lambda \in \mathbb{R}$ and all $i$, then every term $\left( x_{i}y_{j} - x_{j}y_{i} \right)^{2}$ is 0, so we have equality.
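In fact, collecting all the terms above gives an exact identity (known as Lagrange's identity), which makes both the inequality and the equality case explicit:

$$ \left( \sum_{i = 1}^{n} x^{2}_{i} \right) \left( \sum_{i = 1}^{n} y^{2}_{i} \right) - \left( \sum_{i = 1}^{n} x_{i}y_{i} \right)^{2} = \sum_{i < j} \left( x_{i}y_{j} - x_{j}y_{i} \right)^{2} \geq 0 $$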

Drawback: this proof works only for this specific inner product; a general proof should not assume anything about the product. Still, it gives a good sense of why the inequality works in this case.

Proof: general approach

Let’s consider the function $f(t) = \lVert u - tv \rVert^{2}$. We observe that $f(t) \geq 0$ for every $t \in \mathbb{R}$. Now let’s expand this product:

$$ f(t) = \langle u - tv, u - tv \rangle = \langle u, u \rangle - 2t \langle u, v \rangle + t^{2} \langle v, v \rangle = t^{2} \lVert v \rVert^{2} - 2 \langle u, v \rangle t + \lVert u \rVert^{2} $$

Now we use high-school knowledge: $f$ is a quadratic polynomial in $t$ (assuming $v \neq 0$; if $v = 0$ both sides of the inequality are zero and there is nothing to prove) which is non-negative for every $t \in \mathbb{R}$, so its discriminant must be non-positive, i.e. $b^{2} - 4ac \leq 0$. Imposing this condition on the coefficients above, we get that

$$ 4\lvert \langle u, v \rangle \rvert ^{2} - 4 \lVert v \rVert^{2} \lVert u \rVert^{2} \leq 0 \implies \lvert \langle u, v \rangle \rvert ^{2} \leq \langle u, u \rangle \cdot \langle v, v \rangle $$

which ends the proof. $\square$
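Here is a minimal numerical sketch of this argument (again assuming NumPy; the vectors and the grid of $t$ values are arbitrary): we compute the coefficients $a = \lVert v \rVert^{2}$, $b = -2\langle u, v \rangle$, $c = \lVert u \rVert^{2}$ of $f(t)$ and check that the discriminant is non-positive.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# f(t) = a t^2 + b t + c, with coefficients read off from the expansion above
a = np.dot(v, v)          # ||v||^2
b = -2.0 * np.dot(u, v)   # -2 <u, v>
c = np.dot(u, u)          # ||u||^2

print(b ** 2 - 4 * a * c <= 0)   # True: non-positive discriminant, i.e. Cauchy-Schwarz

# f(t) >= 0 on an arbitrary grid of t values, as claimed
t = np.linspace(-10.0, 10.0, 101)
print(np.all(a * t ** 2 + b * t + c >= 0))   # True
```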

Applications

In this section I would like to list some known applications of this inequality. Since it is such a classical result, one should expect many applications; I will try to expand this section as I encounter more during my studies.

Covariance-Variance relations

This is useful, for example, for the Cramér-Rao bound studied in Parametric Models. Cauchy-Schwarz gives us an easy way to prove that, given two random variables $X, Y$, the following holds:

$$ Cov(X, Y)^{2} \leq Var(X) Var(Y) $$

This bound is what guarantees that the correlation coefficient below is well defined and always lies in $[-1, 1]$:

$$\rho_{X, Y} = \frac{Cov(X, Y)}{\sqrt{ Var(X) Var(Y) }}$$

Proof: We can apply Cauchy-Schwarz in the context of expectation to get that

$$\mathbb{E}[XY]^{2} \leq \mathbb{E}[X^{2}]\,\mathbb{E}[Y^{2}]$$

In order to apply this, you just need to see that the expectation operator can be viewed as an inner product on the space of random variables, so let’s prove this first. Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with $\Omega$ the sample space, $\mathcal{F}$ the sigma-algebra, and $\mathbb{P}$ the probability measure on this space, and let $X, Y$ be random variables, i.e. functions from $\Omega$ to $\mathbb{R}$ (square-integrable, so that the expectations below exist). In order to prove that $\langle X, Y \rangle = \mathbb{E}[XY]$ is an inner product we need to check three conditions:

  1. Linearity
  2. Symmetry
  3. Positive definiteness (identifying random variables that are equal almost surely)

They should all be easy to verify, so we move on; a small numerical sketch of this viewpoint is given below.
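For intuition, here is a minimal sketch on a finite sample space (the probabilities and the values of $X$ and $Y$ below are made-up illustrative numbers): there, $\langle X, Y \rangle = \mathbb{E}[XY] = \sum_{\omega} X(\omega) Y(\omega)\, \mathbb{P}(\omega)$ is just a weighted dot product, so the inner-product properties are easy to believe.

```python
import numpy as np

# Finite sample space Omega = {0, 1, 2, 3} with an arbitrary probability measure
p = np.array([0.1, 0.2, 0.3, 0.4])    # P(omega), sums to 1
X = np.array([1.0, -2.0, 0.5, 3.0])   # X(omega), illustrative values
Y = np.array([0.0, 1.0, -1.0, 2.0])   # Y(omega), illustrative values

def inner(A, B, p):
    """<A, B> = E[AB], a weighted dot product over the sample space."""
    return np.sum(A * B * p)

# Cauchy-Schwarz for this inner product: E[XY]^2 <= E[X^2] E[Y^2]
print(inner(X, Y, p) ** 2 <= inner(X, X, p) * inner(Y, Y, p))   # True
```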

Now that we have this link, it’s easy to prove the original statement with covariance and variance: just apply the inequality to the centered variables $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$, since $Cov(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$ and $Var(X) = \mathbb{E}[(X - \mathbb{E}[X])^{2}]$.
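A minimal sketch with simulated data (assuming NumPy; the sample size and the linear relationship between the draws are arbitrary, and the sample covariance/variance stand in for the population quantities):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
y = 0.5 * x + rng.standard_normal(1000)   # correlated with x by construction

cov_xy = np.cov(x, y)[0, 1]     # sample covariance
var_x = np.var(x, ddof=1)       # sample variances (ddof=1 to match np.cov)
var_y = np.var(y, ddof=1)

print(cov_xy ** 2 <= var_x * var_y)       # True: Cov(X, Y)^2 <= Var(X) Var(Y)

rho = cov_xy / np.sqrt(var_x * var_y)     # correlation coefficient
print(-1.0 <= rho <= 1.0)                 # True, as guaranteed by the bound
print(np.isclose(rho, np.corrcoef(x, y)[0, 1]))   # consistent with NumPy's corrcoef
```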