This set of notes tries to fill in what I didn’t learn in the 2021 algebra course. It’s about inner product spaces. A good online reference on the topic is wilkinson.
Definitions
Inner product space
We define a vector space $V$ to be an inner product space if we equip it with an inner product operator ($\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$) such that the following hold:
- It is linear in both arguments (by symmetry it suffices to check the first one): $$ \langle \alpha x_{1} + \beta x_{2}, y \rangle = \alpha \langle x_{1}, y \rangle + \beta \langle x_{2}, y \rangle $$
- It is a symmetric operator: $\langle x, y \rangle = \langle y, x \rangle$
- It is positive definite, that is, $\forall x \in V: \langle x, x \rangle \geq 0$, with equality if and only if $x = \boldsymbol{0}$
An example of such an operator is the standard dot product on $\mathbb{R}^{n}$, $\langle x, y \rangle = x^{T}y$, which induces the Euclidean norm and, through $\cos \theta = \frac{\langle x, y \rangle}{\lVert x \rVert \lVert y \rVert}$, the notion of angle behind cosine similarity. Note that not every norm comes from an inner product: among the $p\text{-norms}$, only the $2$-norm does.
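As a quick sanity check (not part of the original notes, just a NumPy sketch assuming the standard dot product), here is how the three axioms can be verified numerically on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, y = rng.standard_normal((3, 5))
a, b = 2.0, -3.0

inner = lambda u, v: float(u @ v)  # standard dot product on R^n

# Linearity in the first argument
assert np.isclose(inner(a * x1 + b * x2, y), a * inner(x1, y) + b * inner(x2, y))
# Symmetry
assert np.isclose(inner(x1, y), inner(y, x1))
# Positive definiteness: <x, x> >= 0, and it is 0 only for the zero vector
assert inner(x1, x1) > 0 and inner(np.zeros(5), np.zeros(5)) == 0
```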
Orthogonal matrices and vectors
We define two vectors $u, v$ to be orthogonal if $\langle u, v \rangle = 0$ with respect to some inner product; if in addition $\lVert u \rVert = \lVert v \rVert = 1$ they are orthonormal (with the standard dot product, orthogonality simply means the angle between the two vectors is $90°$).
We can extend this idea to matrices: a square $n\times n$ matrix $Q$ is orthogonal if the following holds:
$$ QQ^{T} = Q^{T}Q = \boldsymbol{1}_{n} $$Where $\boldsymbol{1}_{n}$ is the $n \times n$ identity matrix. By this definition and the uniqueness of the inverse we conclude that $Q$ is orthogonal if and only if $Q^{T} = Q^{-1}$. Equivalently, the columns satisfy $q_{i}^{T}q_{j} = \delta_{ij}$ (see Kronecker Delta), i.e. they form an orthonormal set.
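A minimal sketch of mine, using a 2D rotation matrix in NumPy (the classic example of an orthogonal matrix), to check these equivalent characterizations numerically:

```python
import numpy as np

theta = 0.7  # any angle: a 2D rotation matrix is orthogonal
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Q Q^T = Q^T Q = identity
assert np.allclose(Q @ Q.T, np.eye(2))
assert np.allclose(Q.T @ Q, np.eye(2))
# equivalently, the transpose is the inverse
assert np.allclose(Q.T, np.linalg.inv(Q))
# and the columns are orthonormal: q_i^T q_j = delta_ij
assert np.isclose(Q[:, 0] @ Q[:, 1], 0.0) and np.isclose(Q[:, 0] @ Q[:, 0], 1.0)
```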
Only square matrices can have an inverse
This is easy to prove if we know the following property of the trace: for all matrices $A, B$ such that both products $AB$ and $BA$ are defined (and hence square) we have $\text{tr}(AB) = \text{tr}(BA)$. Remember that the trace is just the sum of the diagonal:
$$ \text{tr}\left(\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}\right) = \sum_{i = 1}^{n} a_{ii} $$The trace property itself is easy to prove, so we leave it to the reader (Hint: just expand the product).
With this in mind we can prove that a non-square $n \times p$ matrix $Q$ (with $n \neq p$) cannot have a two-sided inverse: if some $R$ satisfied $QR = \boldsymbol{1}_{n}$ and $RQ = \boldsymbol{1}_{p}$, taking traces would give $n = \text{tr}(QR) = \text{tr}(RQ) = p$, a contradiction.
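A quick numerical illustration of the trace property (my own example, just NumPy), with deliberately non-square factors:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # 3x5
B = rng.standard_normal((5, 3))   # 5x3

# AB is 3x3 and BA is 5x5, yet their traces agree
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```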
Projections
We say that an $n\times n$ matrix $P$ is a projection if it is idempotent, i.e. applying it more than once changes nothing (many other interesting things are idempotent; some HTTP methods, for example). Intuitively, if we project twice we get the same vector; this definition says that this is the only property needed to define a projection.
$$ P^{2} = P $$Now let’s study its image and kernel. Let’s define $U = \text{Im}(P)$ and $V = \text{Ker}(P)$, and say $P: W \to W$.
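A tiny concrete example of mine, assuming $W = \mathbb{R}^{2}$ and the projection onto the $x$-axis:

```python
import numpy as np

# Projection of R^2 onto the x-axis: Im(P) = span(e1), Ker(P) = span(e2)
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

assert np.allclose(P @ P, P)            # idempotence: P^2 = P
w = np.array([3.0, 4.0])
assert np.allclose(P @ (P @ w), P @ w)  # projecting twice changes nothing
```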
In the end, I don’t think this definition by itself is so useful, because I usually don’t care about maps from an $n$-dimensional space back to itself; what I usually want is dimensionality reduction, which is useful for Principal Component Analysis.
Properties of projections
Every vector can be written as a sum of a kernel vector and an image vector
It’s easy to prove if you keep in mind that $(I - P)w \in \text{Ker}(P)$, since $P(I - P)w = Pw - P^{2}w = \boldsymbol{0}$. We can write
$$ w = Iw = (I - P)w + Pw $$which is a vector in the kernel plus one in the image. This decomposition holds for every projection.
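Continuing the $x$-axis example above (again my own numerical check), the two pieces really do land in the kernel and in the image:

```python
import numpy as np

P = np.array([[1.0, 0.0],
              [0.0, 0.0]])
I = np.eye(2)
w = np.array([3.0, 4.0])

kernel_part = (I - P) @ w   # [0. 4.] -- killed by P
image_part = P @ w          # [3. 0.] -- fixed by P
assert np.allclose(P @ kernel_part, 0.0)
assert np.allclose(P @ image_part, image_part)
assert np.allclose(kernel_part + image_part, w)
```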
$(I - P)$ is a projection matrix
A curious thing is that the matrix $(I - P)$ is a projection matrix too! We can prove this easily by observing that
$$ (I - P)^{2} = (I-P)(I-P) = I - 2P + P^{2} = I - P $$Another way to see it is that this projection is very similar to $P$: they just swap the image and the kernel, i.e. $\text{Im}(I - P) = V = \text{Ker}(P)$ and $\text{Ker}(I - P) = U = \text{Im}(P)$
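Sticking with the same toy example, a quick check of mine that $I - P$ is itself a projection with the image and kernel swapped:

```python
import numpy as np

P = np.array([[1.0, 0.0],
              [0.0, 0.0]])
Q = np.eye(2) - P                 # the complementary projection

assert np.allclose(Q @ Q, Q)      # (I - P)^2 = I - P
e1, e2 = np.eye(2)
assert np.allclose(Q @ e1, 0.0)   # Ker(I - P) = Im(P) = span(e1)
assert np.allclose(Q @ e2, e2)    # Im(I - P) = Ker(P) = span(e2)
```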
Orthogonal Subspaces
Let’s take an inner product space $W$ and a subspace $U$; we define the orthogonal subspace (orthogonal complement) $U^{\perp}$ to be
$$ U^{\perp} = \left\{ w \in W \mid \langle w, u \rangle = 0, \forall u \in U \right\} $$This is especially important for the following theorem
Orthogonal subspace decomposition
We assert that given $w \in W$ there are $u \in U$ and $u' \in U^{\perp}$ such that $w = u + u'$. Since $U \cap U^{\perp} = \left\{ \boldsymbol{0} \right\}$, this can be restated as
$$ \text{dim}(U) + \text{dim}(U^{\perp}) = \text{dim}(W) = n $$Which is very similar to the kernel and image decomposition in Applicazioni lineari (and you also have the motivation up there). This is easy to prove once you know how to build a suitable linear map out of $W$. Let’s build it this way: take $B = \left\{ v_{1}, \dots, v_{n} \right\}$ to be an orthonormal basis for $W$, then build $A$ whose rows are the coordinates (with respect to $B$) of a basis of $U$. Then $U^{\perp} = \left\{ x = (x_{1}, \dots, x_{n}) \mid Ax = 0 \right\}$, which is just the kernel of this map; since the rank of $A$ is $\text{dim}(U)$, the dimensionality (rank-nullity) theorem finishes the proof.
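A small numerical illustration of mine, assuming the standard dot product on $\mathbb{R}^{4}$, where $U$ is spanned by two random vectors and $U^{\perp}$ is computed as the kernel of $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# rows of A: coordinates of a basis of U inside W = R^4
A = rng.standard_normal((2, n))

dim_U = np.linalg.matrix_rank(A)
# U^perp (w.r.t. the dot product) is the kernel of A, of dimension n - rank(A)
_, s, _ = np.linalg.svd(A)
dim_U_perp = n - int(np.sum(s > 1e-12))

assert dim_U + dim_U_perp == n   # dim(U) + dim(U^perp) = dim(W)
```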
Orthogonal projections
Def: Projection
Given an inner product space $W$ with a subspace $U$ (so the operator $\langle \cdot, \cdot \rangle$ is defined and has the good properties above), the orthogonal projection of $w \in W$ is the $u \in U$ that
$$ \text{minimizes the value } \lVert w - u \rVert $$which means we want to approximate that vector as well as possible inside $U$. We observe that the vectors perpendicular to $U$ are exactly the kernel of a possible projection onto $U$! We have proven before that every vector is a simple, unweighted sum of a kernel vector and an image vector: i.e. if $P$ is the projection matrix, with kernel and image defined as above, we can write every $w \in W$ as a sum $u + v$ with one piece from the image and the other from the kernel.
The projection matrix
We can explicitly build the projection matrix once we know a basis for $U$. There is a quick derivation, which we leave as an exercise, but the solution is
$$ P_{U} = A(A^{T}A)^{-1}A^{T} $$With $A$ the matrix whose columns are the coordinates in $W$ of the basis vectors of $U$. The proof is not so difficult, and it has close connections with the MSE error and least squares estimation explained in Minimi quadrati and Linear Regression methods.
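A minimal sketch of mine (NumPy, standard dot product) that builds $P_{U}$ for a random 2-dimensional subspace of $\mathbb{R}^{4}$ and checks it behaves like the orthogonal projection, including the least squares connection:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2))        # columns: a basis of U inside W = R^4

P_U = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P_U @ P_U, P_U)     # idempotent: it is a projection
assert np.allclose(P_U, P_U.T)         # symmetric: it is an *orthogonal* projection

w = rng.standard_normal(4)
u = P_U @ w                            # closest point to w inside U
# the residual w - u is orthogonal to U, i.e. to every column of A
assert np.allclose(A.T @ (w - u), 0.0)
# and it matches the least squares solution of A x ~= w
x, *_ = np.linalg.lstsq(A, w, rcond=None)
assert np.allclose(A @ x, u)
```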