This set of notes tries to fill the gaps in what I didn't learn in the 2021 algebra course. It's about inner product spaces. A good online reference on the topic is wilkinson.
Definitions
Inner product space
We define the vector space $V$ to be an inner product space if we equip it with an inner product operator ($\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$) such that the following hold:
- It is linear in both arguments (linearity in the first is shown below; the second follows by symmetry): $$ \langle \alpha x_{1} + \beta x_{2}, y \rangle = \alpha \langle x_{1}, y \rangle + \beta \langle x_{2}, y \rangle $$
- It is a symmetric operator: $\langle x, y \rangle = \langle y, x \rangle$
- It is positive definite, that is, $\forall x \in V: \langle x, x \rangle \geq 0$, with equality if and only if $x = \boldsymbol{0}$
The standard example of such an operator is the Euclidean dot product on $\mathbb{R}^{n}$, which induces the Euclidean norm and the usual notion of angle via $\cos \theta = \frac{\langle u, v \rangle}{\lVert u \rVert \lVert v \rVert}$. Note that cosine distance itself is not an inner product, and among the $p$-norms only the $2$-norm is induced by an inner product.
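As a quick numerical sanity check, here is a minimal numpy sketch (the dimension, vectors and coefficients are arbitrary choices of mine) verifying the three axioms for the standard dot product on $\mathbb{R}^{4}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, y = rng.standard_normal((3, 4))   # arbitrary example vectors in R^4
a, b = 2.0, -3.0                          # arbitrary scalars

# Linearity in the first argument
assert np.isclose(np.dot(a * x1 + b * x2, y), a * np.dot(x1, y) + b * np.dot(x2, y))

# Symmetry
assert np.isclose(np.dot(x1, y), np.dot(y, x1))

# Positive definiteness: <x, x> >= 0, with 0 only for the zero vector
assert np.dot(x1, x1) >= 0
assert np.dot(np.zeros(4), np.zeros(4)) == 0
```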
Orthogonal matrices and vectors
We define two vectors $u, v$ to be orthogonal if $\langle u, v \rangle = 0$ with respect to some inner product; if additionally $\lVert u \rVert = \lVert v \rVert = 1$, they are orthonormal (with the standard dot product this simply says the two vectors form a right angle).
A square matrix $Q$ is orthogonal (its columns are orthonormal) if $$ QQ^{T} = Q^{T}Q = \boldsymbol{1}_{n} $$Where $\boldsymbol{1}_{n}$ is the $n \times n$ identity matrix. By this definition and the uniqueness of the inverse we conclude that $Q$ is orthogonal if and only if $Q^{T} = Q^{-1}$. We also observe that the columns satisfy $q_{i}^{T}q_{j} = \delta_{ij}$ (see Kronecker Delta).
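A small numpy sketch of this: I build an orthogonal $Q$ by taking the Q factor of a QR decomposition of a random matrix (an arbitrary way to get one) and check the identities above:

```python
import numpy as np

rng = np.random.default_rng(1)
# Build an orthogonal matrix as the Q factor of a QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Q^T Q = Q Q^T = identity, hence Q^T is the inverse of Q
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(Q @ Q.T, np.eye(4))
assert np.allclose(Q.T, np.linalg.inv(Q))

# Columns satisfy q_i^T q_j = delta_ij
for i in range(4):
    for j in range(4):
        expected = 1.0 if i == j else 0.0
        assert np.isclose(Q[:, i] @ Q[:, j], expected)
```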
Only square matrices can have an inverse
Recall that the trace of a square matrix is the sum of its diagonal entries: $$ tr\left(\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & & \ddots & \vdots \\ a_{n 1} & a_{n 2} & \dots & a_{n n} \end{bmatrix}\right) = \sum_{i = 1}^{n} a_{ii} $$The fact we need is that $tr(AB) = tr(BA)$ whenever both products are defined. This is easy to prove, so we leave it to the reader (Hint: just expand the product).
With this in mind we can prove that a non-square $n \times p$ matrix $Q$ (with $n \neq p$) cannot have a two-sided inverse by looking at traces: if $QA = \boldsymbol{1}_{n}$ and $AQ = \boldsymbol{1}_{p}$, then $n = tr(QA) = tr(AQ) = p$, a contradiction.
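A quick numerical illustration of the trace identity behind this argument, with an arbitrary $3 \times 5$ matrix and an arbitrary candidate inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
Q = rng.standard_normal((3, 5))   # a non-square 3x5 matrix
A = rng.standard_normal((5, 3))   # any 5x3 matrix, a candidate "inverse"

# tr(QA) = tr(AQ) even though QA is 3x3 and AQ is 5x5
assert np.isclose(np.trace(Q @ A), np.trace(A @ Q))

# If Q had a two-sided inverse A we would need tr(QA) = 3 and tr(AQ) = 5
# at the same time, contradicting the equality above.
```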
Projections
A matrix $P: W \to W$ is a projection if $$ P^{2} = P $$Now let's study its image and kernel. Let's define $U = \text{Im}(P)$ and $V = \text{Ker}(P)$.
In the end, I don't think this definition alone is so useful, because I rarely care about maps from an $n$-dimensional space to itself; what I usually want is dimensionality reduction, as used in Principal Component Analysis.
Properties of projections
Every vector can be written as the sum of a kernel vector and an image vector
$$ w = Iw = (I - P)w + Pw $$Here $Pw$ is in the image by definition, and $(I - P)w$ is in the kernel since $P(I - P)w = Pw - P^{2}w = 0$. This is a general property of projections.
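A minimal sketch of this decomposition, using the coordinate projection onto the first two axes of $\mathbb{R}^{3}$ as a toy example of $P$:

```python
import numpy as np

# A simple projection: keep the first two coordinates, zero out the third
P = np.diag([1.0, 1.0, 0.0])
assert np.allclose(P @ P, P)          # P^2 = P, so it is a projection

I = np.eye(3)
w = np.array([3.0, -1.0, 2.0])        # arbitrary example vector

u = P @ w                             # component in the image of P
v = (I - P) @ w                       # component in the kernel of P
assert np.allclose(u + v, w)          # w = Pw + (I - P)w
assert np.allclose(P @ v, 0)          # (I - P)w really lies in Ker(P)
```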
$(I - P)$ is a projection matrix
$$ (I - P)^{2} = (I-P)(I-P) = I - 2P + P^{2} = I - P $$Another way to see it is that this projection is the mirror of $P$: it just swaps the image and the kernel, $\text{Im}(I - P) = V = \text{Ker}(P)$ and $\text{Ker}(I - P) = U = \text{Im}(P)$.
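Continuing the same toy example, a quick check that $I - P$ is itself a projection and that its image sits inside $\text{Ker}(P)$:

```python
import numpy as np

P = np.diag([1.0, 1.0, 0.0])           # same toy projection as above
I = np.eye(3)
Q = I - P                              # the complementary projection

assert np.allclose(Q @ Q, Q)           # (I - P)^2 = I - P
# Everything in the image of I - P is killed by P, i.e. Im(I - P) ⊆ Ker(P)
assert np.allclose(P @ Q, np.zeros((3, 3)))
```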
Orthogonal Subspaces
Given a subspace $U \subseteq W$, its orthogonal complement is $$ U^{\perp} = \left\{ w \in W \mid \langle w, u \rangle = 0, \forall u \in U \right\} $$This is important especially for the following theorem.
Orthogonal subspace decomposition
$$ \text{dim}(U) + \text{dim}(U^{\perp}) = \text{dim}(W) = n $$This is very similar to the kernel and image decomposition in Applicazioni lineari (and you also find the motivation up there). It is easy to prove once you know how to build a suitable linear map. Build it in this way: take a basis $\left\{ u_{1}, \dots, u_{k} \right\}$ of $U$ and let $A$ be the $k \times n$ matrix whose rows are the coordinates of these basis vectors with respect to an orthonormal basis $B = \left\{ v_{1}, \dots, v_{n} \right\}$ of $W$. Then $U^{\perp} = \left\{ x = (x_{1}, \dots, x_{n}) \mid Ax = 0 \right\}$, which is just the kernel of this map; since the rows of $A$ are independent, its rank is $k$, and the dimensionality theorem (rank-nullity) finishes the proof.
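A small numpy sketch of the dimension count, using a random full-rank $A$ as in the construction above (the null space basis is extracted from the SVD):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((k, n))        # rows span a k-dimensional subspace U of R^n

# Basis of U^perp = Ker(A): the right singular vectors associated with
# (numerically) zero singular values span the null space.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(s > 1e-10):]     # shape: (n - k, n)

assert null_basis.shape[0] + np.linalg.matrix_rank(A) == n   # dim U + dim U^perp = n
assert np.allclose(A @ null_basis.T, 0)                      # each basis vector is orthogonal to U
```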
Orthogonal projections
Def: Orthogonal projection
Given a subspace $U \subseteq W$ and a vector $w \in W$, the orthogonal projection of $w$ onto $U$ is the vector $u \in U$ that $$ \text{minimizes the value } \lVert w - u \rVert $$Which means we want to approximate $w$ as well as possible with a vector of $U$. We observe that the vectors perpendicular to $U$ are exactly the kernel of a possible projection! We have proven before that every vector is just a simple, unweighted sum of a kernel vector and an image vector: if $P$ is the projection matrix, with kernel and image defined as above, we can write every $w \in W$ as a sum $u + v$ with $u$ from the image and $v$ from the kernel. With this in hand we can write down an explicit formula.
The projection matrix
$$ P_{U} = A(A^{T}A)^{-1}A^{T} $$With $A$ the matrix whose columns are the coordinates in $W$ of a basis of $U$. The proof is not so difficult, and it has close connections with the MSE error and least squares estimation explained in Minimi quadrati and Linear Regression methods.
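A minimal numpy sketch of this formula (random $A$ and $w$ of my choosing): it checks that $P_{U}$ is a symmetric projection, that $P_{U}w$ really minimizes the distance to $U$, and that it agrees with the least squares solution:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2))                 # columns: a basis of a 2D subspace U of R^5
P = A @ np.linalg.inv(A.T @ A) @ A.T            # P_U = A (A^T A)^{-1} A^T

assert np.allclose(P @ P, P)                    # it is a projection
assert np.allclose(P, P.T)                      # and it is symmetric (orthogonal projection)

w = rng.standard_normal(5)
best = np.linalg.norm(w - P @ w)

# P w minimizes ||w - u|| over u in U: any other point of U is at least as far
for _ in range(1000):
    u = A @ rng.standard_normal(2)              # random element of U
    assert np.linalg.norm(w - u) >= best - 1e-12

# Least squares connection: A x_hat with x_hat = argmin ||A x - w|| equals P w
x_hat, *_ = np.linalg.lstsq(A, w, rcond=None)
assert np.allclose(A @ x_hat, P @ w)
```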