The Spectral Theorem: A Theorem-Chain Construction
Strategic Overview
The spectral theorem is not one theorem but a family. Here we build the finite-dimensional versions — both for self-adjoint operators on real inner product spaces and normal operators on complex inner product spaces — and end by sketching the path to the infinite-dimensional (bounded / unbounded / compact) generalizations.
Two genuinely different proof routes are possible. The thing to remember is that the complex case is cleaner because the Fundamental Theorem of Algebra hands you an eigenvalue for free; the real case requires an extra trick (complexification or a quadratic-factor argument) to extract an eigenvalue from a self-adjoint operator.
| Route | Key Step | Works Over |
|---|---|---|
| Complex / Schur | Triangularize, then show normality forces diagonal | $\mathbb{C}$ |
| Real / Induction | Find one real eigenvalue, peel off, induct on $W^\perp$ | $\mathbb{R}$ |
Throughout, $V$ is a finite-dimensional inner product space over $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$ with $\dim V = n$, and $T \in \mathcal{L}(V)$ is a linear operator.
Foundation: Inner Product Spaces
See Inner product spaces.
Inner Product & ONB
An inner product is a map $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{F}$ that is linear in one slot, conjugate-symmetric, and positive definite.
Every finite-dim inner product space admits an orthonormal basis via Gram-Schmidt. Key consequence: in an ONB $\{e_i\}$, coordinates are inner products: $x = \sum_i \langle x, e_i\rangle e_i$, and Parseval gives $\|x\|^2 = \sum_i |\langle x, e_i\rangle|^2$.
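A quick numerical sanity check (a sketch in NumPy; `np.linalg.qr` plays the role of Gram-Schmidt here): coordinates in an ONB really are inner products, and Parseval holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# QR factorization = Gram-Schmidt: columns of Q form an ONB of R^4
A = rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(A)

x = rng.standard_normal(4)

# Coordinates in the ONB are inner products <x, e_i>
coords = Q.T @ x
x_rebuilt = Q @ coords          # x = sum_i <x, e_i> e_i

# Parseval: ||x||^2 = sum_i |<x, e_i>|^2
norm_sq = x @ x
coord_sq = coords @ coords
```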
Riesz Representation (Finite-Dim)
Also see Counterfactual Invariance.
For every linear functional $\varphi : V \to \mathbb{F}$, there is a unique $u \in V$ such that $\varphi(x) = \langle x, u\rangle$ for all $x$.
This is the engine that gives us the adjoint. The proof is trivial in finite dimensions (just take $u = \sum_i \overline{\varphi(e_i)} e_i$), but morally the same theorem in Hilbert spaces requires completeness.
The Adjoint Operator
Definition of the Adjoint
$$\langle Tx, y\rangle = \langle x, T^* y\rangle \quad \forall x, y \in V.$$
Existence is by Riesz applied to $x \mapsto \langle Tx, y\rangle$ for each fixed $y$. In matrix terms (w.r.t. an ONB): $[T^*] = \overline{[T]^\top} = [T]^H$ (conjugate transpose).
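In coordinates the adjoint is literally the conjugate transpose. A small NumPy check (a sketch; the inner product is taken linear in the first slot, matching the Riesz formula above):

```python
import numpy as np

rng = np.random.default_rng(1)

def inner(a, b):
    # <a, b>: linear in the first slot, conjugate-linear in the second
    return np.vdot(b, a)          # vdot conjugates its FIRST argument

n = 5
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T_adj = T.conj().T                # [T*] = conjugate transpose in an ONB

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = inner(T @ x, y)             # <Tx, y>
rhs = inner(x, T_adj @ y)         # <x, T* y>
```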
Algebraic Properties of $T^*$
- $(S + T)^* = S^* + T^*$
- $(\lambda T)^* = \overline{\lambda}\, T^*$ (conjugate-linear!)
- $(ST)^* = T^* S^*$
- $T^{**} = T$
- $I^* = I$
[!note] The “Four Fundamental Subspaces” $\ker T = (\operatorname{im} T^*)^\perp$ and $\ker T^* = (\operatorname{im} T)^\perp$. This is the algebraic skeleton of the Fredholm alternative, and the reason the SVD of a map $T : V \to W$ partitions its domain and codomain into four fundamental pieces.
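The identity $\ker T = (\operatorname{im} T^*)^\perp$ can be exhibited via the SVD (a sketch: the rows of $V^\top$ split into a row-space basis and a null-space basis):

```python
import numpy as np

rng = np.random.default_rng(2)

# Rank-2 operator on R^4, so ker T is 2-dimensional
T = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(T)
r = int(np.sum(s > 1e-10))        # numerical rank

ker_T = Vt[r:].T                  # ONB of ker T
im_Tadj = Vt[:r].T                # ONB of im T^* (the row space)

gram = ker_T.T @ im_Tadj          # every pairing should vanish
```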
Classes of Operators
The Operator Zoo
| Class | Definition | Spectrum Lives On | Matrix Form (some ONB) |
|---|---|---|---|
| Self-adjoint (Hermitian) | $T = T^*$ | $\mathbb{R}$ | Real diagonal |
| Skew-adjoint | $T = -T^*$ | $i\mathbb{R}$ | Purely imaginary diag. |
| Unitary ($\mathbb{C}$) / Orthogonal ($\mathbb{R}$) | $T^*T = I$ | Unit circle $S^1$ | Diagonal entries of modulus 1 |
| Normal | $T^*T = TT^*$ | $\mathbb{C}$ | Diagonal (in $\mathbb{C}$) |
| Positive (semi)definite | $T = T^*$ and $\langle Tx,x\rangle \geq 0$ | $\mathbb{R}_{\geq 0}$ | Non-negative diagonal |
[!note] Normality is the maximal class for which the spectral theorem holds. Self-adjoint, skew-adjoint, and unitary are all special cases of normal. The “if and only if” version of the complex spectral theorem characterizes exactly the normal operators.
Why Self-Adjoint Eigenvalues Are Real
Suppose $T = T^*$ and $Tv = \lambda v$ with $v \neq 0$. Then
$$\lambda \langle v, v\rangle = \langle Tv, v\rangle = \langle v, T^*v\rangle = \langle v, Tv\rangle = \overline{\lambda} \langle v, v\rangle.$$Hence $\lambda = \overline\lambda \in \mathbb{R}$. A closely related computation shows that normal $T$ satisfies $\|Tv\| = \|T^*v\|$ for all $v$: expand $\langle Tv,Tv\rangle = \langle T^*Tv, v\rangle = \langle TT^*v, v\rangle = \langle T^*v, T^*v\rangle$.
Eigenvectors of Normal Operators Belong to Conjugate Eigenvalues of $T^*$
Lemma. If $T$ is normal and $Tv = \lambda v$, then $T^* v = \overline\lambda v$.
Proof. $T - \lambda I$ is normal, so $\|(T-\lambda I)v\| = \|(T-\lambda I)^* v\| = \|(T^* - \overline\lambda I)v\|$. The LHS is zero, hence so is the RHS, i.e. $T^* v = \overline\lambda v$.
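The lemma is easy to check numerically on a manufactured normal matrix (a sketch; normality is built in by conjugating a complex diagonal matrix by a unitary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Manufacture a normal matrix: unitary conjugate of a complex diagonal
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(M)
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
N = Q @ np.diag(d) @ Q.conj().T

# (v, lam) is an eigenpair of N by construction
v, lam = Q[:, 0], d[0]

lhs = N.conj().T @ v              # N* v
rhs = np.conj(lam) * v            # conj(lam) v
```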
This is the lynchpin for the normal spectral theorem — it lets you upgrade triangular form to diagonal form. Without it Schur’s theorem is the end of the road.
Orthogonality of Eigenspaces
Lemma. If $T$ is normal, eigenvectors with distinct eigenvalues are orthogonal.
Proof. Let $Tv = \lambda v$ and $Tw = \mu w$ with $\lambda \neq \mu$; by the previous lemma, $T^* w = \overline\mu w$. Then
$$\lambda \langle v, w\rangle = \langle Tv, w\rangle = \langle v, T^*w\rangle = \langle v, \overline\mu w\rangle = \mu \langle v, w\rangle.$$Since $\lambda \neq \mu$, $\langle v,w\rangle = 0$.
Existence of Eigenvalues
Complex Case: Eigenvalues Always Exist
Over $\mathbb{C}$, the characteristic polynomial $\det(T - \lambda I)$ has a root by Fundamental Theorem of Algebra. So every operator on a complex finite-dim space has at least one eigenvalue. This is the only place characteristic polynomials are essential — and we even sidestep them in Axler-style treatments.
Real Case: Self-Adjoint Forces an Eigenvalue
Over $\mathbb{R}$ a generic operator may have no eigenvalues (e.g., 90° rotation in $\mathbb{R}^2$). But self-adjointness restores existence. Two standard proofs:
Proof 1 (Complexification). Embed $V \hookrightarrow V_\mathbb{C} := V \otimes_\mathbb{R} \mathbb{C}$ and extend $T$ $\mathbb{C}$-linearly. The extension is still self-adjoint, so it has a complex eigenvalue which, by the reality argument above, is actually real; taking the real or imaginary part of a complex eigenvector then yields a real eigenvector in $V$.
Proof 2 (Quadratic-factor argument). Factor the characteristic (or minimal) polynomial of $T$ over $\mathbb{R}$ into linear factors and irreducible quadratics $x^2 + bx + c$ with $b^2 < 4c$. No irreducible quadratic factor can annihilate a nonzero vector: by Cauchy-Schwarz,
$$\langle (T^2 + bT + cI) v, v\rangle = \|Tv\|^2 + b\langle Tv,v\rangle + c\|v\|^2 \geq \|Tv\|^2 - 2\sqrt{c}\|Tv\|\|v\| + c\|v\|^2 = \big(\|Tv\| - \sqrt{c}\|v\|\big)^2 \geq 0,$$and the bound is strict for $v \neq 0$ since $|b| < 2\sqrt c$. Hence some linear factor of the polynomial must vanish at $T$, giving a real eigenvalue.
[!tip] Variational route to existence When $T$ is self-adjoint, the largest eigenvalue can be obtained as $\lambda_1 = \max_{\|v\|=1} \langle Tv, v\rangle$ (and the largest $|\lambda|$ as $\max_{\|v\|=1} \|Tv\|$). Compactness of the unit sphere plus continuity of the Rayleigh quotient does the work. This generalizes to compact self-adjoint operators on Hilbert spaces — see Courant-Fischer Min-Max.
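A numerical illustration of the variational characterization (a sketch): random unit directions never beat the top eigenvalue of the Rayleigh quotient, and the top eigenvector attains it.

```python
import numpy as np

rng = np.random.default_rng(4)

A = rng.standard_normal((6, 6))
A = (A + A.T) / 2                       # self-adjoint

evals, evecs = np.linalg.eigh(A)        # ascending eigenvalue order
lam_max, v_max = evals[-1], evecs[:, -1]

def rayleigh(v):
    return v @ A @ v / (v @ v)

# Rayleigh quotient over many random directions
samples = rng.standard_normal((1000, 6))
max_sample = max(rayleigh(v) for v in samples)
```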
The Invariant-Subspace Reduction
Invariant Subspaces
$W \subseteq V$ is $T$-invariant if $T(W) \subseteq W$. The restriction $T|_W \in \mathcal{L}(W)$ inherits operator structure.
The Key Reduction Lemma
Lemma. If $T$ is normal (or self-adjoint) and $W$ is $T$-invariant, then $W^\perp$ is also $T$-invariant. Moreover $T|_W$ is normal (self-adjoint).
Sketch (self-adjoint case). Take $u \in W^\perp$, $w \in W$. Then $\langle Tu, w\rangle = \langle u, Tw\rangle = 0$ since $Tw \in W$ and $u \perp W$.
This lemma is the inductive engine of the real spectral theorem: peel off one eigenvector at a time, restrict to its orthogonal complement, induct on dimension.
Schur’s Triangularization Theorem
We have seen something similar in past courses, but I don’t remember the name.
Schur Decomposition
Theorem (Schur). For every $T \in \mathcal{L}(V)$ on a complex finite-dim inner product space, there exists an ONB $\{e_1, \ldots, e_n\}$ such that $[T]$ is upper triangular.
Equivalently: $T = U \Lambda U^*$ where $U$ is unitary and $\Lambda$ is upper triangular. The diagonal of $\Lambda$ is the spectrum.
Proof (induction on $n$).
- By FTA, $T$ has an eigenvalue $\lambda$ with eigenvector $e_1$ (normalize).
- Let $W = \operatorname{span}(e_1)$. Then $W^\perp$ has dim $n-1$.
- $W^\perp$ is not in general $T$-invariant, but: define $\pi : V \to W^\perp$ the orthogonal projection and consider $S := \pi \circ T|_{W^\perp}$. By induction, $S$ is upper triangular in some ONB $\{e_2, \ldots, e_n\}$ of $W^\perp$.
- Stack $\{e_1, e_2, \ldots, e_n\}$; since $Te_j = \langle Te_j, e_1\rangle e_1 + S e_j \in \operatorname{span}(e_1, \ldots, e_j)$, the matrix of $T$ is upper triangular.
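For diagonalizable $A$ the Schur basis can even be produced directly: if $A = P D P^{-1}$ and $P = QR$, then $Q^* A Q = R D R^{-1}$ is a product of upper triangulars, hence upper triangular. A NumPy sketch (assumes $A$ diagonalizable, which holds almost surely for a random matrix):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Orthonormalize the eigenvector matrix to get a Schur basis
_, P = np.linalg.eig(A)
Q, _ = np.linalg.qr(P)
Tri = Q.conj().T @ A @ Q          # = R D R^{-1}, upper triangular

strictly_lower = np.tril(Tri, k=-1)
```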
[!note] Schur ≠ Spectral Schur works for every operator, but only gives triangular form. The leap to diagonal form needs normality.
The Spectral Theorems
Complex Spectral Theorem (Normal Operators)
Theorem. $T \in \mathcal{L}(V)$ on a complex finite-dim inner product space is normal iff there exists an ONB of $V$ consisting of eigenvectors of $T$.
Proof.
- ($\Leftarrow$) Trivial: in an eigen-ONB, $[T]$ is diagonal, and diagonal matrices commute with their conjugate transpose.
- ($\Rightarrow$) By Schur, write $[T] = U^* \Lambda U$ with $\Lambda$ upper triangular in an ONB $\{e_i\}$. Normality is preserved by unitary conjugation, so $\Lambda$ itself is normal. Claim: a normal upper-triangular matrix is diagonal. Compute $(\Lambda \Lambda^*)_{11} = \sum_j |\lambda_{1j}|^2$ but $(\Lambda^* \Lambda)_{11} = |\lambda_{11}|^2$. Equating forces $\lambda_{1j} = 0$ for $j > 1$; induct on the remaining $(n-1) \times (n-1)$ block.
Real Spectral Theorem (Self-Adjoint Operators)
Theorem. $T \in \mathcal{L}(V)$ on a real finite-dim inner product space is self-adjoint iff there exists an ONB of $V$ consisting of eigenvectors of $T$ (all eigenvalues real).
Proof (the “induction route”).
- By the existence theorem, $T$ has a real eigenvalue $\lambda_1$ with unit eigenvector $e_1$.
- $W = \operatorname{span}(e_1)$ is $T$-invariant. By the reduction lemma, $W^\perp$ is $T$-invariant and $T|_{W^\perp}$ is self-adjoint.
- Induct on dimension.
The converse is again trivial: a real diagonal matrix is symmetric.
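In NumPy the real spectral theorem is exactly `np.linalg.eigh`: real eigenvalues and an orthogonal matrix of eigenvectors. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)

A = rng.standard_normal((5, 5))
A = A + A.T                       # real symmetric = self-adjoint over R

evals, Q = np.linalg.eigh(A)      # real eigenvalues, eigenvector ONB

D = Q.T @ A @ Q                   # diagonal in the eigen-ONB
A_rebuilt = Q @ np.diag(evals) @ Q.T
```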
Unified Statement (Functional Form)
For normal $T$ over $\mathbb{C}$ (or self-adjoint $T$ over $\mathbb{R}$),
$$T = \sum_{i=1}^k \lambda_i P_i,$$where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues, and $P_i$ is the orthogonal projection onto $\ker(T - \lambda_i I)$. The projections satisfy:
- $P_i^2 = P_i = P_i^*$ (orthogonal projection)
- $P_i P_j = 0$ for $i \neq j$
- $\sum_i P_i = I$ (resolution of identity)
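The projection form is easy to exhibit concretely (a sketch with a deliberately repeated eigenvalue, so one $P_i$ has rank 2):

```python
import numpy as np

rng = np.random.default_rng(7)

# Self-adjoint T with spectrum {1, 2}; eigenvalue 1 has multiplicity 2
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
T = Q @ np.diag([1.0, 1.0, 2.0]) @ Q.T

# Orthogonal projections onto the two eigenspaces
P1 = Q[:, :2] @ Q[:, :2].T       # onto ker(T - 1*I)
P2 = Q[:, 2:] @ Q[:, 2:].T       # onto ker(T - 2*I)

T_rebuilt = 1.0 * P1 + 2.0 * P2  # resolution T = sum_i lambda_i P_i
```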
This is the form that generalizes: in infinite dimensions the finite sum becomes a projection-valued measure $E$ on the spectrum, and $T = \int \lambda\, dE(\lambda)$.
Consequences and Extensions
Functional Calculus
$$f(T) := \sum_i f(\lambda_i) P_i.$$This is consistent with the polynomial calculus and gives meaning to $\sqrt T$ (for $T \succeq 0$), $e^T$, $\log T$, etc.
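A minimal functional-calculus sketch (the helper `fun_calc` is an illustrative name, not a library function); the identity is added to keep $T$ strictly positive so `np.sqrt` is numerically safe:

```python
import numpy as np

rng = np.random.default_rng(8)

# Positive definite T: Gram matrix plus identity (eigenvalues > 0)
B = rng.standard_normal((4, 4))
T = B.T @ B + np.eye(4)

def fun_calc(f, T):
    """f(T) := sum_i f(lambda_i) P_i, computed in an eigen-ONB."""
    evals, Q = np.linalg.eigh(T)
    return Q @ np.diag(f(evals)) @ Q.T

sqrt_T = fun_calc(np.sqrt, T)     # the positive square root of T
exp_T = fun_calc(np.exp, T)       # matrix exponential of a symmetric T
```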
[!tip] Used everywhere in ML & physics Matrix exponential $e^{tA}$ for solving linear ODEs; $\sqrt{\Sigma}$ for whitening in PCA; $\log\det$ regularizers in Bayesian deep learning. All are silently invoking the spectral theorem.
Polar Decomposition
Every $T \in \mathcal{L}(V)$ factors as $T = U|T|$ where $|T| := \sqrt{T^*T}$ is positive (defined via functional calculus on the self-adjoint $T^*T$) and $U$ is a partial isometry (unitary if $T$ is invertible). This is the operator analog of $z = e^{i\theta}|z|$ for $z \in \mathbb{C}$.
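Polar decomposition drops out of the SVD: from $T = U \Sigma V^\top$ one gets $T = (U V^\top)(V \Sigma V^\top)$, an orthogonal factor times $|T|$. A sketch:

```python
import numpy as np

rng = np.random.default_rng(9)

T = rng.standard_normal((4, 4))

# T = U S V^T  gives  T = (U V^T)(V S V^T) = W |T|
U, s, Vt = np.linalg.svd(T)
W = U @ Vt                        # orthogonal (unitary) factor
absT = Vt.T @ np.diag(s) @ Vt     # |T| = sqrt(T^T T), positive semidefinite
```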
Singular Value Decomposition
$$T = U \Sigma V^*,$$where $\Sigma$ has the singular values $\sigma_i = \sqrt{\lambda_i(T^*T)}$ on the diagonal. SVD is the spectral theorem in drag — it is what you get when you have no operator to diagonalize but you can diagonalize $T^*T$.
Courant–Fischer Min-Max Theorem
For self-adjoint $T$ with eigenvalues ordered $\lambda_1 \geq \cdots \geq \lambda_n$,
$$\lambda_k = \max_{\substack{S \subseteq V \\ \dim S = k}} \min_{\substack{v \in S \\ \|v\|=1}} \langle Tv, v\rangle.$$Equivalently, eigenvalues are critical values of the Rayleigh quotient $R(v) = \langle Tv,v\rangle / \|v\|^2$ on the unit sphere. This is the variational characterization that survives into infinite dimensions and underlies PCA, graph Laplacian methods, and quantum mechanics ground-state computations.
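A Monte-Carlo sanity check of min-max (a sketch): every random $k$-dimensional subspace has min-Rayleigh at most $\lambda_k$, and the top-$k$ eigenspace attains the maximum.

```python
import numpy as np

rng = np.random.default_rng(10)

n, k = 6, 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
lam = np.linalg.eigh(A)[0][::-1]        # lambda_1 >= ... >= lambda_n

def min_rayleigh(S):
    """min of <Av, v> over unit vectors in the column span of S."""
    B, _ = np.linalg.qr(S)              # ONB of the subspace
    return np.linalg.eigh(B.T @ A @ B)[0][0]

# Every random k-dim subspace scores at most lambda_k ...
vals = [min_rayleigh(rng.standard_normal((n, k))) for _ in range(200)]

# ... and the top-k eigenspace is the maximizer
top_k = np.linalg.eigh(A)[1][:, ::-1][:, :k]
```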
Simultaneous Diagonalization
Theorem. A family $\{T_\alpha\}$ of normal operators is simultaneously diagonalizable (common eigen-ONB) iff the operators pairwise commute.
This is the linear-algebraic shadow of why commuting observables in quantum mechanics share eigenbases. Non-commutativity ⟹ uncertainty.
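A sketch with two commuting symmetric matrices built on the same eigen-ONB; because $A$'s eigenvalues are distinct, its eigenbasis is forced (up to signs), so it must also diagonalize $B$:

```python
import numpy as np

rng = np.random.default_rng(11)

# Two self-adjoint operators sharing an eigen-ONB, hence commuting
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T   # distinct eigenvalues
B = Q @ np.diag([5.0, 5.0, 7.0, 9.0]) @ Q.T

commute = np.allclose(A @ B, B @ A)

# A's eigen-ONB must also diagonalize B
_, V = np.linalg.eigh(A)
B_diag = V.T @ B @ V
off_diag = B_diag - np.diag(np.diag(B_diag))
```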
Infinite-Dimensional Generalizations (Sketch)
Compact Self-Adjoint Operators
Hilbert-Schmidt / Spectral Theorem for Compact Self-Adjoint. A compact self-adjoint operator $T$ on a (separable) Hilbert space $H$ has an ONB of eigenvectors with real eigenvalues, and the nonzero eigenvalues form a sequence $\lambda_n \to 0$.
The finite-dim proof essentially transports: the unit sphere is no longer compact, but compactness of $T$ rescues the variational step (the sup of the Rayleigh quotient is still attained, since $T$ maps weakly convergent sequences to norm-convergent ones); peel off the top eigenvector, induct on the closed invariant subspace. The accumulation of eigenvalues at $0$ is forced by compactness.
Bounded Self-Adjoint Operators
Spectral Theorem (Bounded SA). For bounded self-adjoint $T$ on Hilbert space $H$, there exists a unique projection-valued measure $E$ on $\sigma(T) \subseteq \mathbb{R}$ such that $T = \int_{\sigma(T)} \lambda\, dE(\lambda)$.
Eigenvalues may not exist anymore (e.g., multiplication by $x$ on $L^2([0,1])$ has empty point spectrum). The continuous spectrum is absorbed into the integral via the PVM.
Unbounded Self-Adjoint Operators
Now domains matter. The right hypothesis is self-adjointness of the unbounded operator ($T = T^*$, domains included); in practice one verifies essential self-adjointness, i.e. that the closure of $T$ is self-adjoint. The spectral theorem still holds — this is what justifies quantum-mechanical observables like position and momentum even though they’re unbounded.
Connections & Synthesis
Why Normal Is The Right Class
The pattern is: diagonalizability in an ONB = commutation of the operator with its adjoint. The structural reason is that an eigen-ONB simultaneously diagonalizes $T$ and $T^*$, and any two matrices diagonal in the same basis commute. The converse — that commutation forces simultaneous diagonalization — is exactly the spectral theorem.
| Question | Required Property of $T$ |
|---|---|
| Has an eigenvalue? | Char poly has root (always over $\mathbb{C}$) |
| Triangularizable (unitarily)? | Always over $\mathbb{C}$ (Schur) |
| Diagonalizable (any basis)? | Minimal poly has distinct roots |
| Diagonalizable in ONB? | Normal (over $\mathbb{C}$) / self-adjoint (over $\mathbb{R}$) |
Connection to AI / ML Practice
The spectral theorem silently underpins:
- PCA: eigendecomposition of $\Sigma = \mathbb{E}[xx^\top]$, which is positive semi-definite.
- Graph Laplacian methods (spectral clustering, Cheeger inequality): $L = D - A$ is self-adjoint, and the eigenvectors belonging to its smallest eigenvalues encode community structure.
- Markov chain mixing: the second-largest eigenvalue of a reversible transition matrix governs mixing time. (Reversibility = self-adjointness w.r.t. the stationary measure inner product.)
- Hessian analysis in optimization / loss landscapes: eigenvalues of $\nabla^2 \mathcal{L}$ describe local curvature; negative eigenvalues = saddle directions.
- Kernel methods: a positive-definite kernel $k(x,y)$ corresponds to a self-adjoint operator on $L^2(\mu)$ — Mercer’s theorem is the compact self-adjoint spectral theorem in disguise.
[!note] Relation to multi-agent dynamics In a multi-agent setting where agents update beliefs via a doubly-stochastic communication matrix $W$, the second eigenvalue $\lambda_2(W)$ controls consensus speed. If $W$ is symmetric (undirected, equal-weight communication), the spectral theorem gives an orthogonal eigen-decomposition and clean convergence bounds. This connects to the broader pattern: whenever a multi-agent interaction has an underlying symmetric / reversible / detailed-balance structure, spectral methods give tight results; without symmetry one falls back to weaker tools (Jordan form, Perron-Frobenius, pseudo-spectra).
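A consensus sketch on a 5-agent cycle (lazy random-walk weights, a standard but here assumed choice of $W$): the disagreement with the average contracts at rate $\lambda_2(W)$ per gossip round.

```python
import numpy as np

rng = np.random.default_rng(12)

# Symmetric doubly stochastic W: lazy random walk on a 5-agent cycle
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

# Spectral gap: second-largest |eigenvalue| controls consensus speed
lam = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
gap = lam[1]

x0 = rng.standard_normal(n)
mean = np.full(n, x0.mean())

x = x0.copy()
for _ in range(50):
    x = W @ x                     # one round of gossip averaging

err = np.linalg.norm(x - mean)
bound = gap**50 * np.linalg.norm(x0 - mean)   # spectral convergence bound
```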
What’s Missing From This Picture
- Non-normal operators: enormous theory of Jordan Canonical Form, generalized eigenvectors, pseudo-spectra. Most operators in practice (transition matrices for non-reversible chains, transformer attention matrices) are non-normal, and the spectral theorem doesn’t apply — one needs different machinery.
- Operator algebras: the spectral theorem is the first chapter of $C^*$-algebra theory; the Gelfand-Naimark theorem generalizes spectral resolution to commutative $C^*$-algebras, identifying them with $C(X)$ for compact Hausdorff $X$.
- Spectral theory of differential operators: Sturm-Liouville theory, Weyl’s theorem, the connection to PDEs and quantum mechanics.