Sobolev Spaces
Motivation & Setup
PDE theory and the calculus of variations require function spaces in which (i) differentiation makes sense for non-smooth functions, (ii) the space is complete under an $L^p$-flavored norm, and (iii) one can embed into $L^q$ or Hölder spaces. The classical $C^k$ spaces fail (ii); pure $L^p$ fails (i). Sobolev spaces $W^{k,p}$ are the fix: they replace pointwise differentiation with the weak derivative and complete $C^k$ under the natural $L^p$-Sobolev norm.
Throughout: $\Omega \subset \mathbb{R}^n$ open, $1 \le p \le \infty$, $k \in \mathbb{N}_0$. Use multi-index $\alpha = (\alpha_1,\dots,\alpha_n)$, $|\alpha| = \sum \alpha_i$, $D^\alpha = \partial_1^{\alpha_1}\cdots\partial_n^{\alpha_n}$.
Why Classical Spaces Fail
- $C^k(\bar\Omega)$ with $\|\cdot\|_{C^k}$ is complete but ignores $L^p$ structure; minimizing sequences of integral energies need not converge in $C^k$.
- $C^k(\bar\Omega)$ with the $L^p$-Sobolev norm is not complete. For $1\le p<\infty$ and reasonable $\Omega$ (e.g. Lipschitz boundary), its completion is precisely $W^{k,p}$.
- The Dirichlet energy $E(u) = \tfrac12\int_\Omega|\nabla u|^2$ is minimized naturally over $H^1$, not over $C^1$.
[!note] The Direct Method The Direct Method of the Calculus of Variations needs lower semicontinuity, coercivity, and a reflexive space in which to extract weakly convergent minimizing subsequences. Sobolev spaces are the canonical reflexive choice (when $1<p<\infty$).
Weak Derivatives
Definition: Weak Derivative
Let $u, v \in L^1_{\text{loc}}(\Omega)$ and let $\alpha$ be a multi-index. We say $v$ is the $\alpha$-th weak derivative of $u$ if
$$
\int_\Omega u\, D^\alpha\varphi\,dx \;=\; (-1)^{|\alpha|}\int_\Omega v\,\varphi\,dx \qquad \forall \varphi \in C_c^\infty(\Omega).
$$
Notation: $v = D^\alpha u$. The definition is the distributional derivative restricted to those distributions representable by an $L^1_{\text{loc}}$ function.
Uniqueness: a.e. uniqueness follows from the fundamental lemma of the calculus of variations (du Bois-Reymond).
[!tip] Recognizing weak derivatives If $u \in C^k$, weak and classical derivatives coincide. If $u$ has a jump, the weak derivative does not exist as a function — it picks up a Dirac mass and lives only as a distribution. Example: $u = \mathbf{1}_{[0,\infty)}$ has $u' = \delta_0 \notin L^1_{\text{loc}}$.
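The defining identity can be checked numerically. The following is a minimal sketch that assumes NumPy and trapezoidal quadrature on a fine grid. The helpers `bump`, `bump_prime` and `integrate` are illustrative only and do not come from any library. The test function $\varphi$ is deliberately non-symmetric. With it, $\int |x|\varphi' = -\int \operatorname{sign}(x)\varphi$, so $\operatorname{sign}$ acts as the weak derivative of $|x|$. For the Heaviside function, the pairing returns $-\varphi(0)$, which no $L^1_{\text{loc}}$ function can reproduce for every $\varphi$.

```python
import numpy as np

def bump(x, center=0.3, radius=1.0):
    """C_c^infty test function exp(-1/(1 - t^2)), t = (x - center)/radius, zero for |t| >= 1."""
    t = (x - center) / radius
    out = np.zeros_like(x)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

def bump_prime(x, center=0.3, radius=1.0):
    """Exact derivative of bump via the chain rule (dt/dx = 1/radius)."""
    t = (x - center) / radius
    out = np.zeros_like(x)
    inside = np.abs(t) < 1
    ti = t[inside]
    out[inside] = np.exp(-1.0 / (1.0 - ti**2)) * (-2.0 * ti / (1.0 - ti**2) ** 2) / radius
    return out

def integrate(f, x):
    """Composite trapezoidal rule."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

x = np.linspace(-2.0, 2.0, 400_001)
phi, dphi = bump(x), bump_prime(x)

# |x| is weakly differentiable with D|x| = sign(x): int |x| phi' = -int sign(x) phi.
lhs_abs = integrate(np.abs(x) * dphi, x)
rhs_abs = -integrate(np.sign(x) * phi, x)

# Heaviside H = 1_{[0, inf)}: int H phi' = -phi(0) = -<delta_0, phi>,
# so the would-be derivative is delta_0, not an L^1_loc function.
lhs_step = integrate((x >= 0).astype(float) * dphi, x)
phi_at_0 = float(bump(np.array([0.0]))[0])

print(f"|x|:       {lhs_abs:.8f}  vs  {rhs_abs:.8f}")
print(f"Heaviside: {lhs_step:.6f}  vs  -phi(0) = {-phi_at_0:.6f}")
```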
Calculus Rules for Weak Derivatives
The following hold in $W^{k,p}$, provided the right-hand side makes sense:
- Linearity: $D^\alpha(au+bv) = aD^\alpha u + bD^\alpha v$.
- Product rule (Leibniz): if $u \in W^{1,p}$ and $\eta \in C_c^\infty$, then $\eta u \in W^{1,p}$ and $\partial_i(\eta u) = \eta\, \partial_i u + u\,\partial_i\eta$.
- Chain rule (Stampacchia): if $F \in C^1(\mathbb{R})$, $F' \in L^\infty$, $F(0)=0$, and $u\in W^{1,p}$, then $F(u)\in W^{1,p}$ with $\partial_i F(u) = F'(u)\partial_i u$.
- Truncation: $u^+ = \max(u,0) \in W^{1,p}$ with $\nabla u^+ = \nabla u \cdot \mathbf{1}_{\{u>0\}}$ (almost everywhere).
The truncation rule underlies the maximum principle for weak solutions.
The Spaces $W^{k,p}$ and $H^k$
Definition: Sobolev Space $W^{k,p}(\Omega)$
$$
W^{k,p}(\Omega) \;=\; \{\, u \in L^p(\Omega) : D^\alpha u \in L^p(\Omega) \text{ for all } |\alpha| \le k \,\},
$$
with norm
$$
\|u\|_{W^{k,p}} \;=\; \Bigl(\sum_{|\alpha|\le k}\|D^\alpha u\|_{L^p}^p\Bigr)^{1/p} \quad (1\le p<\infty), \qquad \|u\|_{W^{k,\infty}} = \max_{|\alpha|\le k}\|D^\alpha u\|_{L^\infty}.
$$
Equivalent norms abound. For example, on $\mathbb{R}^n$ or on bounded Lipschitz $\Omega$, $\|u\|_{L^p} + \sum_{|\alpha|=k}\|D^\alpha u\|_{L^p}$ is equivalent, because interpolation controls the intermediate derivatives.
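The following is a minimal one-dimensional sketch of the norm for $k=1$. It assumes NumPy and a uniform grid. It approximates $u'$ by finite differences, which stand in for the weak derivative and are adequate for piecewise-smooth $u$, and it evaluates integrals by the trapezoidal rule. The function name `sobolev_norm_1d` is hypothetical.

```python
import numpy as np

def sobolev_norm_1d(u, x, p=2.0):
    """Approximate ||u||_{W^{1,p}(a,b)} = (||u||_p^p + ||u'||_p^p)^(1/p), 1 <= p < inf.

    u' is approximated by np.gradient (central differences in the interior,
    one-sided at the endpoints); integrals use the trapezoidal rule.
    """
    du = np.gradient(u, x)

    def lp_power(f):
        # Trapezoidal approximation of int |f|^p dx.
        g = np.abs(f) ** p
        return float(np.sum((g[1:] + g[:-1]) * np.diff(x)) / 2.0)

    return (lp_power(u) + lp_power(du)) ** (1.0 / p)

x = np.linspace(0.0, 1.0, 10_001)
print(sobolev_norm_1d(x, x, p=2.0))                # approx sqrt(4/3)
print(sobolev_norm_1d(np.abs(x - 0.5), x, p=1.0))  # approx 1/4 + 1
```

Take $u(x)=x$ on $[0,1]$ with $p=2$. Then $\|u\|_{L^2}^2 = 1/3$ and $\|u'\|_{L^2}^2 = 1$, so the first call should return approximately $\sqrt{4/3}\approx 1.1547$.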
Definition: $H^k$ — the Hilbert Case
$H^k(\Omega) := W^{k,2}(\Omega)$, with inner product
$$
\langle u,v\rangle_{H^k} \;=\; \sum_{|\alpha|\le k}\int_\Omega D^\alpha u\, D^\alpha v\,dx.
$$
$p=2$ is the only exponent that gives a Hilbert space. $H^k$ is the workhorse for linear elliptic PDE and spectral theory.
Definition: $W^{k,p}_0(\Omega)$ — Zero Boundary Trace
$$
W^{k,p}_0(\Omega) \;:=\; \overline{C_c^\infty(\Omega)}^{\,\|\cdot\|_{W^{k,p}}}.
$$
Heuristically, these are the functions whose derivatives up to order $k-1$ vanish on $\partial\Omega$. For $1\le p<\infty$, on all of $\mathbb{R}^n$ we have $W^{k,p}_0(\mathbb{R}^n) = W^{k,p}(\mathbb{R}^n)$. On bounded $\Omega$ with $k\ge1$, the inclusion $W^{k,p}_0 \subsetneq W^{k,p}$ is strict: constants belong to the latter but, by the Poincaré inequality, not to the former.
Banach / Hilbert / Reflexive / Separable Properties
| Property | $W^{k,p}$, $1\le p < \infty$ | $W^{k,\infty}$ | $H^k$ |
|---|---|---|---|
| Banach | ✓ | ✓ | ✓ |
| Hilbert | only $p=2$ | ✗ | ✓ |
| Separable | ✓ | ✗ | ✓ |
| Reflexive | ✓ for $1<p<\infty$; ✗ for $p=1$ | ✗ | ✓ |
| Uniformly convex | ✓ for $1<p<\infty$; ✗ for $p=1$ | ✗ | ✓ |
Completeness proof sketch: let $\{u_m\} \subset W^{k,p}$ be Cauchy. Then $\{D^\alpha u_m\}$ is Cauchy in $L^p$ for each $|\alpha|\le k$, so $D^\alpha u_m \to u_\alpha$ in $L^p$; write $u := u_0$. The limits satisfy the weak-derivative identity. For fixed $\varphi\in C_c^\infty(\Omega)$, Hölder's inequality gives $\int u_m D^\alpha\varphi \to \int u\,D^\alpha\varphi$ and $\int D^\alpha u_m\,\varphi \to \int u_\alpha\varphi$. Hence $D^\alpha u = u_\alpha$, and $u_m\to u$ in $W^{k,p}$.
Approximation by Smooth Functions
Mollifiers
$\eta \in C_c^\infty(\mathbb{R}^n)$, $\eta \ge 0$, $\int\eta = 1$, $\operatorname{supp}\eta \subset B_1(0)$. Set $\eta_\varepsilon(x) = \varepsilon^{-n}\eta(x/\varepsilon)$. Define $u_\varepsilon = \eta_\varepsilon * u$ on $\Omega_\varepsilon = \{x \in \Omega : \operatorname{dist}(x,\partial\Omega)>\varepsilon\}$. Then $u_\varepsilon \in C^\infty(\Omega_\varepsilon)$ and $D^\alpha u_\varepsilon = \eta_\varepsilon * D^\alpha u$ on $\Omega_\varepsilon$ (the weak derivative commutes with convolution).
Local Approximation
If $u \in W^{k,p}_{\text{loc}}(\Omega)$, $1\le p<\infty$, then $u_\varepsilon \to u$ in $W^{k,p}_{\text{loc}}(\Omega)$. The proof is Friedrichs’s classical mollification argument.
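A discrete version of mollification in one dimension makes the convergence visible. This sketch assumes NumPy, a uniform grid with spacing `h`, and the standard bump $\eta(x) = \exp(-1/(1-x^2))$, normalized numerically. `standard_mollifier` and `mollify` are hypothetical helper names. The sketch applies the mollifier to $u(x)=|x|$. Because $u$ is 1-Lipschitz, $|u_\varepsilon - u| \le \int |y|\,\eta_\varepsilon(y)\,dy \le \varepsilon$, so the sup error away from the grid ends should stay below $\varepsilon$ and shrink with it.

```python
import numpy as np

def standard_mollifier(t):
    """Unnormalized bump exp(-1/(1 - t^2)) on |t| < 1, zero elsewhere."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

def mollify(u, h, eps):
    """Discrete eta_eps * u for samples u on a uniform grid with spacing h.

    The kernel eta_eps(s) = eps^{-1} eta(s/eps) is sampled on [-eps, eps] and
    rescaled so that its Riemann sum is exactly 1 (discrete int eta_eps = 1).
    np.convolve(mode="same") pads with zeros, so only points farther than eps
    from the ends of the grid approximate u_eps on Omega_eps.
    """
    m = int(np.ceil(eps / h))
    s = np.arange(-m, m + 1) * h
    kernel = standard_mollifier(s / eps) / eps
    kernel /= kernel.sum() * h
    return np.convolve(u, kernel, mode="same") * h

x = np.linspace(-1.0, 1.0, 2001)
h = x[1] - x[0]
u = np.abs(x)
interior = np.abs(x) <= 0.8  # stays at distance > eps from the ends for eps <= 0.1
for eps in (0.1, 0.05, 0.02):
    err = np.max(np.abs(mollify(u, h, eps) - u)[interior])
    print(f"eps={eps:<5} sup error on interior = {err:.5f}")
```

The error concentrates at the kink $x=0$. Away from the kink, a symmetric kernel reproduces linear functions exactly.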
Theorem: Meyers–Serrin “$H = W$”
For every open $\Omega \subset \mathbb{R}^n$ and $1 \le p < \infty$, $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ is dense in $W^{k,p}(\Omega)$.
The historical name: $H^{k,p}$ used to denote the completion of $C^\infty \cap W^{k,p}$ under $\|\cdot\|_{W^{k,p}}$. Meyers–Serrin (1964) proved $H^{k,p} = W^{k,p}$ for arbitrary open sets, with no boundary regularity assumed. The proof partitions $\Omega$ into shells and mollifies on each, gluing with a partition of unity.
[!note] What can fail at the boundary Density of $C^\infty(\bar\Omega)$ (note the closure) in $W^{k,p}(\Omega)$ requires some boundary regularity of $\Omega$, e.g. the segment property. Without it, $C^\infty(\bar\Omega)$ may fail to be dense; a slit disk is the standard example. Meyers–Serrin density of $C^\infty(\Omega)\cap W^{k,p}(\Omega)$ is unaffected.
Extension Theorems
Extension Operator
A bounded linear $E\colon W^{k,p}(\Omega)\to W^{k,p}(\mathbb{R}^n)$ with $Eu|_\Omega = u$. Such an operator lets you transfer Fourier-analytic and global-$\mathbb{R}^n$ results back to $\Omega$.
Existence: $E$ exists when $\partial\Omega$ is sufficiently regular — Lipschitz suffices for $k=1$; $C^k$ suffices in general. Constructions: reflection across the boundary (half-space), then partition of unity + local charts.
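In one variable, the reflection step can be written down explicitly. This sketch uses the higher-order reflection $\bar u(x) = \sum_{j=1}^{k+1} c_j\,u(-x/j)$ for $x<0$. That is one standard choice among several, not necessarily the construction intended above. The coefficients are fixed by requiring derivatives up to order $k$ to match at $0$, i.e. $\sum_j c_j(-1/j)^m = 1$ for $m=0,\dots,k$. For $k=1$ this gives the familiar $\bar u(x) = -3u(-x) + 4u(-x/2)$. The helper names are hypothetical. In $\mathbb{R}^n$, the same formula acts in the normal variable $x_n$ after the boundary is flattened.

```python
import numpy as np

def reflection_coefficients(k):
    """Coefficients c_1, ..., c_{k+1} of the reflection
        (E u)(x) = sum_j c_j * u(-x / j)   for x < 0,
    fixed by matching derivatives of order m = 0..k at x = 0:
        sum_j c_j * (-1/j)**m = 1.
    """
    j = np.arange(1, k + 2)
    A = np.array([(-1.0 / j) ** m for m in range(k + 1)])  # Vandermonde-type matrix
    return np.linalg.solve(A, np.ones(k + 1))

def reflect_extend(u, k):
    """Extend u from [0, inf) to R; C^k across 0 when u is C^k up to 0.

    u must accept NumPy arrays; np.where evaluates both branches, so u is
    also called at negative arguments (harmless for u = np.exp).
    """
    c = reflection_coefficients(k)

    def Eu(x):
        x = np.asarray(x, dtype=float)
        left = sum(cj * u(-x / j) for j, cj in enumerate(c, start=1))
        return np.where(x >= 0, u(x), left)

    return Eu

print(reflection_coefficients(1))  # approx [-3, 4]
print(reflection_coefficients(2))  # approx [6, -32, 27]
```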
[!note] Calderón–Stein extension For bounded Lipschitz $\Omega$, Stein (1970) constructed a universal extension operator $E$ that works simultaneously for all $W^{k,p}$, $k\in\mathbb{N}_0$, $1\le p\le\infty$. Without boundary regularity, extension can fail.
Traces
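The trace problem: $u \in W^{1,p}(\Omega)$ is only defined a.e., and $\partial\Omega$ has Lebesgue measure zero, so $u$ has no canonical boundary value. Trace theorems resolve this by continuity. The restriction map $u\mapsto u|_{\partial\Omega}$ is bounded from the $W^{1,p}$ norm into $L^p(\partial\Omega)$ on smooth functions. These are dense when $\partial\Omega$ is regular, so the map extends uniquely to all of $W^{1,p}(\Omega)$.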
Theorem: Trace Operator
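Let $\Omega$ be bounded with $C^1$ boundary (Lipschitz suffices) and $1\le p<\infty$. There is a bounded linear operator
$$
T \colon W^{1,p}(\Omega) \longrightarrow L^p(\partial\Omega), \qquad Tu = u|_{\partial\Omega} \text{ if } u\in W^{1,p}(\Omega)\cap C(\bar\Omega),
$$
and
$$
W^{1,p}_0(\Omega) \;=\; \ker T \;=\; \{u\in W^{1,p}(\Omega) : Tu = 0\}.
$$
For $1<p<\infty$, the range of $T$ is exactly $W^{1-1/p,\,p}(\partial\Omega)$ (Gagliardo).
Trace as Loss of $1/p$ Regularity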
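The “$1-1/p$” reflects a general principle: restricting to a codimension-$d$ surface costs $d/p$ derivatives. Scaling explains the number. For $u_\lambda(x)=u(\lambda x)$, the homogeneous $\dot W^{s,p}(\mathbb{R}^n)$ seminorm scales like $\lambda^{s-n/p}$. The $\dot W^{t,p}(\mathbb{R}^{n-d})$ seminorm of its restriction scales like $\lambda^{t-(n-d)/p}$. Matching the exponents forces $t = s - d/p$. On the Fourier side (for $p=2$), restriction integrates $\hat u$ over the $d$ normal frequencies, and the $H^s$ norm controls this integral exactly when $s>d/2$.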
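For $p=2$ and a flat surface, the Fourier heuristic can be carried out in full. The worked sketch below assumes $u\in\mathcal{S}(\mathbb{R}^n)$ and coordinates $x=(x',x'')\in\mathbb{R}^{n-d}\times\mathbb{R}^d$. It uses the convention $\hat u(\xi)=\int e^{-ix\cdot\xi}u(x)\,dx$ and the restriction $Tu(x') := u(x',0)$. Density then extends the estimate to $H^s$.
$$
\widehat{Tu}(\xi') = (2\pi)^{-d}\int_{\mathbb{R}^d}\hat u(\xi',\xi'')\,d\xi''.
$$
By Cauchy–Schwarz with the weight $(1+|\xi|^2)^{s}$,
$$
|\widehat{Tu}(\xi')|^2 \le (2\pi)^{-2d}\Bigl(\int_{\mathbb{R}^d}(1+|\xi|^2)^{s}\,|\hat u(\xi',\xi'')|^2\,d\xi''\Bigr)\Bigl(\int_{\mathbb{R}^d}(1+|\xi'|^2+|\xi''|^2)^{-s}\,d\xi''\Bigr).
$$
Substituting $\xi''=(1+|\xi'|^2)^{1/2}\eta$ in the second factor gives
$$
\int_{\mathbb{R}^d}(1+|\xi'|^2+|\xi''|^2)^{-s}\,d\xi'' = (1+|\xi'|^2)^{d/2-s}\,K_{s,d}, \qquad K_{s,d} := \int_{\mathbb{R}^d}(1+|\eta|^2)^{-s}\,d\eta < \infty \iff s > \tfrac d2.
$$
Multiplying by $(1+|\xi'|^2)^{s-d/2}$ and integrating in $\xi'$,
$$
\int_{\mathbb{R}^{n-d}}(1+|\xi'|^2)^{s-d/2}\,|\widehat{Tu}(\xi')|^2\,d\xi' \;\le\; (2\pi)^{-2d}K_{s,d}\int_{\mathbb{R}^n}(1+|\xi|^2)^{s}\,|\hat u(\xi)|^2\,d\xi.
$$
That is, $\|Tu\|_{H^{s-d/2}(\mathbb{R}^{n-d})} \le C_{s,d}\,\|u\|_{H^s(\mathbb{R}^n)}$ for $s>d/2$, up to normalization constants. With $d=1$ and $s=1$ this is $H^1(\mathbb{R}^n)\to H^{1/2}(\mathbb{R}^{n-1})$, the $p=2$ case of losing $1/p$ derivatives.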
[!question] What happens at $p=1$ and $p=\infty$? For $p=1$, $W^{1,1}_0(\Omega)$ still equals $\ker T$, and the formula $W^{1-1/p,\,p}$ gives $W^{0,1} = L^1$. Gagliardo showed that the trace operator $W^{1,1}(\Omega)\to L^1(\partial\Omega)$ is surjective onto $L^1(\partial\Omega)$, so the fractional space “collapses”. For $p=\infty$, traces land in $W^{1,\infty}(\partial\Omega)$ directly, since the trace is just the restriction of a Lipschitz function.
Sobolev Embedding Theorems
The heart of the theory. Set the Sobolev conjugate $p^* := \frac{np}{n-p}$ for $1 \le p < n$. Three regimes by $kp$ vs. $n$.
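The regime bookkeeping is plain arithmetic. This minimal sketch assumes bounded Lipschitz $\Omega$ and finite $p$; the function name `sobolev_target` is hypothetical. When $kp<n$, it returns the $L^q$ exponent from $\tfrac1q=\tfrac1p-\tfrac kn$. When $kp=n$, it flags the borderline case. When $kp>n$, it reports the Hölder space with $m+\alpha=k-n/p$. That includes the integer case $k-n/p\in\mathbb{N}$, where only $\alpha<1$ is available. Exact rational arithmetic (`fractions.Fraction`) avoids spurious floating-point ties at $kp=n$.

```python
from fractions import Fraction

def sobolev_target(n, k, p):
    """Target space of W^{k,p}(Omega) for bounded Lipschitz Omega in R^n, 1 <= p < inf.

    Exact Fraction arithmetic keeps the borderline test kp == n free of rounding.
    """
    n, k, p = Fraction(n), Fraction(k), Fraction(p)
    if k * p < n:
        q = 1 / (1 / p - k / n)          # 1/q = 1/p - k/n
        return f"L^{q!s}"
    if k * p == n:
        return "L^q for every q < infinity (not L^infinity)"
    r = k - n / p                         # m + alpha = k - n/p > 0
    if r.denominator == 1:                # integer case: alpha = 1 is lost (p < inf)
        return f"C^{{{int(r) - 1},alpha}} for every alpha < 1"
    m = r.numerator // r.denominator      # m = floor(k - n/p)
    return f"C^{{{m},{r - m!s}}}"

for n, k, p in [(3, 1, 2), (2, 1, 2), (1, 1, 2), (2, 2, 2), (3, 2, 2)]:
    print(f"W^({k},{p}) in R^{n}:", sobolev_target(n, k, p))
```

For example, `sobolev_target(3, 1, 2)` gives `L^6`, the familiar $H^1(\mathbb{R}^3)\hookrightarrow L^6$. `sobolev_target(2, 2, 2)` lands in the integer case of Morrey's theorem.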
Theorem: Gagliardo–Nirenberg–Sobolev ($kp < n$)
Let $1 \le p < n$. If $\Omega$ is bounded with Lipschitz boundary, then for all $u\in W^{1,p}(\Omega)$
$$
\|u\|_{L^{p^*}(\Omega)} \;\le\; C\,\|u\|_{W^{1,p}(\Omega)},
$$
and on the whole space, for all $u\in W^{1,p}(\mathbb{R}^n)$,
$$
\|u\|_{L^{p^*}(\mathbb{R}^n)} \;\le\; C(n,p)\,\|\nabla u\|_{L^p(\mathbb{R}^n)}.
$$
Sharp constant: Talenti–Aubin (1976). Extremizers exist only for $1<p<n$.
General order: $W^{k,p}(\Omega)\hookrightarrow L^q(\Omega)$ for $\frac{1}{q} = \frac{1}{p} - \frac{k}{n}$ when $kp<n$.
Theorem: Morrey ($kp > n$)
For bounded Lipschitz $\Omega$ and $k-n/p\notin\mathbb{N}$:
$$
W^{k,p}(\Omega) \hookrightarrow C^{m,\alpha}(\bar\Omega), \qquad m + \alpha = k - n/p,\; m\in\mathbb{N}_0,\; \alpha\in(0,1).
$$
If $k-n/p\in\mathbb{N}$ (and $p<\infty$), the embedding holds with $m = k-n/p-1$ and every $\alpha\in(0,1)$, but it fails for $\alpha = 1$.
In particular, $W^{1,p}\hookrightarrow C^{0,1-n/p}$ for $p>n$ on Lipschitz domains. This is the Morrey embedding, with an explicit Hölder modulus.
The Borderline Case $kp = n$
The naive guess $L^\infty$ fails: $W^{1,n} \not\hookrightarrow L^\infty$. Counterexample ($n=2$): $u(x) = \log\log(1+1/|x|)$ on the unit ball is in $W^{1,2}$ but unbounded.
Summary Table of Embeddings
| Regime | Embedding | Borderline |
|---|---|---|
| $kp < n$ | $W^{k,p}\hookrightarrow L^q$, $\tfrac1q=\tfrac1p-\tfrac kn$ | $q$ is the largest exponent; for $k=1$, $q = p^* = np/(n-p)$ |
| $kp = n$ | $W^{k,p}\hookrightarrow L^q$ for all $q<\infty$ | Not into $L^\infty$; Trudinger–Moser exponential integrability |
| $kp > n$ | $W^{k,p}\hookrightarrow C^{m,\alpha}$, $m+\alpha=k-n/p$ | $\alpha=1$ fails when $k-n/p\in\mathbb{N}$ ($p<\infty$) |
Proof Sketch: GNS via the $p=1$ Identity
For $u\in C_c^1(\mathbb{R}^n)$ and each $i$, the fundamental theorem of calculus in the $i$-th variable gives
$$
|u(x)| \;\le\; \int_{-\infty}^{x_i} |\partial_i u(\dots,t,\dots)|\,dt.
$$
Multiply the $n$ bounds, take the $\tfrac{1}{n-1}$-th power, and integrate one variable at a time using the generalized Hölder inequality. This yields
$$
\|u\|_{L^{n/(n-1)}}\le \prod_{i=1}^n\|\partial_i u\|_{L^1}^{1/n} \;\le\; \|\nabla u\|_{L^1}.
$$
This is the $p=1$ case. For general $1<p<n$, apply it to $|u|^\gamma$ with $\gamma = \frac{p(n-1)}{n-p}$ and use Hölder's inequality. Density of $C_c^1$ then extends the estimate to $W^{1,p}(\mathbb{R}^n)$.
Compactness: Rellich–Kondrachov
Theorem: Rellich–Kondrachov
Let $\Omega$ be bounded with Lipschitz boundary and $1\le p<n$. Then
$$
W^{1,p}(\Omega) \;\hookrightarrow\hookrightarrow\; L^q(\Omega) \qquad \text{is compact for every } 1\le q < p^*.
$$
At the critical exponent $q=p^*$, the embedding is continuous but not compact. For $p\ge n$, the embedding into $L^q$ is compact for all $q<\infty$. For $p>n$, the embedding into $C^{0,\alpha}(\bar\Omega)$ is also compact for every $\alpha < 1-n/p$. Compactness is the workhorse of variational existence arguments.
Consequences for Variational Methods
[!note] Loss of compactness at the critical exponent The failure of $H^1\hookrightarrow\hookrightarrow L^{2^*}$ (with $2^*=2n/(n-2)$, $n\ge3$) is the source of the critical-exponent phenomena in the Yamabe problem, prescribed scalar curvature, and Brezis–Nirenberg. Bubbles $u_\varepsilon(x) = \varepsilon^{-(n-2)/2}U(x/\varepsilon)$ keep $\|\nabla u_\varepsilon\|_{L^2}$ and $\|u_\varepsilon\|_{L^{2^*}}$ fixed. As $\varepsilon\to0$ they converge weakly to $0$, but not strongly.
Poincaré-Type Inequalities
These convert control of $\nabla u$ into control of $u$, which is essential for coercivity of the Dirichlet form.
Poincaré Inequality (Zero-Trace Version)
For bounded $\Omega$, $1\le p<\infty$, and all $u\in W^{1,p}_0(\Omega)$:
$$
\|u\|_{L^p(\Omega)} \;\le\; C(\Omega, p)\,\|\nabla u\|_{L^p(\Omega)}.
$$
For $p=2$, the best constant is $1/\sqrt{\lambda_1}$, where $\lambda_1$ is the first Dirichlet eigenvalue of $-\Delta$ on $\Omega$. Consequence: on $W^{1,p}_0$, the seminorm $\|\nabla u\|_{L^p}$ is an equivalent norm.
Poincaré–Wirtinger (Zero-Mean Version)
For bounded, connected $\Omega$ with Lipschitz boundary, $1\le p\le\infty$, and all $u\in W^{1,p}(\Omega)$:
$$
\Bigl\|u - \frac{1}{|\Omega|}\!\!\int_\Omega u\Bigr\|_{L^p(\Omega)} \;\le\; C(\Omega,p)\,\|\nabla u\|_{L^p(\Omega)}.
$$
For $p=2$, the best constant is $1/\sqrt{\mu_1}$, where $\mu_1$ is the first nonzero Neumann eigenvalue of $-\Delta$. The lowest Neumann eigenvalue is $0$, with constant eigenfunctions.
[!tip] When you need a Poincaré Use one whenever $\|\nabla u\|_{L^p}$ must control the full norm, i.e. whenever you need coercivity of $\int|\nabla u|^p$. With Dirichlet boundary conditions, use the zero-trace version. On $W^{1,p}/\mathbb{R}$ (e.g. Neumann problems), use Poincaré–Wirtinger.
Fractional Sobolev Spaces $W^{s,p}$
There are two main definitions. For $p=2$ they coincide and equal the Bessel potential space $H^s = H^{s,2}$.
Slobodeckij Seminorm (Gagliardo)
For $0<s<1$ and $1\le p<\infty$:
$$
[u]_{W^{s,p}(\Omega)}^p \;:=\; \int_\Omega\!\!\int_\Omega \frac{|u(x)-u(y)|^p}{|x-y|^{n+sp}}\,dx\,dy,
$$
$$
\|u\|_{W^{s,p}}^p := \|u\|_{L^p}^p + [u]_{W^{s,p}}^p.
$$
For non-integer $s>1$, write $s = k + \sigma$ with $k\in\mathbb{N}$ and $\sigma\in(0,1)$. Then require $u\in W^{k,p}$ with $D^\alpha u \in W^{\sigma,p}$ for $|\alpha|=k$.
Bessel Potential Spaces $H^{s,p}$
$$
H^{s,p}(\mathbb{R}^n) := \{u \in \mathcal{S}'(\mathbb{R}^n) : (1-\Delta)^{s/2}u \in L^p(\mathbb{R}^n)\},
$$
where $(1-\Delta)^{s/2}$ is the Fourier multiplier $\widehat{(1-\Delta)^{s/2}u}(\xi) = (1+|\xi|^2)^{s/2}\widehat{u}(\xi)$.
| Property | $W^{s,p}$ (Slobodeckij) | $H^{s,p}$ (Bessel) |
|---|---|---|
| Defined via | Difference quotients | Fourier multipliers |
| Equals $W^{s,p}$ for $p=2$ | ✓ | ✓ |
| Equals $W^{s,p}$ for $p\neq 2$ | n/a | Only if $s\in\mathbb{N}$ (and $1<p<\infty$) |
| Interpolation behavior | Real interpolation (Besov-like) | Complex interpolation |
In fact, for non-integer $s$, the Slobodeckij space $W^{s,p}$ coincides with the Besov space $B^s_{p,p}$. It does not coincide with $H^{s,p}=F^s_{p,2}$, which is a Triebel–Lizorkin space. The mismatch for $p\neq 2$, $s\notin\mathbb{N}$ is fundamental.
Trace Spaces Revisited
For $1<p<\infty$, the trace operator $T\colon W^{k,p}(\Omega) \to W^{k-1/p,\,p}(\partial\Omega)$ lands in the Slobodeckij (Besov $B^{k-1/p}_{p,p}$) scale. This is a genuine property of traces, already true on the half-space where Fourier methods are available. The difference-quotient definition has a further advantage: it transfers directly to the manifold $\partial\Omega$ via charts, where Fourier multipliers are awkward. Since $k-1/p\notin\mathbb{N}$, the image is always fractional. Only for $p=2$ do the Slobodeckij and Bessel scales agree, giving $H^{k-1/2}(\partial\Omega)$.
Dual Spaces and Negative-Order Sobolev Spaces
$H^{-1}(\Omega)$: Dual of $H^1_0$
$$
H^{-1}(\Omega) := (H^1_0(\Omega))^* .
$$
Characterization: $f \in H^{-1}$ iff $f = f_0 + \sum_{i=1}^n \partial_i f_i$ in the distributional sense for some $f_0,\dots,f_n \in L^2(\Omega)$. Norm: the infimum of $\bigl(\sum_{i=0}^n\|f_i\|_{L^2}^2\bigr)^{1/2}$ over all such decompositions.
For bounded $\Omega$, $-\Delta\colon H^1_0(\Omega)\to H^{-1}(\Omega)$ is an isomorphism. This follows from Lax–Milgram applied to the bilinear form $a(u,v)=\int\nabla u\cdot\nabla v$, which is coercive on $H^1_0$ by the Poincaré inequality.
Negative Fractional Spaces
For $s>0$ and $1<p<\infty$, $W^{-s,p}(\Omega) := (W^{s,p'}_0(\Omega))^*$ where $p' = p/(p-1)$. These spaces appear when a differential operator of order $m>k$ acts on $W^{k,p}$ functions: the result lies in the negative-order space $W^{k-m,p}$.
Bounded Variation $BV$ and the $W^{1,1}$ Subtleties
$BV(\Omega)$
$$
BV(\Omega) := \{u \in L^1(\Omega) : Du \in \mathcal{M}(\Omega;\mathbb{R}^n)\},
$$
where $\mathcal{M}$ is the space of finite Radon measures. $W^{1,1}\subsetneq BV$, because $BV$ allows the gradient to be a measure. For example, if $E$ is a set of finite perimeter, then $D\mathbf{1}_E = -\nu_E\,\mathcal{H}^{n-1}\lfloor\partial^* E$, where $\partial^*E$ is the reduced boundary and $\nu_E$ the measure-theoretic outer unit normal. $BV$ replaces $W^{1,1}$ in image processing (Rudin–Osher–Fatemi total variation denoising), free-boundary problems, and minimal surface theory. It is not reflexive, but bounded sequences have weak-$*$ convergent subsequences. This is because $BV$ can be identified with the dual of a separable Banach space, so Banach–Alaoglu applies.
[!note] Tangent to Angelo’s work Sobolev structure underlies much of the rigorous analysis of continuous-time/space multi-agent systems and continuous strategy spaces. Mechanism-design impossibility results (Myerson–Satterthwaite, Gibbard–Satterthwaite) live in discrete/measurable settings. Their continuous analogues (e.g. matching with continuous types, Hotelling-style games) often require Sobolev-regular type distributions, and minimizers of welfare functionals naturally live in $H^1$-type spaces over the strategy simplex. The Cooperation Gap framework’s “approximate mechanism” notion could be recast as a closeness condition in a Sobolev-type seminorm over the policy space.
Connections, Comparisons, Applications
Sobolev vs. Hölder vs. Lipschitz Spaces
| Space | What the norm controls | When useful |
|---|---|---|
| $W^{k,p}$ | $L^p$ norms of derivatives up to order $k$ | Variational PDE, weak solutions |
| $C^{k,\alpha}$ | Hölder modulus | Regularity theory, Schauder estimates |
| $W^{k,\infty} = C^{k-1,1}$ (e.g. on convex $\Omega$) | Bounded $k$th derivative | Lipschitz / semiconcave regularity |
| $BV$ | Measure-valued gradient | Free discontinuities, $L^1$-coercivity |
| $H^{s,p}$ | Bessel/Fourier | Pseudodifferential, harmonic analysis |
Why $p=2$ Is Privileged
Connection to Statistical Learning / NN Approximation
Connection to Optimal Transport & Functional Inequalities
Canonical References