Here is a historical treatment of the topic: https://jwmi.github.io/ASM/6-KalmanFilter.pdf. Kalman Filters are defined as follows:

We start with an initial state $X_{0} \sim \mathcal{N}(\mu, \Sigma)$, and then define a motion model and a sensor model:

$$ \begin{cases} X_{t + 1} = FX_{t} + \varepsilon_{t} & F \in \mathbb{R}^{d\times d}, \varepsilon_{t} \sim \mathcal{N}(0, \Sigma_{x})\\ Y_{t} = HX_{t} + \eta_{t} & H \in \mathbb{R}^{m \times d}, \eta_{t} \sim \mathcal{N}(0, \Sigma_{y}) \end{cases} $$
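As a small illustration of this generative model, here is a minimal simulation sketch. It assumes NumPy; the dimensions, the constant-velocity $F$, the position-only $H$, and the noise levels are made-up examples, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and matrices (chosen for the example, not from the notes).
d, m, T = 2, 1, 50
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # motion model: constant-velocity dynamics
H = np.array([[1.0, 0.0]])          # sensor model: we only observe the position
Sigma_x = 0.01 * np.eye(d)          # process noise covariance
Sigma_y = 0.25 * np.eye(m)          # observation noise covariance
mu0, Sigma0 = np.zeros(d), np.eye(d)

# X_0 ~ N(mu0, Sigma0), then X_{t+1} = F X_t + eps_t and Y_t = H X_t + eta_t.
x = rng.multivariate_normal(mu0, Sigma0)
xs, ys = [], []
for _ in range(T):
    x = F @ x + rng.multivariate_normal(np.zeros(d), Sigma_x)
    y = H @ x + rng.multivariate_normal(np.zeros(m), Sigma_y)
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)   # hidden states and noisy observations
```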

Inference amounts to manipulating Gaussians, since every conditional in the model is linear-Gaussian. One can interpret the $Y_{t}$ as the observations and the $X_{t}$ as the underlying hidden state we hold beliefs about. Kalman Filters satisfy the Markov Property, see Markov Chains. These independence properties allow an easy characterization of the joint distribution for Kalman Filters:

$$ p(x_{1:t}, y_{1:t}) = p(x_{1}) p(y_{1} \mid x_{1}) \prod_{i = 2}^{t} p(x_{i} \mid x_{i - 1}) p(y_{i} \mid x_{i}) $$
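The factorization can be checked directly by summing log-factors. Below is a minimal scalar sketch; it assumes SciPy, and the values of $f$, $h$, and the noise scales are illustrative placeholders.

```python
import numpy as np
from scipy.stats import norm

# Illustrative scalar model parameters (not from the source).
f, h = 0.9, 1.0
sigma_x, sigma_y = 0.5, 0.3
mu1, sigma1 = 0.0, 1.0            # marginal of X_1

rng = np.random.default_rng(1)
t = 10
xs = [rng.normal(mu1, sigma1)]
for _ in range(t - 1):
    xs.append(f * xs[-1] + rng.normal(0.0, sigma_x))
ys = [h * x + rng.normal(0.0, sigma_y) for x in xs]

# log p(x_{1:t}, y_{1:t}) = log p(x_1) + log p(y_1 | x_1)
#                           + sum_i [ log p(x_i | x_{i-1}) + log p(y_i | x_i) ]
log_joint = norm.logpdf(xs[0], mu1, sigma1) + norm.logpdf(ys[0], h * xs[0], sigma_y)
for i in range(1, t):
    log_joint += norm.logpdf(xs[i], f * xs[i - 1], sigma_x)
    log_joint += norm.logpdf(ys[i], h * xs[i], sigma_y)
print(log_joint)
```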

One can also observe that classic Bayesian Linear Regression is a special case of Kalman Filters: take $F = I$ and $\Sigma_{x} = 0$, so that the hidden state (the regression weights) stays constant, and let $H$ be the feature row of the data point observed at each step.
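A rough sketch of that correspondence, under the assumptions just stated (static weights, $F = I$, $\Sigma_{x} = 0$, $H_{t} = \phi_{t}^{T}$); the data and prior below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Bayesian linear regression y_t = phi_t^T w + eta_t viewed as a Kalman filter
# with a static hidden state: F = I, Sigma_x = 0, H_t = phi_t^T.
d, n, sigma_y = 3, 100, 0.5
w_true = rng.normal(size=d)
Phi = rng.normal(size=(n, d))
y = Phi @ w_true + sigma_y * rng.normal(size=n)

mu, Sigma = np.zeros(d), np.eye(d)           # prior N(0, I) on the weights
for phi_t, y_t in zip(Phi, y):
    S = phi_t @ Sigma @ phi_t + sigma_y**2   # innovation variance (scalar)
    K = Sigma @ phi_t / S                    # Kalman gain (d-vector)
    mu = mu + K * (y_t - phi_t @ mu)
    Sigma = Sigma - np.outer(K, phi_t) @ Sigma

# The sequential result matches the closed-form Bayesian linear regression posterior mean.
A = Phi.T @ Phi / sigma_y**2 + np.eye(d)
print(np.allclose(mu, np.linalg.solve(A, Phi.T @ y / sigma_y**2)))  # True
```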

Kalman Inference

Kalman Gain

In the exercise session we discussed the Kalman gain

$$ K_{t+1} = \frac{\sigma^{2}_{x} + \sigma_{t}^{2}}{\sigma^{2}_{x} + \sigma^{2}_{y} + \sigma_{t}^{2}} $$

$$ \begin{align} \mu_{t + 1} &= \mu_{t} + K_{t + 1}(y_{t + 1} - \mu_{t}) \\ \sigma^{2}_{t + 1} &= (1 - K_{t+1})(\sigma^{2}_{t} + \sigma^{2}_{x}) \end{align} $$

The updates are somewhat similar to the reinforcement learning methods studied in RL Function Approximation: they are a convex combination of the old estimate and the new value.

So this value tells us how much we trust the new observation: if it is high (close to one) the new observation has a large impact on the update, and if it is low the impact is small.
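A minimal sketch of this scalar update ($F = H = 1$); the function name and signature are my own choices for illustration:

```python
def kalman_step_1d(mu_t, var_t, y_next, var_x, var_y):
    """One scalar Kalman update (F = H = 1).

    mu_t, var_t  : current posterior mean / variance of X_t given y_{1:t}
    y_next       : new observation y_{t+1}
    var_x, var_y : process and observation noise variances
    """
    var_pred = var_t + var_x                 # predict: Var(X_{t+1} | y_{1:t})
    K = var_pred / (var_pred + var_y)        # Kalman gain
    mu_next = mu_t + K * (y_next - mu_t)     # convex mix of old mean and new observation
    var_next = (1 - K) * var_pred            # posterior variance shrinks by (1 - K)
    return mu_next, var_next, K
```

Calling this repeatedly with each new observation reproduces the recursion above.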

Interpretation of the Kalman Gain

We also notice that if the uncertainty about the observation ($\sigma^{2}_{y}$) is high, then the Kalman gain is lower, closer to 0: we cannot trust the observation much, so the update mostly keeps the old mean. Conversely, if the uncertainty about the hidden state ($\sigma^{2}_{t} + \sigma^{2}_{x}$) is high, then the Kalman gain is higher, closer to 1: we trust the observation more than our current belief about the hidden state.
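A quick numeric sanity check of this behaviour (the variances below are made-up numbers, chosen only to show the two extremes):

```python
# Gain from the scalar formula above.
def gain(var_t, var_x, var_y):
    return (var_t + var_x) / (var_t + var_x + var_y)

print(gain(var_t=1.0, var_x=0.1, var_y=100.0))   # ~0.011: noisy sensor, observation distrusted
print(gain(var_t=100.0, var_x=0.1, var_y=1.0))   # ~0.990: uncertain state, observation trusted
```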

Proof Sketch

We will prove the one-dimensional case where $F = H = 1$ (i.e. $F, H = I$ with scalar quantities).

It goes as follows: first, assume inductively that $X_{t} \mid y_{1:t} \sim \mathcal{N}(\mu_{t}, \sigma_{t}^{2})$.

$$ p(x_{t + 1} \mid y_{1:t+1}) = \frac{p(y_{t + 1} \mid x_{t + 1}) \, p(x_{t + 1} \mid y_{1:t})}{Z} $$

$$ p(y_{t + 1} \mid x_{t + 1}) = \frac{1}{Z'} \exp\left( -\frac{1}{2} \frac{(y_{t + 1} - x_{t + 1})^{2}}{\sigma^{2}_{y}} \right) $$

$$ p(x_{t + 1} \mid y_{1:t}) = \int p(x_{t + 1} \mid x_{t}) \, p(x_{t} \mid y_{1:t}) \, dx_{t} = \mathcal{N}(\mu_{t}, \sigma^{2}_{t} + \sigma^{2}_{x}) $$

Then, multiplying these two Gaussians and completing the square yields the solution above.
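For completeness, the omitted step is the standard complete-the-square computation on the product of the two Gaussian factors:

$$ p(x_{t + 1} \mid y_{1:t+1}) \propto \exp\left( -\frac{(y_{t + 1} - x_{t + 1})^{2}}{2\sigma^{2}_{y}} - \frac{(x_{t + 1} - \mu_{t})^{2}}{2(\sigma^{2}_{t} + \sigma^{2}_{x})} \right) \propto \exp\left( -\frac{(x_{t + 1} - \mu_{t + 1})^{2}}{2\sigma^{2}_{t + 1}} \right) $$

with

$$ \frac{1}{\sigma^{2}_{t + 1}} = \frac{1}{\sigma^{2}_{t} + \sigma^{2}_{x}} + \frac{1}{\sigma^{2}_{y}}, \qquad \mu_{t + 1} = \sigma^{2}_{t + 1}\left( \frac{\mu_{t}}{\sigma^{2}_{t} + \sigma^{2}_{x}} + \frac{y_{t + 1}}{\sigma^{2}_{y}} \right), $$

which rearranges to $\mu_{t + 1} = \mu_{t} + K_{t + 1}(y_{t + 1} - \mu_{t})$ and $\sigma^{2}_{t + 1} = (1 - K_{t + 1})(\sigma^{2}_{t} + \sigma^{2}_{x})$ with $K_{t + 1}$ as defined above.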

General Case

In the general case the Kalman update is as follows, with Kalman gain

$$ K_{t + 1} = (F\Sigma_{t}F^{T} + \Sigma_{x})H^{T}\left( H(F\Sigma_{t}F^{T} + \Sigma_{x})H^{T} + \Sigma_{y} \right)^{-1}. $$

The update then reads

$$ \begin{cases} \mu_{t + 1} = F\mu_{t} + K_{t + 1}(y_{t + 1} - HF\mu_{t}) \\ \Sigma_{t + 1} = (I - K_{t + 1}H)(F\Sigma_{t}F^{T} + \Sigma_{x}) \end{cases} $$
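A minimal NumPy sketch of this predict-and-update step; the function name and the use of `solve` instead of an explicit matrix inverse are my own choices, not from the source:

```python
import numpy as np

def kalman_step(mu, Sigma, y, F, H, Sigma_x, Sigma_y):
    """One general Kalman filter step: predict with (F, Sigma_x), update with (H, Sigma_y)."""
    # Predict: X_{t+1} | y_{1:t} ~ N(F mu, F Sigma F^T + Sigma_x)
    mu_pred = F @ mu
    Sigma_pred = F @ Sigma @ F.T + Sigma_x

    # Kalman gain K_{t+1} = Sigma_pred H^T (H Sigma_pred H^T + Sigma_y)^{-1}
    S = H @ Sigma_pred @ H.T + Sigma_y                 # innovation covariance
    K = np.linalg.solve(S.T, (Sigma_pred @ H.T).T).T   # avoids forming S^{-1} explicitly

    # Update with the new observation y_{t+1}
    mu_new = mu_pred + K @ (y - H @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new
```

Iterating this step over the observations $y_{1}, y_{2}, \dots$ gives the filtering distributions $\mathcal{N}(\mu_{t}, \Sigma_{t})$ of $X_{t} \mid y_{1:t}$.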