Here is a historical treatment of the topic: https://jwmi.github.io/ASM/6-KalmanFilter.pdf. Kalman Filters are defined as follows:
We start with an initial state $X_{0} \sim \mathcal{N}(\mu, \Sigma)$ and then specify a motion model and a sensor model:
$$ \begin{cases} X_{t + 1} = FX_{t} + \varepsilon_{t} & F \in \mathbb{R}^{d\times d}, \varepsilon_{t} \sim \mathcal{N}(0, \Sigma_{x})\\ Y_{t} = HX_{t} + \eta_{t} & H \in \mathbb{R}^{m \times d}, \eta_{t} \sim \mathcal{N}(0, \Sigma_{y}) \end{cases} $$
Inference reduces to manipulating Gaussians: the model is linear and all noise terms are Gaussian, so every marginal and conditional distribution stays Gaussian. One can interpret $Y$ as the observations and $X$ as the underlying hidden state. Kalman Filters satisfy the Markov Property (see Markov Chains): $X_{t+1}$ depends on the past only through $X_{t}$, and $Y_{t}$ depends only on $X_{t}$. These independence properties allow an easy characterization of the joint distribution for Kalman Filters:
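$$ p(x_{1:t}, y_{1:t}) = p(x_{1}) p(y_{1} \mid x_{1}) \prod_{i = 2}^{t} p(x_{i} \mid x_{i - 1}) p(y_{i} \mid x_{i}) $$
One can also observe that classic Bayesian Linear Regression is a special case of Kalman Filters: the regression weights play the role of a hidden state that does not change over time ($F = I$, no motion noise), and the inputs act as a time-dependent $H$.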
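The factorization above can be sketched in code. This is a minimal illustration, not a reference implementation: it assumes the scalar case $d = m = 1$, so $F$, $H$ and the noise covariances are plain floats, and it places the prior on $x_{1}$ to match the indexing of the joint distribution. The function names (`gaussian_logpdf`, `sample_trajectory`, `joint_logpdf`) are hypothetical and chosen only for this example.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def sample_trajectory(T, mu0, var0, F, H, var_x, var_y, rng):
    """Draw x_1..x_T and y_1..y_T from the scalar linear-Gaussian model."""
    xs, ys = [], []
    x = rng.gauss(mu0, math.sqrt(var0))  # initial state x_1 ~ N(mu0, var0)
    for _ in range(T):
        xs.append(x)
        ys.append(H * x + rng.gauss(0.0, math.sqrt(var_y)))  # sensor model
        x = F * x + rng.gauss(0.0, math.sqrt(var_x))         # motion model
    return xs, ys


def joint_logpdf(xs, ys, mu0, var0, F, H, var_x, var_y):
    """log p(x_{1:t}, y_{1:t}) computed term by term from the factorization."""
    total = gaussian_logpdf(xs[0], mu0, var0)          # p(x_1)
    total += gaussian_logpdf(ys[0], H * xs[0], var_y)  # p(y_1 | x_1)
    for i in range(1, len(xs)):
        total += gaussian_logpdf(xs[i], F * xs[i - 1], var_x)  # p(x_i | x_{i-1})
        total += gaussian_logpdf(ys[i], H * xs[i], var_y)      # p(y_i | x_i)
    return total


# Example usage with a fixed seed so the draw is reproducible.
xs, ys = sample_trajectory(5, 0.0, 1.0, 1.0, 1.0, 0.5, 0.5, random.Random(0))
log_joint = joint_logpdf(xs, ys, 0.0, 1.0, 1.0, 1.0, 0.5, 0.5)
```

Each summand in `joint_logpdf` corresponds to exactly one factor of the product, which is what the Markov Property buys us: no term needs to look further back than one step.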
Kalman Inference
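In the exercise session we discussed the Kalman gain for the one-dimensional case with $F = H = 1$ (a random walk observed with noise), where $\sigma^{2}_{x}$ and $\sigma^{2}_{y}$ are the motion and sensor noise variances.
Given the belief $X_{t} \mid y_{1:t} \sim \mathcal{N}(\mu_{t}, \sigma^{2}_{t})$, the posterior after observing $y_{t+1}$ is $X_{t+1} \mid y_{1:t+1} \sim \mathcal{N}(\mu_{t+1}, \sigma^{2}_{t + 1})$, where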
$$ \begin{align} \mu_{t + 1} &= \mu_{t} + K_{t + 1}(y_{t + 1} - \mu_{t}) \\ \sigma^{2}_{t + 1} &= (1 - K_{t+1})(\sigma^{2}_{t} + \sigma^{2}_{x}) \end{align} $$
Here $K_{t+1} = (\sigma^{2}_{x} + \sigma^{2}_{t}) / (\sigma^{2}_{x} + \sigma^{2}_{t} + \sigma^{2}_{y}) \in [0, 1]$ is the Kalman gain. It tells us how much we trust the new observation: if the gain is close to one (the sensor noise $\sigma^{2}_{y}$ is small relative to the predictive variance $\sigma^{2}_{t} + \sigma^{2}_{x}$), the new observation dominates the update; if it is close to zero, the posterior mean stays near the previous estimate $\mu_{t}$.
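The update equations above translate almost line by line into code. The following is a small sketch under the same assumptions as the formulas (scalar state, $F = H = 1$); the names `kalman_step` and `kalman_filter` are hypothetical helpers introduced only for illustration.

```python
def kalman_step(mu, var, y, var_x, var_y):
    """One scalar Kalman update, assuming F = H = 1.

    mu, var: mean and variance of the belief X_t | y_{1:t}
    y:       the new observation y_{t+1}
    Returns the new mean, the new variance and the Kalman gain.
    """
    var_pred = var + var_x             # predictive variance sigma_t^2 + sigma_x^2
    K = var_pred / (var_pred + var_y)  # Kalman gain K_{t+1}
    mu_new = mu + K * (y - mu)         # move the mean towards the observation
    var_new = (1.0 - K) * var_pred     # the observation shrinks the variance
    return mu_new, var_new, K


def kalman_filter(ys, mu0, var0, var_x, var_y):
    """Run kalman_step over a sequence of observations, starting from N(mu0, var0)."""
    mu, var = mu0, var0
    history = []
    for y in ys:
        mu, var, _ = kalman_step(mu, var, y, var_x, var_y)
        history.append((mu, var))
    return history


# Predictive variance 1 + 1 = 2 and sensor variance 2 give K = 0.5,
# so the mean moves halfway from 0.0 to the observation 2.0.
print(kalman_step(0.0, 1.0, 2.0, 1.0, 2.0))  # -> (1.0, 1.0, 0.5)
```

With a very small `var_y` the gain approaches one and the new mean essentially equals the observation; with a very large `var_y` the gain approaches zero and the observation is mostly ignored.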