Diffusion describes the spreading of particles that move randomly. This random motion (Brownian motion) was first described by Robert Brown.
The Diffusion Process
This note follows Einstein's original presentation, in a simplified version.
Suppose a particle is at some position $i$ at $t = 0$. At each time step it jumps to the left with probability $p$, to the right with probability $q$, and stays at the same position with the remaining probability $1 - p - q$.
Concentration
We would like to know the concentration of particles at a given position after a fixed number of steps, and then extend the idea to a given number $N$ of particles at the start. Let's call this concentration $C_{i}(t)$: the fraction of particles at position $i$ at time step $t$, which for a single particle is the probability of finding it there. Then the number of particles at position $i$ and time step $t$ is $n_{i}(t) = N C_{i}(t)$.
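A minimal Monte Carlo sketch of this setup follows. It assumes that all $N$ particles start at site $i = 0$ and move independently with the jump rule above. The function names, the starting site and the parameter values are illustrative choices, not part of the original derivation. The sketch estimates $C_i(t)$ as $n_i(t)/N$.

```python
import random
from collections import Counter


def step(position: int, p: float, q: float, rng: random.Random) -> int:
    """One update: jump left with probability p, right with probability q, otherwise stay."""
    u = rng.random()
    if u < p:
        return position - 1
    if u < p + q:
        return position + 1
    return position


def empirical_concentration(n_particles: int, n_steps: int, p: float, q: float,
                            seed: int = 0) -> dict[int, float]:
    """Estimate C_i(t) = n_i(t) / N after n_steps, with all particles starting at i = 0."""
    rng = random.Random(seed)
    positions = [0] * n_particles
    for _ in range(n_steps):
        positions = [step(x, p, q, rng) for x in positions]
    counts = Counter(positions)  # n_i(t): number of particles at site i
    return {i: n / n_particles for i, n in sorted(counts.items())}


if __name__ == "__main__":
    p, q, n_steps = 0.3, 0.2, 20
    conc = empirical_concentration(n_particles=10_000, n_steps=n_steps, p=p, q=q)
    mean = sum(i * c for i, c in conc.items())
    print(f"total concentration: {sum(conc.values()):.3f}")
    print(f"mean position: {mean:.3f} (compare t(q - p) = {n_steps * (q - p):.3f})")
```

Each step moves a particle by $-1$, $+1$ or $0$ with probabilities $p$, $q$ and $1 - p - q$. The mean position after $t$ steps should therefore be close to $t(q - p)$, and the estimate sharpens as $N$ grows.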
Markov Process
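Given the concentration defined above, we want to determine how it evolves in time. A particle can reach position $i$ at step $t + 1$ in three ways: by jumping right from $i - 1$, by jumping left from $i + 1$, or by staying at $i$. This gives the following relation: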
$$ C_{i}(t + 1) = C_{i - 1}(t)q + C_{i + 1}(t)p + (1 - q - p) C_{i}(t) $$This defines a Markov process (see Markov Processes): the concentration at time $t + 1$ depends only on the concentration at time $t$. It can also be read as a recurrence relation. Subtracting $C_{i}(t)$ from both sides and regrouping the terms, the one-step difference can be written as
$$ C_{i}(t + 1) - C_{i}(t) = \frac{p - q}{2} (C_{i + 1}(t) - C_{i - 1}(t)) + \frac{p + q}{2}(C_{i + 1}(t) - 2C_{i}(t) + C_{i - 1}(t)) $$From the Markov process we obtain the master equation, and from it the Fokker-Planck equation. The temporal difference can be interpreted as the sum of a first derivative and a second derivative. The first bracket is a central difference, which approximates a first derivative in space. The second bracket is the discrete second difference, which approximates a second derivative. The first-derivative term gives a preferred direction of motion (drift), and it vanishes when $p = q$. The second-derivative term controls how fast the particles spread (diffusion).
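The following small sketch stores a finite profile as a dict from site index to concentration, which is an illustrative choice, as are the function names. It iterates the recurrence and checks numerically that the one-step difference equals the drift + diffusion form. It then reports the mean and variance of the profile. A single step is $-1$, $+1$ or $0$ with probabilities $p$, $q$ and $1 - p - q$. For a walk started at one site, the mean therefore moves by $q - p$ per step (drift) and the variance grows by $p + q - (q - p)^2$ per step (spreading).

```python
def master_step(c: dict[int, float], p: float, q: float) -> dict[int, float]:
    """One step of C_i(t+1) = q C_{i-1}(t) + p C_{i+1}(t) + (1 - p - q) C_i(t)."""
    new: dict[int, float] = {}
    for i, mass in c.items():
        # Push the mass at site i: right with q, left with p, stay with 1 - p - q.
        new[i + 1] = new.get(i + 1, 0.0) + q * mass
        new[i - 1] = new.get(i - 1, 0.0) + p * mass
        new[i] = new.get(i, 0.0) + (1 - p - q) * mass
    return new


def drift_plus_diffusion(c: dict[int, float], i: int, p: float, q: float) -> float:
    """Right-hand side of the difference equation at site i: drift term + diffusion term."""
    left, here, right = c.get(i - 1, 0.0), c.get(i, 0.0), c.get(i + 1, 0.0)
    drift = (p - q) / 2 * (right - left)
    diffusion = (p + q) / 2 * (right - 2 * here + left)
    return drift + diffusion


def moments(c: dict[int, float]) -> tuple[float, float]:
    """Mean and variance of the position under the concentration profile."""
    mean = sum(i * m for i, m in c.items())
    var = sum((i - mean) ** 2 * m for i, m in c.items())
    return mean, var


if __name__ == "__main__":
    p, q, steps = 0.3, 0.2, 50
    c = {0: 1.0}  # all mass at site 0 at t = 0
    max_gap = 0.0
    for _ in range(steps):
        new = master_step(c, p, q)
        # Largest mismatch between C_i(t+1) - C_i(t) and the drift + diffusion form.
        gap = max(abs(new[i] - c.get(i, 0.0) - drift_plus_diffusion(c, i, p, q))
                  for i in new)
        max_gap = max(max_gap, gap)
        c = new
    mean, var = moments(c)
    print(f"max decomposition error: {max_gap:.1e}")
    print(f"mean {mean:.4f} vs t(q - p) = {steps * (q - p):.4f}")
    print(f"variance {var:.4f} vs t(p + q - (q - p)^2) = {steps * (p + q - (q - p) ** 2):.4f}")
```

With $p = q$ the drift term vanishes and the mean stays at the starting site, while the variance still grows linearly in the number of steps.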
Going into continuous time
Let's assume the updates happen every $\Delta t$, so a smaller $\Delta t$ means more frequent updates. Let $\tau$ be the characteristic time scale of the system, over which the jump probabilities are $p$ and $q$. In a time $\tau$ there are $\tau / \Delta t$ updates. We rescale the per-update probabilities accordingly: $p' = p \Delta t / \tau$ and $q' = q \Delta t / \tau$. The update rule with time step $\Delta t$ is then
$$ C_{i}(t + \Delta t) - C_{i}(t) = \frac{p' - q'}{2} (C_{i + 1}(t) - C_{i - 1}(t)) + \frac{p' + q'}{2}(C_{i + 1}(t) - 2C_{i}(t) + C_{i - 1}(t)) $$Substituting $p'$ and $q'$, dividing by $\Delta t$, and taking the limit $\Delta t \to 0$, we get
$$ \tau \frac{d}{dt} C_{i}(t) = \frac{p - q}{2} (C_{i + 1}(t) - C_{i - 1}(t)) + \frac{p + q}{2}(C_{i + 1}(t) - 2C_{i}(t) + C_{i - 1}(t)) $$This is the master equation of the diffusion process. It is continuous in time and still discrete in space. Moving to continuous time only rescales the jump probabilities: over a fraction of the time scale $\tau$, a particle jumps with the same fraction of the probability $p$ (or $q$).
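The following is a sketch of the limit $\Delta t \to 0$. It assumes $\tau = 1$ and a finite lattice of $\pm 30$ sites around the start, and mass leaving that lattice is neglected. These choices and the function names are illustrative. Each step applies $C \leftarrow C + (\Delta t/\tau)\,\mathrm{rhs}$, where rhs is the right-hand side of the master equation. This is exactly the $\Delta t$-update above with $p' = p\,\Delta t/\tau$ and $q' = q\,\Delta t/\tau$.

```python
def master_rhs(c: list[float], p: float, q: float) -> list[float]:
    """Right-hand side of tau dC_i/dt; sites outside the list hold zero concentration."""
    n = len(c)
    out = []
    for i in range(n):
        left = c[i - 1] if i > 0 else 0.0
        right = c[i + 1] if i < n - 1 else 0.0
        drift = (p - q) / 2 * (right - left)
        diffusion = (p + q) / 2 * (right - 2 * c[i] + left)
        out.append(drift + diffusion)
    return out


def evolve(p: float, q: float, tau: float, t_final: float, dt: float,
           half_width: int = 30) -> tuple[float, float]:
    """Forward-Euler integration from a unit mass at site 0; returns (mean, variance)."""
    sites = list(range(-half_width, half_width + 1))
    c = [1.0 if i == 0 else 0.0 for i in sites]
    for _ in range(round(t_final / dt)):
        rhs = master_rhs(c, p, q)
        c = [ci + dt / tau * ri for ci, ri in zip(c, rhs)]
    mean = sum(i * ci for i, ci in zip(sites, c))
    var = sum((i - mean) ** 2 * ci for i, ci in zip(sites, c))
    return mean, var


if __name__ == "__main__":
    p, q, tau, t_final = 0.3, 0.2, 1.0, 5.0
    for dt in (0.1, 0.01, 0.001):
        mean, var = evolve(p, q, tau, t_final, dt)
        print(f"dt={dt:<6} mean={mean:.5f} variance={var:.5f}")
    print(f"limit: mean={(q - p) * t_final / tau:.5f} variance={(p + q) * t_final / tau:.5f}")
```

As $\Delta t$ shrinks, the variance approaches $(p + q)t/\tau$. This is the variance of a walk that jumps left and right at rates $p/\tau$ and $q/\tau$. For finite $\Delta t$ the discrete walk falls short of this by $(q - p)^{2} t \Delta t / \tau^{2}$. The mean equals $(q - p)t/\tau$ for every $\Delta t$.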
Continuous Space
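We do a second rescaling, now in space. Let $\Delta x$ be the lattice spacing, so that $C_{i}(t) = C(i\Delta x, t)$. Let $\delta$ be the characteristic length scale of the system, the spatial analogue of $\tau$. For small $\Delta x$ the brackets in the master equation become $C_{i+1} - C_{i-1} \approx 2\Delta x \, \partial_x C$ and $C_{i+1} - 2C_{i} + C_{i-1} \approx \Delta x^{2} \, \partial_x^{2} C$. The diffusion term therefore survives the limit $\Delta x \to 0$ only if the jump probabilities grow like $1/\Delta x^{2}$, so we rescale $p' = p (\delta / \Delta x)^{2}$, and equivalently $q'$. We use the square because the mean squared displacement (variance) grows linearly with time: covering a distance $\delta$ with steps of size $\Delta x$ takes about $(\delta / \Delta x)^{2}$ jumps. For a symmetric walk ($p = q$) the drift term vanishes; otherwise it would grow like $(p - q)\delta^{2}/\Delta x$. With $D = (p + q)\delta^{2} / (2\tau)$ we get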
$$ \frac{ \partial }{ \partial t } C(x, t) = D \frac{ \partial^{2} }{ \partial x^{2} } C(x, t) $$where $D$ is the diffusion coefficient. Taking the Fourier transform in space turns this partial differential equation into an ordinary differential equation in time, which can be solved directly. This is done in the next section.
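The next sketch shows why the rescaling uses the square. It assumes a symmetric walk ($p = q$) with jump rate $r$ in each direction ($r = p'/\tau$ in the notation above). The master equation then reads $\frac{d}{dt}C_i = r\,(C_{i+1} - 2C_i + C_{i-1})$, and after the expansion in $\Delta x$ the diffusion coefficient is $D = r\,\Delta x^{2}$. The function name, the lattice truncation at $|x| \le 8$ and the forward-Euler time stepping are illustrative assumptions. With $r = D/\Delta x^{2}$ the spatial variance stays at $2Dt$ for every lattice spacing. Keeping $r$ fixed instead makes the variance shrink like $\Delta x^{2}$.

```python
def spatial_variance(dx: float, rate: float, t_final: float, half_length: float = 8.0) -> float:
    """Variance of x = i * dx at t_final for a symmetric lattice walk.

    The walk starts as a unit mass at x = 0 and is integrated with forward Euler on
    dC_i/dt = rate * (C_{i+1} - 2 C_i + C_{i-1}), where `rate` is the jump rate in
    each direction. Concentration outside [-half_length, half_length] is taken as zero.
    """
    n_half = round(half_length / dx)
    xs = [i * dx for i in range(-n_half, n_half + 1)]
    c = [0.0] * len(xs)
    c[n_half] = 1.0  # unit mass at x = 0
    dt = 0.25 / rate  # each Euler step jumps left or right with probability 0.25
    for _ in range(round(t_final / dt)):
        padded = [0.0] + c + [0.0]  # zero concentration just outside the lattice
        c = [
            padded[j] + rate * dt * (padded[j + 1] - 2 * padded[j] + padded[j - 1])
            for j in range(1, len(padded) - 1)
        ]
    mean = sum(x * ci for x, ci in zip(xs, c))
    return sum((x - mean) ** 2 * ci for x, ci in zip(xs, c))


if __name__ == "__main__":
    D, t_final = 0.5, 1.0
    for dx in (0.5, 0.2, 0.1, 0.05):
        scaled = spatial_variance(dx, D / dx**2, t_final)  # rate grows like 1 / dx^2
        fixed = spatial_variance(dx, D, t_final)  # rate kept at its dx = 1 value
        print(f"dx={dx:<5} scaled variance={scaled:.4f} fixed-rate variance={fixed:.4f}")
    print(f"2 D t = {2 * D * t_final:.4f}")
```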
Solution for the diffusion process
One can check that the following function solves the equation above for a unit mass of particles starting at $x = 0$ at $t = 0$:
$$ C(x, t) = \frac{1}{\sqrt{ 4 \pi D t }} \exp\left( -\frac{x^{2}}{4Dt} \right) $$One can verify this by direct substitution, or more systematically by passing to Fourier space, where the equation is simpler to treat. The Fourier transform in space of a function $f(x, t)$ is
$$ F(k, t) = \int e^{-ikx}f(x, t) \, dx $$and its inverse is
$$ f(x, t) = \frac{1}{2\pi} \int e^{ikx}F(k, t) \, dk $$Applying the Fourier transform to both sides of the diffusion equation gives:
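$$ \begin{align} \frac{d}{dt} \int e^{-ikx}C(x, t) \, dx &= D \int e^{-ikx} \frac{ \partial^{2} }{ \partial x^{2} } C(x, t) \, dx \\ \frac{d}{dt} F(k, t) &= -Dk^{2}F(k, t) \end{align} $$Here the right-hand side follows from integrating by parts twice, assuming $C$ and $\partial_x C$ vanish as $|x| \to \infty$; each integration contributes a factor $ik$. This ordinary differential equation gives $F(k, t) = F(k, 0)\, e^{-Dk^{2}t}$. For a unit mass starting at $x = 0$ we have $F(k, 0) = 1$, and inverting the Gaussian $e^{-Dk^{2}t}$ yields the solution above.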
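The following numerical sanity check of this section uses the illustrative values $D = 0.5$ and $t = 1$; the function names are hypothetical. It checks three things. First, the Gaussian integrates to 1. Second, it satisfies $\partial_t C = D\,\partial_x^2 C$ up to finite-difference error. Third, numerically inverting $F(k, t) = e^{-Dk^{2}t}$ (the transform of a unit mass starting at $x = 0$) reproduces the closed form. The inversion uses the $1/(2\pi)$ factor required by the forward convention $\int e^{-ikx} f \, dx$.

```python
import math


def heat_kernel(x: float, t: float, d: float) -> float:
    """C(x, t) = exp(-x^2 / (4 D t)) / sqrt(4 pi D t), with d playing the role of D."""
    return math.exp(-x * x / (4 * d * t)) / math.sqrt(4 * math.pi * d * t)


def pde_residual(x: float, t: float, d: float, h: float = 1e-3) -> float:
    """dC/dt - D d^2C/dx^2 at (x, t), estimated with central finite differences of step h."""
    dc_dt = (heat_kernel(x, t + h, d) - heat_kernel(x, t - h, d)) / (2 * h)
    d2c_dx2 = (heat_kernel(x + h, t, d) - 2 * heat_kernel(x, t, d) + heat_kernel(x - h, t, d)) / h**2
    return dc_dt - d * d2c_dx2


def inverse_fourier(x: float, t: float, d: float, k_max: float = 20.0, n: int = 4000) -> float:
    """(1 / 2 pi) * integral of exp(i k x) * exp(-D k^2 t) dk, midpoint rule on [-k_max, k_max].

    F(k, t) = exp(-D k^2 t) is even in k, so the imaginary (sine) part cancels
    and only cos(k x) has to be integrated.
    """
    dk = 2 * k_max / n
    total = 0.0
    for j in range(n):
        k = -k_max + (j + 0.5) * dk
        total += math.cos(k * x) * math.exp(-d * k * k * t)
    return total * dk / (2 * math.pi)


def total_mass(t: float, d: float, x_max: float = 20.0, n: int = 4000) -> float:
    """Integral of C(x, t) over [-x_max, x_max] with the midpoint rule (should be about 1)."""
    dx = 2 * x_max / n
    return sum(heat_kernel(-x_max + (j + 0.5) * dx, t, d) for j in range(n)) * dx


if __name__ == "__main__":
    D, t = 0.5, 1.0
    print(f"total mass: {total_mass(t, D):.6f}")
    for x in (0.0, 0.7, 1.5):
        print(
            f"x={x}: closed form={heat_kernel(x, t, D):.6f} "
            f"inverse Fourier={inverse_fourier(x, t, D):.6f} "
            f"PDE residual={pde_residual(x, t, D):.1e}"
        )
```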
Introduction to Diffusion Models
The Forward Encoder
- What is the link to Autoencoders?
Noise Schedule
- Variance of the input.
- Why do we use such noise schedule? What is the underlying property that we will have?
Diffusion Kernel
- Write the closed form
- Why is this usually useful?
Closed-form reverse conditional distribution
- Derive the closed form of the reverse conditional distribution