Autoregressive Modelling

On Autoregressivity The main idea of autoregressivity is to use previous predictions to predict the next state. The Autoregressive property 🟩 Autoregressive models model a joint distribution of random variables by assuming a chain-rule-like decomposition: $$ p(x) = \prod_{i=1}^{n} p(x_i \mid x_{1:i-1}) $$ If we assume independence between the variables, we only need about $2T$ parameters to model the joint, but this assumption is too strong. If we instead use a tabular approach, we get a combinatorial explosion: about $2^{T - 1}$ possible states (assuming the random variables are binary and we build a table for each intermediate conditional). ...
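As a toy illustration of the decomposition above, here is a minimal sketch (the tables and names are hypothetical, not from the article) that evaluates an exact log-likelihood with one conditional table per binary variable; the table for position $i$ has $2^{i}$ entries, which is exactly the combinatorial explosion of the tabular approach:

```python
import numpy as np

T = 4
rng = np.random.default_rng(0)

# One conditional table per position i: entry [prefix] stores
# p(x_i = 1 | x_{1:i-1} = prefix). Sizes grow as 1, 2, 4, ..., 2^(T-1).
tables = [rng.uniform(size=2**i) for i in range(T)]

def log_prob(x):
    """Exact log p(x) via the chain-rule factorization."""
    lp = 0.0
    prefix = 0  # integer encoding of the prefix x_{1:i-1}
    for i, xi in enumerate(x):
        p1 = tables[i][prefix]
        lp += np.log(p1 if xi == 1 else 1.0 - p1)
        prefix = prefix * 2 + xi
    return lp

print(log_prob([1, 0, 1, 1]))
```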

2 min · Xuanqiang 'Angelo' Huang

Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very well. An important observation is that the algorithm is linear: the backward pass has the same time complexity as the forward pass. Derivatives are unexpectedly cheap to compute, which took a long time to discover. See colah’s blog. Karpathy has a nice resource on this topic too! ...
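To make the linearity claim concrete, here is a minimal reverse-mode autodiff sketch in the spirit of Karpathy's micrograd (hypothetical code, not his actual API): every operation records a local backward rule, and the backward sweep visits each node once, so computing all gradients costs about the same as the forward pass:

```python
class Value:
    """A scalar that remembers how it was computed."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents, self._backward = parents, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        order, seen = [], set()
        def visit(v):  # topological sort of the computation graph
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):  # one local rule per node: linear time
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x            # z = xy + x
z.backward()
print(x.grad, y.grad)    # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```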

7 min · Xuanqiang 'Angelo' Huang

Generative Adversarial Networks

Generative Adversarial Networks were introduced in 2014 by Ian Goodfellow (at that time the generated images were still grayscale). Image quality has since improved enormously with Diffusion Models. Yann LeCun has described this as one of the most important ideas. Nowadays (2025) GANs are still used for super-resolution and other applications, but they retain some limitations (mainly training stability) and now face strong competition from other models. The resolution achieved by GANs is much higher than that of VAEs (see Autoencoders#Variational Autoencoders). The adversarial objective is an easy plugin to improve the results of other models (VAE, flow, Diffusion). ChatGPT, for example, also uses some sort of adversarial learning, though not in the same manner explained here. ...

8 min · Xuanqiang 'Angelo' Huang

Recurrent Neural Networks

Recurrent Neural Networks allow us to model arbitrarily long sequence dependencies, at least in theory. This is very handy and has many interesting theoretical implications. But here we are also interested in practical applicability, so we need to analyze the common architectures used to implement these models, their main limitations and drawbacks, their nice properties, and some applications. These networks can be seen as chaotic systems (non-linear dynamical systems); see Introduction to Chaos Theory. ...
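As a hint of the dynamical-systems view, here is a minimal sketch (all names and sizes are hypothetical) of the vanilla RNN recurrence, where the same weights are applied at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_input = 8, 3
W = rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_hidden, d_hidden))
U = rng.normal(scale=1.0 / np.sqrt(d_input), size=(d_hidden, d_input))
b = np.zeros(d_hidden)

def run(xs):
    """Unroll h_t = tanh(W h_{t-1} + U x_t + b) over a sequence."""
    h = np.zeros(d_hidden)  # initial hidden state
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)  # a non-linear dynamical system
    return h

print(run([rng.normal(size=d_input) for _ in range(20)]))
```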

4 min · Xuanqiang 'Angelo' Huang

Normalizing Flows

Normalizing flows have both a latent space and tractable, explicit probability distributions (compare Autoregressive Modelling, which has tractable distributions but no latent space). This means we are able to compute the likelihood of a given sample. “This approach to modelling a flexible distribution is called a normalizing flow because the transformation of a probability distribution through a sequence of mappings is somewhat analogous to the flow of a fluid.” From (Bishop & Bishop 2024) ...
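To see why the likelihood is tractable, here is a minimal sketch (parameters are hypothetical) of the change-of-variables formula behind flows: push $x$ through an invertible map $z = f(x)$ and correct the base log-density by the log-determinant of the Jacobian. For an elementwise affine map $z_i = a_i x_i + c_i$ the correction is simply $\sum_i \log|a_i|$:

```python
import numpy as np

a = np.array([2.0, 0.5])   # scales; must be non-zero so f is invertible
c = np.array([1.0, -1.0])  # shifts

def log_prob(x):
    z = a * x + c  # forward map f(x)
    # log N(z; 0, I), the standard normal base density
    base = -0.5 * np.sum(z**2) - len(z) / 2 * np.log(2 * np.pi)
    return base + np.sum(np.log(np.abs(a)))  # + log|det df/dx|

print(log_prob(np.array([0.3, -0.7])))  # exact log-likelihood of a sample
```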

5 min · Xuanqiang 'Angelo' Huang

Convolutional Neural Network

Introduction to Convolutional NN Design Goals We want to be invariant to some transformations but at the same time be sensitive to specific patterns. The convolution operator 🟩 $$ \sum_{i} \sum_{j} h(x - i, y - j) f(i, j) $$ The convolution product is mathematically quite convoluted, even though in practice it is a very simple thing. In practice, I want to compute the value of a pixel as a function of certain of its neighbours, multiplied by a filter, which in practice is a matrix of weights defining a linear pattern that I would be interested in searching for in the image. ...
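Here is a direct (naive) implementation of the sum above, purely as an illustration with hypothetical names, with no padding or striding: the output pixel at $(x, y)$ is a weighted combination of the neighbours of $f$ under the filter $h$:

```python
import numpy as np

def conv2d(f, h):
    """'Valid' 2D convolution of image f with filter h, straight from
    the definition (note the index flip that distinguishes convolution
    from cross-correlation)."""
    kH, kW = h.shape
    out = np.zeros((f.shape[0] - kH + 1, f.shape[1] - kW + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for i in range(kH):
                for j in range(kW):
                    out[x, y] += h[kH - 1 - i, kW - 1 - j] * f[x + i, y + j]
    return out

img = np.arange(25.0).reshape(5, 5)
edge = np.array([[1.0, -1.0]])  # simple horizontal-difference filter
print(conv2d(img, edge))
```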

8 min · Xuanqiang 'Angelo' Huang

The Perceptron Model

The perceptron is a fundamental binary linear classifier introduced by (Rosenblatt 1958). It maps an input vector $\mathbf{x} \in \mathbb{R}^n$ to an output $y \in \{0,1\}$ using a weighted sum followed by a threshold function. The Mathematical Definition Given an input vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and a weight vector $\mathbf{w} = (w_1, w_2, \dots, w_n)$, the perceptron computes: $$ z = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b $$ where $b$ is the bias term. The output is determined by the Heaviside step function: ...
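A minimal sketch of the model above, plus the classic Rosenblatt update rule (the training loop is a standard addition, not spelled out in the excerpt), learning the AND function as a toy example:

```python
import numpy as np

def predict(w, b, x):
    return 1 if w @ x + b > 0 else 0  # Heaviside step on z = w^T x + b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])  # AND truth table

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):  # a few passes over the data suffice here
    for xi, yi in zip(X, y):
        err = yi - predict(w, b, xi)
        w += lr * err * xi  # Rosenblatt update: w <- w + lr * err * x
        b += lr * err

print([predict(w, b, xi) for xi in X])  # [0, 0, 0, 1]
```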

3 min · Xuanqiang 'Angelo' Huang