Autoregressive Modelling

On Autoregressivity The main idea of autoregressivity is to use previous prediction to predict the next state. The Autoregressive property 🟩 Autoregressive models model a joint distribution of aleatoric variables by assuming a chain rule like decomposition: $$ p(x) = \prod_{i=1}^{n} p(x_i | x_{1:i-1}) $$ If we assume independence between the variables, we don’t need many variables to model it $2T$, but this assumption is too strong. If we just use a tabular approach, we’ll have a combinatorial explosion: we will have about $2^{T - 1}$ possible states (if we assume the aleatoric variables are binary, and we are creating a table for each intermediate variable). ...

2 min Â· Xuanqiang 'Angelo' Huang

Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very well. An important observation is that this algorithm is linear: the time complexity is the same as the forward pass. Derivatives are unexpectedly cheap to calculate. This took a lot of time to discover. See colah’s blog. Karpathy has a nice resource for this topic too! ...

7 min Â· Xuanqiang 'Angelo' Huang

Compiler Limitations

On Compiler Adding compilation flags to gcc not always makes it faster, it just enables a specific set of optimization methods. It’s also good to turn on platform specific flags to turn on some specific optimization methods to that architecture. Remember that compilers are conservative, meaning they do not apply that optimization if they think it does not always apply. What are they good at Compilers are good at: mapping program to machine ▪ register allocation ▪ instruction scheduling ▪ dead code elimination ▪ eliminating minor inefficiencies ...

3 min Â· Xuanqiang 'Angelo' Huang

Datacenter Hardware

We want to optimize the parts of the datacenter hardware such that the cost of operating the datacenter as a whole would be lower, we need to think about it as a whole. Datacenter CPUs Different requirements Hardware needs high level isolation (because it will be shared among different users). Usually high workloads and moving a lot of data around. They have a spectrum of low and high end cores, so that if you have high parallelism you can use lower cores, while for resource intensive tasks, its better to have high end cores, especially for latency critical tasks. ...

11 min Â· Xuanqiang 'Angelo' Huang

Dependency Parsing

This set of note is still in TODO Dependency Grammar has been much bigger in Europe compared to USA, where Chomsky’s grammars ruled. One of the main developers of this theory is Lucien Tesnière (1959): “The sentence is an organized whole, the constituent elements of which are words. Every word that belongs to a sentence ceases by itself to be isolated as in the dictionary. Between the word and its neighbors, the mind perceives connections, the totality of which forms the structure of the sentence. The structural connections establish dependency relations between the words. Each connection in principle unites a superior term and an inferior term. The superior term receives the name governor (head). The inferior term receives the name subordinate (dependent).” ~Lucien Tesnière ...

4 min Â· Xuanqiang 'Angelo' Huang

Devices OS

Devices Categorizzazione (6)🟨- Trasferimento dei dati Accesso al device sinfonia del trasferimento condivisone fra processi Velocità del trasferimento I/O direction (scrittura o lettura) Vediamo che molte caratteristiche sono riguardo il trasferimento Slide categorizzazione I/O Blocchi o caratteri 🟩- Slide devices blocchi o caratteri Tecniche di gestione devices (4) 🟨- Buffering Possiamo mettere un buffer per favorire la comunicazione fra i devices. la cosa migliore che fa è creare maggiore efficienza. Un altro motivo è la velocità diversa di consumo. ...

4 min Â· Xuanqiang 'Angelo' Huang

Fast Linear Algebra

Many problems in scientific computing include: Solving linear equations Eigenvalue computations Singular value decomposition LU/Cholesky/QR decompositions etc… And the userbase is quite large for this types of computation (number of scientists in the world is growing exponentially ) Quick History of Performance Computing Early seventies it was EISPACK and LINPACK. Then In similar years Matlab was invented, which simplified a lot compared to previous systems. LAPACK redesigned the algorithms in previous libraries to have better block-based locality. BLAS are kernel functions for each computer, while LAPACK are the higher level functions build on top of BLAS (1, 2,3). Then another innovation was ATLAS, which automatically generates the code for BLAS for each architecture. This is called autotuning because it does a search of possible enumerations and chooses the fastest one. Now autotuning has been done a lot for NN systems. ...

5 min Â· Xuanqiang 'Angelo' Huang

Generative Adversarial Networks

Generative Adversarial Network has been introduced in 2014 by Ian Goodfellow (at that time they where still gray and white). Now the images have been improved so much with Diffusion Models. This idea has been considered by Yann LeCun as one of the most important ideas. Nowadays (2025) they are still used for super-resolution and other applications, but it has still some limitations (mainly stability), and now has good competition against other models. The resolution purported by GAN is much higher than VAE (see Autoencoders#Variational Autoencoders). This is a easy plugin to improve the results of other models (VAE, flow, Diffusion). Also ChatGPT has some sort of adversarial learning for example, not explained in the same manner as here. ...

8 min Â· Xuanqiang 'Angelo' Huang

HTTP e REST

HTTP is the acronym for HyperText Transfer Protocol. Caratteristiche principali (3) Comunicazioni fra client e server, e quanto sono comunicate le cose si chiude la connessione e ci sono politiche di caching molto bone (tipo con i proxy) Generico: perché è un protocollo utilizzato per caricare moltissime tipologie di risorse! Stateless, ossia non vengono mantenute informazioni su scambi vecchi, in un certo modo ne abbiamo parlato in Sicurezza delle reti quando abbiamo parlato di firewall stateless. Solitamente possiamo intendere questo protocollo come utile per scambiare risorse di cui abbiamo parlato in Uniform Resource Identifier. ...

6 min Â· Xuanqiang 'Angelo' Huang

Language Models

In order to understand language models we need to understand structured prediction. If you are familiar with Sentiment Analysis, where given an input text we need to classify it in a binary manner, in this case the output space usually scales in an exponential manner. The output has some structure, for example it could be a tree, it could be a set of words etc… This usually needs an intersection between statistics and computer science. ...

2 min Â· Xuanqiang 'Angelo' Huang