Notes

Fast Linear Algebra

Many problems in scientific computing include: Solving linear equations Eigenvalue computations Singular value decomposition LU/Cholesky/QR decompositions etc… And the userbase is quite large for this types of computation (number of scientists in the world is growing exponentially ) Quick History of Performance Computing Early seventies it was EISPACK and LINPACK. Then In similar years Matlab was invented, which simplified a lot compared to previous systems. LAPACK redesigned the algorithms in previous libraries to have better block-based locality. BLAS are kernel functions for each computer, while LAPACK are the higher level functions build on top of BLAS (1, 2,3). Then another innovation was ATLAS, which automatically generates the code for BLAS for each architecture. This is called autotuning because it does a search of possible enumerations and chooses the fastest one. Now autotuning has been done a lot for NN systems. ...

Clustering

Gaussian Mixture Models This set takes inspiration from chapter 9.2 of (Bishop 2006). We assume that the reader already knows quite well what is a Gaussian Mixture Model and we will just restate the models here. We will discuss the problem of estimating the best possible parameters (so, this is a density estimation problem) when the data is generated by a mixture of Gaussians. $$ \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{\sqrt{ 2\pi }} \frac{1}{\lvert \Sigma \rvert^{1/2} } \exp \left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1}(x - \mu) \right) $$Problem statement $$ p(z) = \prod_{i = 1}^{k} \pi_{i}^{z_{i}} $$ Because we know that $z$ is a $k$ dimensional vector that has a single digit indicating which Gaussian was chosen. ...

Markov Chains

Introduzione alle catene di Markov La proprietà di Markov Una sequenza di variabili aleatorie $X_{1}, X_{2}, X_{3}, \dots$ gode della proprietà di Markov se vale: $$ P(X_{n}| X_{n - 1}, X_{n - 2}, \dots, X_{1}) = P(X_{n}|X_{n-1}) $$ Ossia posso scordarmi tutta la storia precedente, mi interessa solamente lo stato precedente per sapere la probabilità attuale. Da un punto di vista filosofico/fisico, ha senso perché mi sta dicendo che posso predire lo stato successivo se ho una conoscenza (completa, (lo dico io completo, originariamente non esiste)) del presente. ...

Redundant Array of Independent Disks

Introduzione ai Redundant Array of Indipendent Disks I RAID ne abbiamo citato per la prima volta in Memoria. Come facciamo a stare su alla velocità del processore se questa va a crescere in modo esponenziale? Parallelizzazione della ricerca!. Ecco perché ci serve raid (oltre alla ridondanza quindi più sicuro). E possono anche fallire. → ammette recovery. E una altra cosa bella dei raid è che sono hot-swappable cioè li puoi sostituire anche quando stanno runnando. ...

Content Delivery Networks

CDNs are intermediary servers that replicate read intensive data to provide better performance when user requests them. A close relative of CDNs is edge computing (e.g. gaming stations) where lots of computation is done directly close to the user. Types of CDNs Mainly three types of CDNs: Highly distributed ones. -> Akamai Database based ones. Ad-hoc CDNs. Advantages and disadvantages The main reason we use CDNs is to lower the value of latency: we are in fact bringing the data closer to the user. We have much less data in length to be transmitted. Yet we have some disadvantages too: ...

HTTP e REST

HTTP is the acronym for HyperText Transfer Protocol. Caratteristiche principali (3) Comunicazioni fra client e server, e quanto sono comunicate le cose si chiude la connessione e ci sono politiche di caching molto bone (tipo con i proxy) Generico: perché è un protocollo utilizzato per caricare moltissime tipologie di risorse! Stateless, ossia non vengono mantenute informazioni su scambi vecchi, in un certo modo ne abbiamo parlato in Sicurezza delle reti quando abbiamo parlato di firewall stateless. Solitamente possiamo intendere questo protocollo come utile per scambiare risorse di cui abbiamo parlato in Uniform Resource Identifier. ...

Optimizations for DNN

Mixture of Experts There is a gate that opens a subset of the experts, and the output is the weighted sum of the outputs of the experts. The weights are computed by a gating network. One problem is load balancing, non uniform assignment. And there is a lot of communication overhead when you place them in different devices. LoRA: Low-Rank Adaptation We only finetune a part of the network, called lora adapters, not the whole thing. There are two matrices here, a matrix A and B, they are some sort of an Autoencoders, done for every Q nd V matrices in the LLM attention layer. The nice thing is that there are not many inference costs if adapters are merged post training: ...

Virtual Machines

The fundamental idea behind a virtual machine is to abstract the hardware of a single computer (the CPU, memory, disk drives, network interface cards, and so forth) into several different execution environments, thereby creating the illusion that each separate environment is running on its own private computer. (Silberschatz et al. 2018). Virtualization allows a single computer to host multiple virtual machines, each potentially running a completely different operating system. È virtuale nel senso che la macchina virtuale ha la stessa percezione della realtà di una macchina reale. Qualcosa che non è la realtà ma appare molto simile ad essa. ...

Communication in the Cloud

How can we coordinate services to actually understand what they are doing, or what the user wants them to do? How to manage networks errors? This note will mainly focus on high level communication protocols to coordinate this kind of communication. Remote Procedure Calls History: the Stub This has been the main idea, introduced in 1984, using the idea of stubs, see (Birrell & Nelson 1984). The system basically calls the remote procedure as if it was local on the high level, but on a lower level a network request is sent. The architecture has remained the same in these years. It hides all the complexity in the stub (marshaling, binding and sending, without caring about the sockets and communication matters). One problem is that it might be hiding the complexity too well. The programmer has surely an ease of programming, but design consideration should consider overloads generated by the network communication. ...

Compute Express Link

This allows us to extend the memory hierarchy (see Memoria) that we have today. The problem is that we have heterogeneous access patterns specifications and hardware. One of the main trends is disaggregation: we want to be able to scale different resources independently. Introduction to CXL (Compute Express Link) This is a new part of the memory hierarchy. NVM is a kind of non volatile memory that is used as a storage device that is close to the device (others are network attached or slower than network part anyways. It is persistent and has low latency. It is used in the memory hierarchy to extend the memory capacity. ...