Cloud Storage

Object Stores. Characteristics of Cloud Systems. Object storage design principles 🟨++: We don't want the hierarchy that is common in filesystems, so we simplify it and follow these four principles: black-box objects; a flat and global key-value model (a trivial model that is easy to access, with no need to traverse a file hierarchy); flexible metadata; commodity hardware (the battery idea of Tesla until 2017). Object storage usages 🟩: Object stores are useful for storing data that is usually read-intensive. Some examples are ...
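To make the flat key-value model concrete, here is a minimal in-memory sketch. `ObjectStore`, `put`, `get` and the fixed capacities are hypothetical names chosen for illustration, not the API of any real object store; a linear scan stands in for the distributed index a real system would use. Keys that look like paths are just strings, blobs are opaque bytes, and metadata is a free-form string.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OBJECTS 16  /* hypothetical capacity of this toy store */
#define KEY_MAX 128
#define META_MAX 128

/* An object: an opaque blob (black box) plus flexible metadata. */
typedef struct {
    char key[KEY_MAX];        /* flat key: "a/b/c" is one string, not a path */
    unsigned char *data;      /* the store never interprets these bytes */
    size_t size;
    char metadata[META_MAX];  /* free-form, e.g. "type=image/jpeg" */
} Object;

typedef struct {
    Object objs[MAX_OBJECTS];
    int count;
} ObjectStore;

/* Lookup is an exact key match: no directory traversal at all. */
static Object *find(ObjectStore *s, const char *key) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->objs[i].key, key) == 0) return &s->objs[i];
    return NULL;
}

/* PUT: store (or overwrite) the whole object under a key. Returns 0 on success. */
int put(ObjectStore *s, const char *key, const void *data, size_t size,
        const char *meta) {
    if (strlen(key) >= KEY_MAX || strlen(meta) >= META_MAX) return -1;
    unsigned char *copy = malloc(size ? size : 1);  /* private copy of the blob */
    if (!copy) return -1;
    if (size) memcpy(copy, data, size);
    Object *o = find(s, key);
    if (!o) {                                       /* new key: take a free slot */
        if (s->count == MAX_OBJECTS) { free(copy); return -1; }
        o = &s->objs[s->count++];
        strcpy(o->key, key);
        o->data = NULL;
    }
    free(o->data);                                  /* whole-object overwrite */
    o->data = copy;
    o->size = size;
    strcpy(o->metadata, meta);
    return 0;
}

/* GET: return the object stored under key, or NULL if there is none. */
const Object *get(ObjectStore *s, const char *key) { return find(s, key); }

void store_free(ObjectStore *s) {
    for (int i = 0; i < s->count; i++) free(s->objs[i].data);
    s->count = 0;
}
```

Because the namespace is flat, a key `photos` and a key `photos/2017/cat.jpg` can coexist, and `photos/2017` is not a "directory" that can be listed or fetched.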

18 min · Xuanqiang 'Angelo' Huang

Content Delivery Networks

CDNs are intermediary servers that replicate read-intensive data to provide better performance when users request it. A close relative of CDNs is edge computing (e.g. gaming stations), where much of the computation is done close to the user. Types of CDNs: there are mainly three types: highly distributed ones, database-based ones, and ad-hoc CDNs. Advantages and disadvantages: The main reason we use CDNs is to lower latency, since we are bringing the data closer to the user and it has a much shorter distance to travel. Yet there are some disadvantages too: ...
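The latency argument can be sketched with a toy model: route the user to the edge with the lowest round-trip time and pay the extra edge-to-origin trip only on a cache miss. The `Edge` struct, the RTT figures and the purely additive latency model are hypothetical simplifications for illustration, not how any particular CDN measures or routes traffic.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 4  /* hypothetical: number of keys replicated at each edge */

typedef struct {
    const char *name;
    double rtt_ms;                    /* assumed user-to-edge round-trip time */
    const char *cached[CACHE_SLOTS];  /* replicated keys; NULL = empty slot */
} Edge;

/* Route the user to the closest edge, i.e. the one with the lowest RTT. */
const Edge *nearest_edge(const Edge *edges, size_t n) {
    const Edge *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (!best || edges[i].rtt_ms < best->rtt_ms) best = &edges[i];
    return best;
}

int is_cached(const Edge *e, const char *key) {
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (e->cached[i] && strcmp(e->cached[i], key) == 0) return 1;
    return 0;
}

/* Simplified additive model: a hit costs only the trip to the edge; a miss
   also costs the edge-to-origin trip. Requires n > 0. */
double request_latency_ms(const Edge *edges, size_t n, const char *key,
                          double edge_to_origin_ms) {
    const Edge *e = nearest_edge(edges, n);
    return is_cached(e, key) ? e->rtt_ms : e->rtt_ms + edge_to_origin_ms;
}
```

With a nearby edge at 5 ms and an origin 120 ms further away, a replicated file costs 5 ms while an unreplicated one costs 125 ms, which is why CDNs pay off mainly for popular, read-intensive content.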

2 min · Xuanqiang 'Angelo' Huang

Fast Linear Algebra

Many problems in scientific computing include: solving linear equations, eigenvalue computations, singular value decomposition, LU/Cholesky/QR decompositions, etc. The user base for these types of computation is quite large (the number of scientists in the world is growing exponentially). Quick History of Performance Computing: In the early seventies there were EISPACK and LINPACK. Around the same time Matlab was invented, which greatly simplified things compared to previous systems. LAPACK redesigned the algorithms of the previous libraries to get better block-based locality. BLAS are the kernel functions, tuned for each computer, while LAPACK provides the higher-level functions built on top of BLAS (levels 1, 2, 3). Another innovation was ATLAS, which automatically generates the BLAS code for each architecture. This is called autotuning because it searches over the possible implementation variants and chooses the fastest one. Today autotuning is used heavily for neural-network systems. ...
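The block-based locality idea and ATLAS-style autotuning can be sketched together. `mmm_blocked` splits matrix-matrix multiplication into `nb x nb` tiles so each tile is reused while it is still in cache; `autotune_nb` is a heavily simplified, hypothetical autotuner that only times a few candidate block sizes on the current machine and keeps the fastest (ATLAS searches a much larger space of code variants).

```c
#include <stddef.h>
#include <time.h>

/* Reference triple loop: C += A * B for n x n row-major matrices. */
static void mmm_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Blocked version: iterate over nb x nb tiles so each tile stays in cache
   while it is reused. The "< n" checks handle n not divisible by nb. */
static void mmm_blocked(size_t n, size_t nb, const double *A, const double *B,
                        double *C) {
    for (size_t i0 = 0; i0 < n; i0 += nb)
        for (size_t j0 = 0; j0 < n; j0 += nb)
            for (size_t k0 = 0; k0 < n; k0 += nb)
                for (size_t i = i0; i < i0 + nb && i < n; i++)
                    for (size_t j = j0; j < j0 + nb && j < n; j++) {
                        double c = C[i * n + j];  /* scalar replacement */
                        for (size_t k = k0; k < k0 + nb && k < n; k++)
                            c += A[i * n + k] * B[k * n + j];
                        C[i * n + j] = c;
                    }
}

/* Toy autotuner: time each candidate block size and return the fastest.
   C is used as scratch space (its contents are overwritten). */
static size_t autotune_nb(size_t n, const double *A, const double *B, double *C,
                          const size_t *cands, size_t ncands) {
    size_t best = cands[0];
    double best_t = -1.0;
    for (size_t c = 0; c < ncands; c++) {
        clock_t t0 = clock();
        mmm_blocked(n, cands[c], A, B, C);
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best_t < 0 || t < best_t) { best_t = t; best = cands[c]; }
    }
    return best;
}
```

The blocked and naive versions perform the same additions in the same order for each entry of C, so they give identical results; only the memory access pattern changes.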

5 min · Xuanqiang 'Angelo' Huang

Application Layer and Sockets

Transport layer. Classic protocols. Introduction to TCP and UDP: The fourth layer of the Internet protocol architecture is the transport layer, which is based on two protocols in particular: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP), which can be used as alternatives to each other. They differ in whether they are connection-oriented: TCP is connection-oriented, UDP is not, and this is the fundamental difference between the two. This difference is explained in more detail in 0.3.8 Connection-oriented and connectionless services 🟨+ ...
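The connection-oriented vs connectionless distinction shows up directly in the POSIX socket calls. This sketch sends one message over each protocol on the loopback interface: with UDP the receiver is just bound and each `sendto` names its destination, while with TCP a connection must first be set up with `listen`/`connect`/`accept`, after which `send`/`recv` no longer name the peer. The helper names (`bind_loopback`, `udp_demo`, `tcp_demo`) are hypothetical, and the single-threaded TCP setup relies on the kernel completing the loopback handshake before `accept` is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a socket bound to 127.0.0.1 on a kernel-chosen port; the actual
   address is written to *addr. Returns the fd, or -1 on error. */
static int bind_loopback(int type, struct sockaddr_in *addr) {
    int fd = socket(AF_INET, type, 0);
    if (fd < 0) return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                       /* 0 = any free port */
    socklen_t len = sizeof *addr;
    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* UDP (connectionless): no handshake, every datagram names its destination. */
int udp_demo(char *buf, size_t cap) {
    struct sockaddr_in addr;
    int rx = bind_loopback(SOCK_DGRAM, &addr);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    ssize_t n = -1;
    if (rx >= 0 && tx >= 0 &&
        sendto(tx, "hello-udp", 9, 0, (struct sockaddr *)&addr, sizeof addr) == 9)
        n = recvfrom(rx, buf, cap - 1, 0, NULL, NULL);
    if (tx >= 0) close(tx);
    if (rx >= 0) close(rx);
    if (n < 0) return -1;
    buf[n] = '\0';
    return (int)n;
}

/* TCP (connection-oriented): a connection is established first, then data
   flows as a byte stream. On loopback the kernel completes the handshake and
   queues the connection, so one thread can connect before calling accept. */
int tcp_demo(char *buf, size_t cap) {
    struct sockaddr_in addr;
    int lst = bind_loopback(SOCK_STREAM, &addr);
    int cli = socket(AF_INET, SOCK_STREAM, 0);
    int srv = -1;
    size_t got = 0;
    if (lst >= 0 && cli >= 0 && listen(lst, 1) == 0 &&
        connect(cli, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        (srv = accept(lst, NULL, NULL)) >= 0 &&
        send(cli, "hello-tcp", 9, 0) == 9) {
        while (got < 9 && got < cap - 1) {    /* a stream may arrive in pieces */
            ssize_t n = recv(srv, buf + got, cap - 1 - got, 0);
            if (n <= 0) break;
            got += (size_t)n;
        }
    }
    if (srv >= 0) close(srv);
    if (cli >= 0) close(cli);
    if (lst >= 0) close(lst);
    buf[got] = '\0';
    return (int)got;
}
```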

13 min · Xuanqiang 'Angelo' Huang

Virtual Memory

Virtual memory. Why is virtual memory useful? 🟨- Programs do not use all of memory, yet from their point of view they believe they have all of it available. The main idea is that many regions of memory go unused for long periods of time, so they can be used for something else. Dynamic code loading: for example, a compiler's code is divided into phases; if we load all of it, we only use a small piece at a time, which is very inefficient. If each page holds one part of the compiler, we can load into memory only the parts being executed at the moment. Growth of the stack and heap segments: for example, the stack can grow as needed, only the parts of the stack we actually need are loaded, and the rest of memory stays free for other uses. Error handling: the error-handling code, whose data is needed only while an error is actually being handled. Demand paging 🟩— This is an aspect of the page cache we already discussed in Livello OS (OS level). ...
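Demand paging can be illustrated with a toy page table that only tracks which pages are resident: a page is "loaded" the first time it is touched (a page fault), so a phase of a program that only touches two pages never loads the rest. The 8-page address space, the 4 KiB page size and all names are hypothetical, and the sketch does no page replacement.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PAGES 8     /* hypothetical: a tiny 8-page virtual address space */
#define PAGE_SIZE 4096  /* assumed 4 KiB pages */

/* Toy page table: present[p] becomes true once virtual page p is in memory. */
typedef struct {
    bool present[NUM_PAGES];
    int page_faults;
} PageTable;

void pt_init(PageTable *pt) {
    for (int p = 0; p < NUM_PAGES; p++) pt->present[p] = false;
    pt->page_faults = 0;
}

/* Demand paging: a page is brought in only on its first access (page fault).
   Returns false for an address outside the address space. */
bool access_addr(PageTable *pt, size_t vaddr) {
    size_t page = vaddr / PAGE_SIZE;
    if (page >= NUM_PAGES) return false;    /* invalid address */
    if (!pt->present[page]) {
        pt->page_faults++;  /* a real OS would now read the page from disk */
        pt->present[page] = true;
    }
    return true;
}

int resident_pages(const PageTable *pt) {
    int n = 0;
    for (int p = 0; p < NUM_PAGES; p++) n += pt->present[p];
    return n;
}
```

For the access trace 0, 100, 4096, 5000, 0, 4200 only pages 0 and 1 are touched, so there are two page faults and six of the eight pages are never loaded.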

9 min · Xuanqiang 'Angelo' Huang

Paging and Segmentation

Operating system memory. See Virtual Memory (Memoria virtuale) for how pages are replaced. In this section we discuss how many processes can run together even though they share the same physical memory. We therefore cover address spaces, the resolution of logical addresses, segmentation and paging (and much more!). MMU: checks whether a memory access is valid or not (and translates logical addresses into physical ones). ...
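What the MMU does for paging and for segmentation can be sketched as two small translation functions. The 4 KiB page size, the 16-entry page table and the single-segment base/limit pair are hypothetical simplifications (real MMUs use multi-level tables, a TLB and protection bits); a `false` return stands for the fault the hardware would raise.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12u                 /* assumed 4 KiB pages */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES 16u                 /* hypothetical tiny address space */

/* One page-table entry: physical frame number plus a valid bit. */
typedef struct { uint32_t frame; bool valid; } PTE;

/* Paging: split the logical address into (page, offset) and swap the page
   number for the frame number; the offset is copied unchanged. */
bool translate_paging(const PTE table[NUM_PAGES], uint32_t logical,
                      uint32_t *physical) {
    uint32_t page = logical >> PAGE_BITS;
    uint32_t offset = logical & (PAGE_SIZE - 1);
    if (page >= NUM_PAGES || !table[page].valid) return false; /* invalid access */
    *physical = (table[page].frame << PAGE_BITS) | offset;
    return true;
}

/* Segmentation: a segment is a (base, limit) pair; offsets past the limit trap. */
typedef struct { uint32_t base; uint32_t limit; } Segment;

bool translate_segment(Segment seg, uint32_t offset, uint32_t *physical) {
    if (offset >= seg.limit) return false;  /* out of bounds: segmentation fault */
    *physical = seg.base + offset;
    return true;
}
```

For example, with page 1 mapped to frame 5, logical address 0x1234 (page 1, offset 0x234) becomes physical address 0x5234.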

8 min · Xuanqiang 'Angelo' Huang

Sparse Matrix Vector Multiplication

Algorithms for Sparse Matrix-Vector Multiplication. Compressed Sparse Row 🟨– CSR is a compact way to store a sparse matrix row by row: only the non-zero values are kept, together with their column indices and the offset at which each row starts. Sparse MVM using CSR (computes y = y + Ax):

    void smvm(int m, const double* values, const int* col_idx,
              const int* row_start, const double* x, double* y)
    {
        int i, j;
        double d;
        /* Loop over m rows */
        for (i = 0; i < m; i++) {
            d = y[i]; /* Scalar replacement since reused */
            /* Loop over non-zero elements in row i */
            for (j = row_start[i]; j < row_start[i + 1]; j++) {
                d += values[j] * x[col_idx[j]];
            }
            y[i] = d;
        }
    }

Let's analyze this code. Spatial locality: row_start, col_idx and values are read sequentially, so they have good spatial locality. Temporal locality: y has temporal locality, while x has poor temporal locality (it is accessed irregularly through col_idx). Storage efficiency for the sparse matrix is good, but when the matrix is actually dense this code is about 2x slower than dense matrix-vector multiplication. Block CSR: Plain CSR, however, does not allow block optimizations for the cache. ...
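To see how the three CSR arrays are laid out, this sketch builds them from a small dense matrix and then runs the same row-by-row loop as `smvm`. `dense_to_csr` and `csr_mvm` are hypothetical helper names; the caller is assumed to allocate enough room for every non-zero and `m + 1` row offsets.

```c
#include <stddef.h>

/* Build CSR arrays from a dense m x n row-major matrix. values/col_idx need
   room for every non-zero, row_start needs m + 1 slots. Returns the nnz. */
size_t dense_to_csr(size_t m, size_t n, const double *dense,
                    double *values, int *col_idx, int *row_start) {
    size_t nnz = 0;
    for (size_t i = 0; i < m; i++) {
        row_start[i] = (int)nnz;             /* first non-zero of row i */
        for (size_t j = 0; j < n; j++)
            if (dense[i * n + j] != 0.0) {
                values[nnz] = dense[i * n + j];
                col_idx[nnz] = (int)j;
                nnz++;
            }
    }
    row_start[m] = (int)nnz;                 /* sentinel: end of the last row */
    return nnz;
}

/* y = y + A*x with A in CSR, same loop structure as smvm. */
void csr_mvm(int m, const double *values, const int *col_idx,
             const int *row_start, const double *x, double *y) {
    for (int i = 0; i < m; i++) {
        double d = y[i];
        for (int j = row_start[i]; j < row_start[i + 1]; j++)
            d += values[j] * x[col_idx[j]];
        y[i] = d;
    }
}
```

For the matrix with rows (1 0 2), (0 0 3), (4 5 0) this yields values = {1, 2, 3, 4, 5}, col_idx = {0, 2, 2, 0, 1} and row_start = {0, 2, 3, 5}; with x = (1, 2, 3) and y = 0 the result is y = (7, 9, 14). Note that with 4-byte `int` this tiny example actually uses more memory in CSR (76 bytes) than dense (72 bytes): CSR only pays off when the matrix is sparse enough.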

3 min · Xuanqiang 'Angelo' Huang

Conditioning Theory

Associative Conditioning. Classical Conditioning. Pavlov's experiment: He was studying the digestive system of dogs when he noticed that dogs start to salivate when they are shown food. If the food is paired with a sound (a tuning fork), they start to salivate even when they only hear the sound. He defines three states: before conditioning, during conditioning, and after conditioning. The important terms are conditioned stimulus and conditioned response, and their opposites (unconditioned stimulus and unconditioned response). It is important that the pairing is fairly consistent. The unconditioned stimulus is associated with the conditioned stimulus. ...

3 min · Xuanqiang 'Angelo' Huang

Cache Optimization

Locality principles: Remember the two locality principles from Memoria (Memory): temporal locality and spatial locality. Temporal locality: some elements are accessed many times over a short period of time. Spatial locality: some elements are accessed close to each other in memory. In modern architectures a cache line is usually 64 bytes. For example, consider this snippet:

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

sum is an example of temporal locality, as the same memory location (or register) is accessed many times, while the accesses to the array a are an example of spatial locality. The loop also cycles through the same instructions, which is another example of temporal locality. ...
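Spatial locality depends on the traversal order. In this sketch both functions compute the same sum over a row-major C array, but the row-major loop walks consecutive addresses (so each 64-byte line, i.e. 8 doubles, is fully used), whereas the column-major loop jumps `N * sizeof(double)` bytes between consecutive accesses. The matrix size `N` is a hypothetical choice; the actual speed difference depends on the machine and on `N`.

```c
#include <stddef.h>

#define N 64  /* hypothetical matrix size */

/* Row-major traversal: consecutive j touch consecutive addresses,
   so every byte of each fetched cache line is used (good spatial locality). */
double sum_row_major(double a[N][N]) {
    double sum = 0;  /* reused in every iteration: temporal locality */
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal of the same row-major array: consecutive accesses
   are N * sizeof(double) bytes apart, so each one may hit a different line. */
double sum_col_major(double a[N][N]) {
    double sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```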

2 min · Xuanqiang 'Angelo' Huang

Compiler Limitations

On compilers: Adding compilation flags to gcc does not always make the code faster; each flag just enables a specific set of optimizations. It is also good to turn on platform-specific flags (e.g. -march=native), which enable optimizations specific to that architecture. Remember that compilers are conservative: they do not apply an optimization unless they can prove it is valid in every case. What are they good at: Compilers are good at: mapping the program to the machine ▪ register allocation ▪ instruction scheduling ▪ dead code elimination ▪ eliminating minor inefficiencies ...
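A classic reason for compiler conservatism is memory aliasing. In `sum_rows1` the compiler cannot keep `b[i]` in a register, because `b` might point into `a`; `sum_rows2` applies the scalar replacement by hand, which is only valid when the arrays do not overlap. The function names are illustrative; C99's `restrict` qualifier is the standard way for the programmer to promise that no such aliasing occurs.

```c
/* Accumulates directly into b[i]. Since b may alias a, the compiler must
   store b[i] to memory on every iteration and reload it. */
void sum_rows1(const double *a, double *b, long n) {
    for (long i = 0; i < n; i++) {
        b[i] = 0;
        for (long j = 0; j < n; j++)
            b[i] += a[i * n + j];
    }
}

/* Scalar replacement by hand: the running sum lives in a local variable.
   Same result as sum_rows1 only if b does not overlap a. */
void sum_rows2(const double *a, double *b, long n) {
    for (long i = 0; i < n; i++) {
        double val = 0;
        for (long j = 0; j < n; j++)
            val += a[i * n + j];
        b[i] = val;
    }
}
```

With A = {0, 1, 2, 4, 8, 16, 32, 64, 128} and B = A + 3 (so B overlaps the second row of A), `sum_rows1(A, B, 3)` leaves B = {3, 22, 224} while `sum_rows2` produces B = {3, 27, 224}. Because the two versions can differ, the compiler is not allowed to turn the first into the second on its own.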

2 min · Xuanqiang 'Angelo' Huang