Notes

Softmax Function

Softmax is one of the most important functions for neural networks, and it has some interesting properties that we list here. The function is related to The Exponential Family, and the sigmoid function is a particular case of softmax with just two variables. Softmax can also be seen as a relaxation of the action potential from neuroscience (see The Neuron for a bit more about neurons): the action potential is an all-or-nothing event, whereas gradient descent needs a differentiable function. ...

Structural Causal Models

Independence of Cause and Mechanism. The cause and the mechanism should be kept separate: one side is the cause, the other is the mechanism that produces the effect of the cause. Take, for example, the altitude and temperature example in (Peters et al. 2017), chapter 2. The distribution of the cause $p(a)$ and the mechanism $p(t \mid a)$ that maps it to the effect are independent. Structural Causal Models. We say that given two statistical processes $C, E$, where $C$ is the cause and $E$ the effect, then the ...
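
A small simulation sketch of this independence, in which everything is hypothetical: altitude $a$ (km) is drawn from some $p(a)$, and temperature follows an assumed linear mechanism $t = 15 - 6.5a + n_t$ with small zero-mean noise $n_t$. The constants, the noise level and the least-squares estimator are illustrative choices, not taken from Peters et al. Estimating the mechanism under two very different altitude distributions should recover the same slope, which is what independence of $p(a)$ and $p(t \mid a)$ predicts.

```c
#include <stdint.h>

/* Hypothetical mechanism parameters (illustrative only). */
#define T0 15.0     /* temperature at altitude 0, degrees C */
#define SLOPE -6.5  /* degrees C per km */

/* 64-bit LCG returning a uniform double in [0, 1); adequate for a demo. */
static double uniform01(uint64_t *state) {
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0);
}

/* Sample n pairs from the SCM
 *   A := N_A,                N_A ~ Uniform(a_lo, a_hi)   (cause, p(a))
 *   T := T0 + SLOPE*A + N_T, N_T ~ Uniform(-0.5, 0.5)    (mechanism, p(t|a))
 * and return the least-squares slope of T regressed on A. */
double fitted_mechanism_slope(double a_lo, double a_hi, int n, uint64_t seed) {
    uint64_t state = seed;
    double sa = 0.0, st = 0.0, saa = 0.0, sat = 0.0;
    for (int i = 0; i < n; i++) {
        double a = a_lo + (a_hi - a_lo) * uniform01(&state);
        double t = T0 + SLOPE * a + (uniform01(&state) - 0.5);
        sa += a;
        st += t;
        saa += a * a;
        sat += a * t;
    }
    /* slope = cov(a, t) / var(a), written with raw sums */
    return (n * sat - sa * st) / (n * saa - sa * sa);
}
```

For instance, `fitted_mechanism_slope(0.0, 2.0, 10000, 1)` and `fitted_mechanism_slope(3.0, 5.0, 10000, 2)` sample altitudes from disjoint ranges, yet both estimates land close to the assumed slope.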

Data Plane

Introduction: data or control plane. How do routers forward packets? How do they decide how and where to send them? The routing tables determine each packet's next hop. Communication can be called end-to-end because only the sender and the receiver go up to the application layer and read the contents (if the traffic is encrypted, truly only they can do this). Main functions. The data plane handles forwarding, i.e., it answers questions such as “how do I send this packet there efficiently?”, while the control plane handles routing, i.e., it answers questions such as “where do I send the packet I have?”. ...
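
The table-driven next-hop decision can be sketched as a longest-prefix match, the rule IP forwarding uses to choose among overlapping entries (the technique is named here, not in the note). This is a deliberately naive linear scan over a hypothetical IPv4 table; `struct route`, `ipv4`, `lookup_next_hop` and the port numbers are made up for illustration, and real routers use tries or TCAM hardware for this step.

```c
#include <stddef.h>
#include <stdint.h>

/* One hypothetical forwarding-table entry: prefix/len -> output port. */
struct route {
    uint32_t prefix; /* network address, host byte order */
    int len;         /* prefix length in bits, 0..32 */
    int port;        /* output interface */
};

/* Build an IPv4 address (host byte order) from its four octets. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Return the port of the longest prefix matching dst, or -1 if none does. */
int lookup_next_hop(const struct route *table, size_t n, uint32_t dst) {
    int best_len = -1, best_port = -1;
    for (size_t i = 0; i < n; i++) {
        /* len == 0 is a default route; avoid the undefined shift by 32. */
        uint32_t mask = table[i].len == 0
                            ? 0u
                            : (uint32_t)(0xFFFFFFFFu << (32 - table[i].len));
        if ((dst & mask) == (table[i].prefix & mask) && table[i].len > best_len) {
            best_len = table[i].len;
            best_port = table[i].port;
        }
    }
    return best_port;
}
```

With entries `10.0.0.0/8 -> 1` and `10.1.0.0/16 -> 2`, a packet to `10.1.2.3` matches both, and the more specific `/16` wins.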

Semantics of a Language

Context-dependent syntactic constraints. Intro: context dependencies. These syntactic constraints cannot be expressed in BNF because they depend on the context, while context-free grammars are, by definition, free of context; we therefore need another way to handle them. In practice, ad hoc methods are used during the semantic analysis phase of the program. Context-sensitive grammars. These grammars are much more complex (and slower to parse) than context-free ones, so they are impractical to use (parsing takes exponential time in general, so in practice it never finishes). ...
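
A classic constraint of this kind is "every variable must be declared before it is used", which a context-free grammar cannot express in general. The ad hoc check performed during semantic analysis can be sketched with a symbol table. The toy program representation (`struct stmt`, `DECL`/`USE`) and the fixed-size linear symbol table are hypothetical simplifications; a real compiler would use scoped hash tables.

```c
#include <stddef.h>
#include <string.h>

/* A toy program: a flat sequence of declarations and uses of identifiers. */
enum kind { DECL, USE };
struct stmt {
    enum kind kind;
    const char *name;
};

#define MAX_SYMBOLS 64

/* Return the index of the first use of an undeclared identifier, or -1 if
 * the program satisfies declare-before-use. Table overflow is ignored in
 * this sketch (extra declarations are dropped). */
int check_declared_before_use(const struct stmt *prog, size_t n) {
    const char *symbols[MAX_SYMBOLS]; /* symbol table: names declared so far */
    size_t n_symbols = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].kind == DECL) {
            if (n_symbols < MAX_SYMBOLS)
                symbols[n_symbols++] = prog[i].name;
        } else {
            int found = 0;
            for (size_t j = 0; j < n_symbols && !found; j++)
                found = strcmp(symbols[j], prog[i].name) == 0;
            if (!found)
                return (int)i; /* context violated: use without declaration */
        }
    }
    return -1;
}
```

The grammar alone would accept `use y; decl y;`; only this pass, which remembers earlier declarations, rejects it.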

Sorting Algorithms

Introduction. Why the topic matters. Sorting algorithms are a basic building block for understanding algorithms more broadly. They exercise algorithm analysis and introduce techniques for solving computational problems, such as greedy and divide and conquer. They also allow a first use of abstractions and the analysis of subproblems. The problem. Given an input sequence of numbers $\langle a_1, \dots, a_n \rangle$, the problem is to find a permutation $\langle a'_1, \dots, a'_n \rangle$ of it such that $a'_1 \le a'_2 \le \dots \le a'_n$. This can be done for any collection whose elements are mutually comparable. ...
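
As an instance of divide and conquer, here is a minimal merge sort sketch on `int` arrays (the function names are illustrative). For an arbitrary comparable type one would pass a comparison function instead, as the C standard library's `qsort` does; `int` keeps the sketch short.

```c
#include <stdlib.h>
#include <string.h>

/* Combine: merge the sorted halves a[lo..mid) and a[mid..hi) via tmp. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; /* <= keeps it stable */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Divide the range in half, conquer each half recursively, then combine. */
static void merge_sort_rec(int *a, int *tmp, size_t lo, size_t hi) {
    if (hi - lo < 2) return; /* 0 or 1 element: already sorted */
    size_t mid = lo + (hi - lo) / 2;
    merge_sort_rec(a, tmp, lo, mid);
    merge_sort_rec(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Sort a[0..n) in place; returns 0 on success, -1 if allocation fails. */
int merge_sort(int *a, size_t n) {
    if (n < 2) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    merge_sort_rec(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The recursion splits the problem into two subproblems of half the size and merges them in linear time, which gives the familiar $O(n \log n)$ bound.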

Sparse Matrix Vector Multiplication

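Algorithms for Sparse Matrix-Vector Multiplication. Compressed Sparse Row (CSR). This is a compact way to store a sparse matrix row by row: `values` holds the nonzero entries, `col_idx` the column index of each entry, and `row_start[i]` the position in `values` where row $i$ begins, so row $i$ occupies positions `row_start[i]` to `row_start[i + 1] - 1`. Sparse MVM using CSR, computing $y \leftarrow y + Ax$:

```c
/* y = y + A*x for an m-row sparse matrix A stored in CSR format. */
void smvm(int m, const double* values, const int* col_idx,
          const int* row_start, const double* x, double* y)
{
    int i, j;
    double d;
    /* Loop over m rows */
    for (i = 0; i < m; i++) {
        d = y[i]; /* Scalar replacement since reused */
        /* Loop over non-zero elements in row i */
        for (j = row_start[i]; j < row_start[i + 1]; j++) {
            d += values[j] * x[col_idx[j]]; /* indirect access to x */
        }
        y[i] = d;
    }
}
```

Let's analyze this code:
- Spatial locality: `row_start`, `col_idx` and `values` are read sequentially, so they have good spatial locality.
- Temporal locality: `y[i]` is reused throughout the inner loop (held in the scalar `d`), so it has good temporal locality; temporal locality on $x$ is poor, since it is accessed indirectly through `col_idx`.
- Storage efficiency for the sparse matrix is good.
- However, when the matrix is actually dense, this is about 2x slower than dense matrix-vector multiplication.

Block CSR. With plain CSR, however, we cannot apply blocking optimizations for the cache. ...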
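
A minimal sketch of the Block CSR idea, assuming fixed 2x2 blocks and a matrix whose dimensions are multiples of 2 (the name `bcsr2_smvm` and the array layout are hypothetical). Each stored block is dense, so the two entries of $x$ it needs are loaded once and reused for two rows, at the price of storing explicit zeros inside partially filled blocks.

```c
/* y = y + A*x, with A stored as 2x2 blocks (BCSR):
 *   mb             number of block rows (A has 2*mb rows)
 *   b_row_start    size mb+1; blocks of block row I are b_row_start[I]..[I+1]-1
 *   b_col_idx[k]   block column of block k (covers columns 2*k_col, 2*k_col+1)
 *   b_values       4 doubles per block, row-major: [b00 b01 b10 b11] */
void bcsr2_smvm(int mb, const double *b_values, const int *b_col_idx,
                const int *b_row_start, const double *x, double *y) {
    for (int I = 0; I < mb; I++) {
        /* two accumulators kept in registers: scalar replacement of y */
        double d0 = y[2 * I], d1 = y[2 * I + 1];
        for (int k = b_row_start[I]; k < b_row_start[I + 1]; k++) {
            const double *b = &b_values[4 * k];
            int j = 2 * b_col_idx[k];
            double x0 = x[j], x1 = x[j + 1]; /* each x entry reused twice */
            d0 += b[0] * x0 + b[1] * x1;
            d1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * I] = d0;
        y[2 * I + 1] = d1;
    }
}
```

For example, the 4x4 matrix with block `[1 2; 3 4]` at block position (0,0) and block `[5 0; 6 7]` at (1,1) is stored as `b_values = {1,2,3,4,5,0,6,7}`, `b_col_idx = {0,1}`, `b_row_start = {0,1,2}`; the `0` is explicit fill that plain CSR would not store.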

Capacitors in Vacuum

Introduction to capacitors. Introductory analysis of capacitors: flux tubes. Consider an **infinitesimal flux tube**, as in the figure, whose lateral surface follows the field lines, and let $dQ$ be the total charge enclosed in the tube. By Gauss's law, the total flux through a closed surface is $$ \oint_{\Sigma} \vec{E} \cdot d\vec{s} = \frac{Q_{T}}{\varepsilon_{0}} $$ The lateral surface of the tube is parallel to the field and contributes no flux, so only the two end faces remain: $$ \vec{E}_{1} \cdot d\vec{s}_{1} + \vec{E}_{2} \cdot d\vec{s}_{2} = \frac{dQ}{\varepsilon_{0}} $$ We choose the tube so that its end faces are **perpendicular to the field**, which lets us treat the problem from a purely **scalar** point of view. Since in the example below there is no field at the first face ($E_{1} = 0$), the outgoing electric field can be written in scalar form as $$ E_{2} = \frac{dQ}{\varepsilon_{0}\, ds_{2}} $$ which closely resembles the form $\frac{\sigma}{\varepsilon_{0}}$. The quantity of interest in this example is *distance*: as we move away from our surface, $ds_{2}$ becomes larger, so $E_{2}$ becomes smaller. Introduction to capacitors. Suppose we have two arbitrary metal plates carrying equal and opposite charges, of arbitrary shape and at arbitrary distance, in this theoretical setting. The key point is that in this case we assume #Induzione completa (complete induction: every field line leaving one plate ends on the other). This assumption is necessary for the analysis of capacitors. ...
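
The link between the tube's cross-section and the field strength can be made explicit with a short derivation, assuming the stretch of tube between two cross-sections $ds_{2}$ and $ds_{3}$ contains no charge (the labels $ds_{3}$, $E_{3}$ are introduced here only for illustration). Applying Gauss's law to that charge-free stretch, where the lateral surface again contributes no flux and the outward normal of face 2 points against the field:

$$
\begin{aligned}
-E_{2}\, ds_{2} + E_{3}\, ds_{3} &= \frac{0}{\varepsilon_{0}} = 0 \\
E_{3} &= E_{2}\,\frac{ds_{2}}{ds_{3}} = \frac{dQ}{\varepsilon_{0}\, ds_{3}}
\end{aligned}
$$

So the flux $E\,ds$ is the same through every cross-section of the tube, and wherever the tube widens ($ds_{3} > ds_{2}$) the field is correspondingly weaker.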

Reinforcement Learning, an introduction

The main difference between reinforcement learning and other machine learning and pattern-inference methods is that reinforcement learning puts the concept of action at its core: models developed in this field are built to act on their environment and have an effect on it, while other methods are mainly used to summarize interesting data or to generate reports of sorts. Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent ought to take actions in a dynamic environment in order to maximize the cumulative reward. ~Wikipedia page. ...
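
The "cumulative reward" in the quoted definition is commonly formalized as a (discounted) return, e.g. $G_0 = \sum_{t=0}^{T-1} \gamma^{t} r_t$ under one common indexing convention. A minimal sketch, assuming a finite episode stored as an array of rewards and a discount factor $\gamma \in [0, 1]$ (the discount is an assumption not in the text above; $\gamma = 1$ gives the plain cumulative sum):

```c
#include <stddef.h>

/* Discounted return G = r[0] + gamma*r[1] + gamma^2*r[2] + ...
 * Computed backwards with G_t = r[t] + gamma * G_{t+1}, so no pow() is needed. */
double discounted_return(const double *rewards, size_t n, double gamma) {
    double g = 0.0;
    for (size_t t = n; t-- > 0;)
        g = rewards[t] + gamma * g;
    return g;
}
```

The backward recursion is the same one that value-based methods bootstrap on, which is why it is often preferred over summing powers of $\gamma$ directly.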

Tabular Reinforcement Learning

This note extends the content of Markov Processes to this specific context. A nice complement, which treats the field a bit more from the perspective of the behavioural sciences, is Intrinsic Motivation and Playfulness. Standard notions. Explore-exploit dilemma. We have seen something similar in Active Learning, when we tried to model whether to look elsewhere or to go for the maximum value found so far. The dilemma under analysis here is the explore-exploit dilemma: whether we should go for the best solution found so far or look for a better one. It has implications in many other fields as well; even everyday human life involves many such trade-offs. ...
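
One standard way to balance the two, not named in the text above, is ε-greedy action selection on a multi-armed bandit: with probability ε explore a random arm, otherwise exploit the arm with the highest estimated value, keeping a table of sample-average estimates `q[a]`. In this sketch the arm rewards are fixed numbers for reproducibility (real bandits are stochastic), and the function and parameter names are hypothetical.

```c
#include <stdint.h>

/* 64-bit LCG returning a uniform double in [0, 1); adequate for a demo. */
static double uniform01(uint64_t *state) {
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0);
}

/* Index of the largest estimate; ties go to the lowest index. */
int argmax(const double *q, int n) {
    int best = 0;
    for (int a = 1; a < n; a++)
        if (q[a] > q[best]) best = a;
    return best;
}

/* Run epsilon-greedy for `steps` pulls. On return q[a] holds the
 * sample-average reward estimate of arm a, counts[a] its pull count. */
void epsilon_greedy(const double *arm_reward, int n_arms, int steps,
                    double epsilon, uint64_t seed, double *q, int *counts) {
    uint64_t state = seed;
    for (int a = 0; a < n_arms; a++) {
        q[a] = 0.0;
        counts[a] = 0;
    }
    for (int t = 0; t < steps; t++) {
        int a;
        if (uniform01(&state) < epsilon)
            a = (int)(uniform01(&state) * n_arms); /* explore: random arm */
        else
            a = argmax(q, n_arms);                 /* exploit: best so far */
        double r = arm_reward[a];                  /* deterministic reward */
        counts[a]++;
        q[a] += (r - q[a]) / counts[a];            /* incremental mean */
    }
}
```

With ε = 0 the agent would lock onto the first arm it tries; the small exploration probability is what lets it eventually find the better arm and then exploit it.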

Cache Optimization

Locality principles. Recall the two locality principles from Memoria: temporal locality and spatial locality. Temporal locality: the same elements are accessed many times within a short period. Spatial locality: elements stored close to each other in memory are accessed close together in time. In modern architectures, a cache line is usually 64 bytes, so a miss also brings neighbouring data into the cache. For example, consider this snippet:

```c
/* Sum the n elements of array a (element type assumed int). */
int sum_array(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

`sum` is an example of temporal locality, as the same memory location (or register) is accessed many times, and the access to the array `a` is an example of spatial locality. The loop also cycles through the same instructions, which is another example of temporal locality. ...
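
Spatial locality also depends on traversal order. A minimal sketch, assuming a square `N` x `N` `int` matrix stored row-major, as C lays out 2D arrays: both functions compute the same sum, but the row-wise loop walks memory with stride 1 (consecutive elements share a cache line), while the column-wise loop jumps `N` elements at a time. The actual speed difference depends on the machine and on `N`, so no timings are claimed here; `N` and the function names are illustrative.

```c
#define N 512

/* Row-wise traversal: inner loop has stride 1, good spatial locality. */
long long sum_by_rows(int m[N][N]) {
    long long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];
    return sum;
}

/* Column-wise traversal: inner loop has stride N ints, so consecutive
 * accesses usually fall in different cache lines. */
long long sum_by_cols(int m[N][N]) {
    long long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];
    return sum;
}
```

Swapping the two loops changes only the access pattern, not the result, which makes this pair a convenient micro-benchmark for observing cache-line effects.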