Notes

Multi Variable Derivatives

Multi-variable derivative To the people that are not used to matrix derivatives (like me) it could be useful to see how $$ \frac{ \partial u^{T}Su }{ \partial u } = 2Su $$ First, we note that if you derive with respect to some matrix, the output will be of the same dimension of that matrix. That notation is just deriving every single component independently and then joining them together, so it will be better understood as as $$ \frac{ \partial u^{T}Su }{ \partial u } = \begin{bmatrix} \frac{ \partial u^{T}Su }{ \partial u_{1} } \ \dots \ \frac{ \partial u^{T}Su }{ \partial u_{M} } \ \end{bmatrix} $$ So we can prove each derivative independently, it's just a lot of manual work! We see that $u^{T}Su$ is just a quadratic form, studied in Massimi minimi multi-variabile#Forme quadratiche so it is just computing this: $$ u^{T}Su = \sum_{i, j = 1, 1}^{M} u_{i}u_{j}S_{ij} \implies \frac{ \partial u^{T}Su }{ \partial u_{1} } =2u_{1}S_{11} + \sum_{j \neq 1}^{M}(u_{j}S_{1j} + u_{j}S_{j1}) = 2\left( u_{1}S_{11} + \sum_{j \neq 1}u_{j}S_{1j} \right) = 2(Su)_{1} $$ Last equation is true because $S$ is a symmetric matrix, then we easily see that indeed it’s true that indeed it’s the first row of the $Su$ matrix multiplied by 2. ...

Naïve Bayes

Introduzione a Naïve Bayes NOTE: this note should be reviewed after the course I took in NLP. This is a very old note, not even well written. Bisognerebbe in primo momento avere benissimo in mente il significato di probabilità condizionata e la regola di naive Bayes in seguito. Bayes ad alto livello Da un punto di vista intuitivo non è altro che predire la cosa che abbiamo visto più spesso in quello spazio ...

Variational Inference

$$ p(\theta \mid x_{1:n}, y_{1:n}) = \frac{1}{z} p(y_{1:n} \mid \theta, x_{1:n}) p(\theta \mid x_{1:n}) \approx q(\theta \mid \lambda) $$For Bayesian Linear Regression we had high dimensional Gaussians which made the inference closed form, in general this is not true, so we need some kinds of approximation. Laplace approximation Introduction to the Idea $$ \psi(\theta) \approx \hat{\psi}(\theta) = \psi(\hat{\theta}) + (\theta-\hat{\theta} ) ^{T} \nabla \psi(\hat{\theta}) + \frac{1}{2} (\theta-\hat{\theta} ) ^{T} H_{\psi}(\hat{\theta})(\theta-\hat{\theta} ) = \psi(\hat{\theta}) + \frac{1}{2} (\theta-\hat{\theta} ) ^{T} H_{\psi}(\hat{\theta})(\theta-\hat{\theta} ) $$ We simplified the term on the first order because we are considering the mode, so the gradient should be zero for the stationary point. ...

Asymmetric Cryptography

Public Key Encryption We now define a formally what is a public key encryption Formal definition of Public Key Encryption We define a 3-tuple formed as follows: $(G, E, D)$ where $G$ is the generator for the private and public keys, from now on identified as $(pk, sk)$ (public key and secret key) $E(pk, m)$ the encryption algorithm, that takes the $pk$ and the message in input $D(sk, c)$ the decryption algorithm, that takes the $sk$ and the ciphertext in input. Now is this definition useful? i don’t think so! We can’t create theorems for it, too general I suppose. Is it clear? yes! I think this is the usefulness of maths in many occasions, it delivers some complex information in a concise and understandable manner. ...

Spazi vettoriali

Spazi vettoriali 1.1 Piano cartesiano 1.1.1 Definizione Possiamo considerare il piano cartesiano come l’insieme $\R^2$ potremmo dire che esiste una corrispondenza fra una coordinata e un punto del piano, una volta che abbiamo definito un punto di origine. Si può vedere anche come corrispondenza biunivoca con vettori del piano per l’origine (parte dall’origine). Questa cosa vale anche per uno spazio n-dimensionale, non soltanto due, ma per semplicità di introduzione di questo lo faccio con 2 ...

Bayesian Linear Regression

We have a prior $p(\text{model})$, we have a posterior $p(\text{model} \mid \text{data})$, a likelihood $p(\text{data} \mid \text{model})$ and $p(\text{data})$ is called the evidence. Classical Linear regression $$ y = w^{T}x + \varepsilon $$ Where $\varepsilon \sim \mathcal{N}(0, \sigma_{n}^{2}I)$ and it’s the irreducible noise, an error that cannot be eliminated by any model in the model class, this is also called aleatoric uncertainty. One could write this as follows: $y \sim \mathcal{N}(w^{T}x, \sigma^{2}_{n}I)$ and it’s the exact same thing as the previous, so if we look for the MLE estimate now we get ...

Scheduler

Il suo scopo principale è gestire l’avvicendamento dei processi. Ad esempio sospendere il processo che chiede I/O. O un sistema time sharing, quando arriva un interrupt sul time. Solitamente il nome scheduler è solamente un gestore dell’avvicendamento, si può quindi utilizzare per indicare scheduler di altro tipo. Note introduttive Diagramma di Gantt Questo è il diagramma per presentare lo scheduling, ossia da quando a quando è eseguito cosa Esempio gantt ...

Introduzione SO

Scopi del sistema operativo Un sistema operativo è una astrazione sul HW che permette di Gestire l’esecuzione di più programmi assieme (concorrenza), tramite virtualizzazione CPU e Memoria Gestire le risorse (Quindi I/O, RAM, Memoria, Networking) Fornisce una interfaccia di programmazione (API) molto più generale e potente, in grado di astrarre da dettagli di livello basso, vicini all’Hardware (come device drivers). Quindi in breve il SO è n programma che crea un ambiente civile per i programmi in cui interagire, e facilita molto il lavoro al programmatore per la sua interfaccia nuova. (si potrebbe dire che sia una macchina virtuale con un suo linguaggio (che è l’API) se seguiamo la terminologia di Macchine Astratte) ...

Livello ISA

il livello isa è il livello delle istruzioni 8.1 Struttura Potremmo definire l’architettura di un elaboratore come tutte le parti del processore che una persona abbia bisogno di sapere per scrivere codice assembly. Istruzioni possibili Registri Solitamente le istruzioni sono divise in due parti: 8.1.1 Opcode e indirizzamento Opcode Questo opcode indica la tipologia di istruzione. Per esempio per l’architettura HACK è il primo bit, che indica se è una istruzione C oppure una istruzione A. ...

Memoria

4.1 Caratteristiche della Memoria La gerarchia della memoria, più si va giù più spazio si ha, più è lento il caricamento delle informazioni 4.1.1 Catalogazione della memoria Le tipologie di memoria sono presenti a fianco. In generale più la memoria è veloce da riprendere, più è costosa da memorizzare (c’è poco spazio) 4.1.2 Byte e Word Il libro a pagina 74 parte con la discussione del perché si è preferito evitare la BCD (Binary coded decimal, in cui i numeri da 0 a 9 erano codificato da 4 bit), per questioni di efficienza. ...