Anomaly Detection

Anomaly detection is a machine learning problem of great interest in industry. For example, a bank needs to identify problematic transactions, doctors need to detect illness, and law enforcement needs to spot suspicious behavior (no Orwell here). The main difference from classification is that here we have no classes. Problem setting. Let's say we have a set $X = \left\{ x_{1}, \dots, x_{n} \right\} \subseteq \mathcal{N} \subseteq \mathcal{X} = \mathbb{R}^{d}$. We call $\mathcal{N}$ the normal set, and $X$ contains our samples from it. $\mathcal{N}$ is usually quite complex, so we need an approximation to decide whether a point is normal or not: a function $\phi : \mathcal{X} \to \left\{ 0, 1 \right\}$ with $\phi(x) = 1 \iff x \not \in \mathcal{N}$. ...
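
To make the definition of $\phi$ concrete, here is a minimal sketch under a deliberately simple assumption that is not taken from the post: $\mathcal{N}$ is approximated by an axis-aligned box of $\pm k$ standard deviations around the per-feature mean of the normal samples $X$. The names `fit_phi` and `k` are hypothetical and chosen only for illustration.

```python
import statistics
from typing import Callable, Sequence


def fit_phi(X: Sequence[Sequence[float]], k: float = 3.0) -> Callable[[Sequence[float]], int]:
    """Build phi: R^d -> {0, 1} from normal samples X (hypothetical helper).

    Assumption: a point is 'normal' if every feature lies within k standard
    deviations of that feature's mean over X; otherwise phi returns 1.
    """
    d = len(X[0])
    cols = [[x[j] for x in X] for j in range(d)]
    means = [statistics.fmean(c) for c in cols]
    # Guard against zero variance so the division below never fails.
    stds = [statistics.pstdev(c) or 1e-12 for c in cols]

    def phi(x: Sequence[float]) -> int:
        # Largest per-feature z-score decides the verdict.
        z = max(abs(x[j] - means[j]) / stds[j] for j in range(d))
        return 1 if z > k else 0

    return phi
```

For example, with `X = [[0.0, 1.0], [0.2, 0.9], [-0.1, 1.1], [0.1, 1.0]]`, `fit_phi(X)([5.0, 1.0])` flags the point as anomalous (returns 1), while `[0.05, 1.0]` is considered normal. Real detectors use richer models of $\mathcal{N}$; the box is only the simplest possible approximation.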

2 min · Xuanqiang 'Angelo' Huang

Bayesian neural networks

Robbins-Monro Algorithm. The algorithm is: $$ w_{n+1} = w_{n} - \alpha_{n} \Delta w_{n} $$ For example, a decreasing sequence $\alpha_{1} > \alpha_{2} > \dots > \alpha_{n} > \dots$ with $\alpha_{n} = \frac{1}{n}$ satisfies the conditions (in practice we often use a constant $\alpha$, but then we lose the Robbins-Monro convergence guarantee). More generally, the Robbins-Monro conditions are: $\sum_{n} \alpha_{n} = \infty$ and $\sum_{n} \alpha_{n}^{2} < \infty$. Under these conditions (plus regularity assumptions on the problem), the algorithm is guaranteed to converge to the solution. One nice thing about this is that we don't need gradients. In practice, however, we often use gradient-based versions (stochastic gradient descent and similar) with autograd, see Backpropagation. But learning with gradients brings some drawbacks: ...
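
As a small illustration, assume the goal is to find the root of $g(w) = w - \mathbb{E}[X]$ when we can only observe single samples $x_n$, so each step uses the noisy value $\Delta w_n = w_n - x_n$ and never computes a gradient. The step size $\alpha_n = \frac{1}{n}$ satisfies both conditions: $\sum_n \frac{1}{n}$ is the harmonic series, which diverges, while $\sum_n \frac{1}{n^2} = \frac{\pi^2}{6} < \infty$. The function name `robbins_monro_mean` and the Gaussian data are hypothetical choices for this sketch.

```python
import random
from typing import Iterable


def robbins_monro_mean(samples: Iterable[float], w0: float = 0.0) -> float:
    """Estimate E[X] as the root of g(w) = w - E[X] from noisy samples (sketch)."""
    w = w0
    for n, x in enumerate(samples, start=1):
        alpha = 1.0 / n        # alpha_n = 1/n satisfies the Robbins-Monro conditions
        delta = w - x          # noisy measurement of g(w_n)
        w = w - alpha * delta  # w_{n+1} = w_n - alpha_n * Delta w_n
    return w


rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(10_000)]
estimate = robbins_monro_mean(data)
print(estimate)  # close to the true mean 3.0
```

With exactly $\alpha_n = \frac{1}{n}$ the iterate reduces to the running sample mean, which makes the convergence easy to check; other step-size schedules satisfying the same two conditions also work but converge along a different path.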

10 min · Xuanqiang 'Angelo' Huang

Architecture of the Brain

First, the brain is organized into functionally specific areas; second, neurons in different parts of the vertebrate nervous system, and indeed in all nervous systems, are quite similar. Small comparison with computers. A rough observation when comparing computer transistors with human neurons is the large difference in numbers: trillions of transistors versus billions of neurons. There is also a frequency difference of about 6 orders of magnitude. The brain has many, many neuron types and different types of connections. Further differences are digital versus analog and chemical modes of communication, parallel processing abilities, and fixed versus plastic architectures. However, this compares transistors with a higher-level object, so the comparison might not be completely fair. Moreover, only some brain areas are similar to real neural networks ...

7 min · Xuanqiang 'Angelo' Huang

Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why every machine learning scientist and data scientist should understand this algorithm very well. An important observation is that the algorithm is linear: its time complexity is of the same order as the forward pass. Derivatives are unexpectedly cheap to calculate, and this took a long time to discover. See colah's blog; Karpathy also has a nice resource on this topic! ...
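
To see why backpropagation costs about the same as the forward pass, consider a minimal scalar reverse-mode autodiff sketch in the spirit of Karpathy's resources (the class name `Value` and its methods are illustrative, not any library's API). Each node stores a local `_backward` rule, and `backward()` visits every node exactly once in reverse topological order, so the work is linear in the size of the computation graph.

```python
import math


class Value:
    """A scalar node in a computation graph (hypothetical minimal class)."""

    def __init__(self, data: float, parents: tuple = ()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # local chain-rule step, set by each op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topological order: every node appears after all of its inputs.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        # One local backward step per node: linear in the graph size.
        for v in reversed(order):
            v._backward()


x, y = Value(2.0), Value(-3.0)
z = x * y + x  # z = xy + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = -2.0, dz/dy = x = 2.0
```

Note that `x` feeds into two nodes, so its gradient is accumulated with `+=`; overwriting instead of accumulating is a classic bug in hand-written backprop.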

7 min · Xuanqiang 'Angelo' Huang

Memory in Human Brain

Here we attempt to answer what memory is and how it is stored and retrieved. Memory is a process by which information is encoded, stored, and retrieved. The brain has different types of memory, and certain brain regions are specialized for this task. Ebbinghaus curves. Other experiments destroy parts of the cortex and correlate this with recall. Types of memory. TODO see Kendal67-1 figure. Sensory memory: iconic memory (remembering images) lasts 150-500 milliseconds; echoic memory (recognizing some sounds) is usually retained for 1 to 2 seconds. This memory is filtered by consciousness/attention before being passed to short-term working memory. The register capacity of this memory is considered to be quite large. Short-term memory: it has an explicit storage of about 7 ± 2 items (so very small). Depending on the level of attention, it is retained for 2 to 18 seconds. The representation here often seems to be vocal. ...

7 min · Xuanqiang 'Angelo' Huang

Cluster Management Policies

Introduction to cluster management. How can we allocate the resources in a cluster efficiently? How can we allocate resources fairly? Two-step allocation 🟨++ There are two main kinds of allocation: first you allocate resources to a process, then you place the process physically in the cluster. Private and public cluster management 🟥++ Cluster management can be private or public. Private means each app manages its own sub-cluster: each app receives a private, static set of resources. Here it is easier to manage hardware for various needs. Public means there is a big cluster, like standard third party ...

4 min · Xuanqiang 'Angelo' Huang

Generative Adversarial Networks

0 min · Xuanqiang 'Angelo' Huang

Memory

4.1 Characteristics of Memory. In the memory hierarchy, the further down you go, the more space you have, but the slower it is to load information. 4.1.1 Classification of memory. The types of memory are shown alongside. In general, the faster a memory is to read from, the more expensive it is to store data in it (there is little space). 4.1.2 Bytes and Words. On page 74, the book begins by discussing why BCD (Binary Coded Decimal, in which each digit from 0 to 9 is encoded with 4 bits) was avoided, for efficiency reasons. ...
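
The efficiency argument against BCD can be checked with a little arithmetic: 4 bits can represent 16 values, but BCD uses only 10 of them per digit, so 6 of every 16 codes are wasted. The sketch below (the function names are hypothetical) compares the number of bits BCD and plain binary need for the same integer.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer in BCD: 4 bits per decimal digit."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return " ".join(format(int(d), "04b") for d in str(n))


def bcd_bits(n: int) -> int:
    # One 4-bit group per decimal digit.
    return 4 * len(str(n))


def binary_bits(n: int) -> int:
    # Minimum number of bits in plain binary (at least 1 bit for 0).
    return max(1, n.bit_length())


print(to_bcd(259))                         # 0010 0101 1001
print(bcd_bits(9999), binary_bits(9999))   # 16 14
```

So 9999 needs 16 bits in BCD but only 14 in binary; put differently, 16 bits hold at most 9999 in BCD versus 65535 in binary.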

9 min · Xuanqiang 'Angelo' Huang

Asymptotic Notation

Introduction to asymptotic notation. We try to define the time a function takes to execute in terms of the SIZE of the input (at the lowest level, the number of bits). But we face a measurement problem, since we must use variables that are independent of the machine. Characteristics of the notation. We want an asymptotic notation (one that looks at how the behavior grows toward infinity). Memory access. Every operation in a modern processor generally performs a constant number of memory accesses (there is usually a fixed number of possible operands; this means that if an algorithm has a certain complexity, it keeps that complexity even when memory-access operations are taken into account). This argument no longer holds if we consider infinite-precision numbers, which may require an arbitrary number of memory accesses to be computed. ...
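
To illustrate why arbitrary-precision numbers break the "constant memory accesses per operation" assumption, here is a sketch of schoolbook addition on numbers stored as little-endian digit lists, counting how many stored digits are read. The base-10 digits (standing in for machine words), the function name `add_digits`, and the read counter are all hypothetical choices made for readability.

```python
from typing import List, Tuple

BASE = 10  # assumption: one decimal digit per "memory word", for readability


def add_digits(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Add two numbers stored as little-endian digit lists.

    Returns the sum and the number of stored digits read, showing that the
    work grows with the input size instead of being constant.
    """
    result, carry, reads = [], 0, 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        reads += (i < len(a)) + (i < len(b))  # count only real memory reads
        s = da + db + carry
        result.append(s % BASE)
        carry = s // BASE
    if carry:
        result.append(carry)
    return result, reads


print(add_digits([9, 9, 9], [1]))  # ([0, 0, 0, 1], 4): 999 + 1 = 1000
```

Adding two $n$-digit numbers reads $2n$ digits, so a single "addition" is $\Theta(n)$ rather than $O(1)$ once operands no longer fit in a fixed number of machine words.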

4 min · Xuanqiang 'Angelo' Huang

The Perceptron Model

The perceptron is a fundamental binary linear classifier introduced by Rosenblatt (1958). It maps an input vector $\mathbf{x} \in \mathbb{R}^n$ to an output $y \in \{0,1\}$ using a weighted sum followed by a threshold function. The Mathematical Definition. Given an input vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and a weight vector $\mathbf{w} = (w_1, w_2, \dots, w_n)$, the perceptron computes: $$ z = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b $$ where $b$ is the bias term. The output is determined by the Heaviside step function: ...
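
A minimal sketch of the forward computation $y = H(\mathbf{w}^\top \mathbf{x} + b)$ follows. It assumes the convention $H(0) = 1$ (some texts use $0$ at $z = 0$), and the weights $\mathbf{w} = (1, 1)$, $b = -1.5$ are hand-picked for illustration so that the perceptron computes logical AND.

```python
from typing import Sequence


def heaviside(z: float) -> int:
    # Assumed convention: H(0) = 1.
    return 1 if z >= 0 else 0


def perceptron(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Compute y = H(w^T x + b) for a single input vector x."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length n")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # z = sum_i w_i x_i + b
    return heaviside(z)


w, b = (1.0, 1.0), -1.5  # hand-picked weights: the perceptron computes AND
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))
```

Only the input $(1, 1)$ gives $z = 0.5 \ge 0$; all other inputs give $z < 0$, so the decision boundary is the line $x_1 + x_2 = 1.5$.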

3 min · Xuanqiang 'Angelo' Huang