Gaussian Processes

Gaussian processes can be viewed through a Bayesian lens of the function space: rather than sampling over individual data points, we are now sampling over entire functions. They extend the idea of bayesian linear regression by introducing an infinite number of feature functions for the input XXX. In geostatistics, Gaussian processes are referred to as kriging regressions, and many other models, such as Kalman Filters or radial basis function networks, can be understood as special cases of Gaussian processes. In this framework, certain functions are more likely than others, and we aim to model this probability distribution. ...

8 min · Xuanqiang 'Angelo' Huang

Introduction to Advanced Machine Learning

Introduction to the course Machine learning offers a new way of thinking about reality: rather than attempting to directly capture a fragment of reality, as many traditional sciences have done, we elevate to the meta-level and strive to create an automated method for capturing it. This first lesson will be more philosophical in nature. We are witnessing a paradigm shift in the sense described by Thomas Kuhn in his theory of scientific revolutions. But what drives such a shift, and how does it unfold? ...

13 min · Xuanqiang 'Angelo' Huang

Kernel Methods

As we will briefly see, Kernels will have an important role in many machine learning applications. In this note we will get to know what are Kernels and why are they useful. Intuitively they measure the similarity between two input points. So if they are close the kernel should be big, else it should be small. We briefly state the requirements of a Kernel, then we will argue with a simple example why they are useful. ...

9 min · Xuanqiang 'Angelo' Huang

Linear Regression methods

We will present some methods related to regression methods for data analysis. Some of the work here is from (Hastie et al. 2009). This note does not treat the bayesian case, you should see Bayesian Linear Regression for that. Problem setting $$ Y = \beta_{0} + \sum_{j = 1}^{d} X_{j}\beta_{j} $$We usually don’t know the distribution of $P(X)$ or $P(Y \mid X)$ so we need to assume something about these distributions. One approach is assuming the distribution $Y \mid X \sim \mathcal{N}(f(X), \sigma^{2}I)$ and then solve the log likelihood on the probability If we use a statistical learning approach then we know we want to minimize this $\arg \min_{f} \sum_{i = 1}^{n} (y_{i} - f(x_{i}))^{2}$ which is what is often used. Both methods end with the same solution. Usually for these kind of problems we use the Least Squares Method, initially studied in Minimi quadrati. We will describe it again better in this setting. Let’s consider the linear model ...

9 min · Xuanqiang 'Angelo' Huang

Provably Approximately Correct Learning

PAC Learning is one of the most famous theories in learning theory. Learning theory concerns in answering questions like: What is learnable? Somewhat akin to La macchina di Turing for computability theory. How well can you learn something? PAC is a framework that allows to formally answer these questions. Now there is also a bayesian version of PAC in which there is a lot of research. Some definitions Empirical Risk Minimizer and Errors $$ \arg \min_{\hat{c} \in \mathcal{H}} \hat{R}_{n}(\hat{c}) $$ Where the inside is the empirical error. ...

11 min · Xuanqiang 'Angelo' Huang

Rademacher Complexity

This note used the definitions present in Provably Approximately Correct Learning. So, go there when you encounter a word you don’t know. Or search online Rademacher Complexity $$ \mathcal{G} = \left\{ g : (x, y) \to L(h(x), y) : h \in \mathcal{H} \right\} $$ Where $L : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is a generic loss function. The Rademacher complexity captures the richness of a family of functions by measuring the degree to which a hypothesis set can fit random noise. From (Mohri et al. 2012). ...

1 min · Xuanqiang 'Angelo' Huang

Softmax Function

Softmax is one of the most important functions for neural networks. It also has some interesting properties that we list here. This function is part of The Exponential Family, one can also see that the sigmoid function is a particular case of this softmax, just two variables. Sometimes this could be seen as a relaxation of the action potential inspired by neuroscience (See The Neuron for a little bit more about neurons). This is because we need differentiable, for gradient descent. The action potential is an all or nothing thing. ...

3 min · Xuanqiang 'Angelo' Huang

Support Vector Machines

This is a quite good resource about this part of Support Vector Machines (step by step derivation). (Bishop 2006) chapter 7 is a good resource. The main idea about this supervised method is separating with a large gap. The thing is that we have a hyperplane, when this plane is projected to lower dimensional data, it can look like a non-linear separator. After we have found this separator, we can intuitively have an idea of confidence based on the distance of the separator. ...

9 min · Xuanqiang 'Angelo' Huang

Lagrange Multipliers

This is also known as Lagrange Optimization or undetermined multipliers. Some of these notes are based on Appendix E of (Bishop 2006), others were found when studying bits of rational mechanics. Also (Boyd & Vandenberghe 2004) chapter 5 should be a good resource on this topic. $$ \begin{array} \\ \min f_{0}(x) \\ \text{subject to } f_{i}(x) \leq 0 \\ h_{j}(x) = 0 \end{array} $$Lagrangian function $$ \mathcal{L}(x, \lambda, \nu) = f_{0}(x) + \sum \lambda_{i}f_{i}(x) + \sum\nu_{j}h_{j}(x) $$ We want to say something about this function, because it is able to simplify the optimization problem a lot, but first we want to study this mathematically. ...

6 min · Xuanqiang 'Angelo' Huang

Object detection and Segmentation

Definition of problems Object detection Bisogna trovare all’interno dell’immagine quali siano gli oggetti presenti, e in più vogliamo sapere dove siano quindi utilizzare una bounding box per caratterizzarli sarebbe buono. Object segmentation È riuscire a caratterizzare categoria per categoria per singoli pixelsm e per questo motivo potrei riuscire a fare delle image map in cui colorare singoli oggetti in una categoria. Datasets Example datasets Pascal VOC 2012 Coco datasets Cityscapes dataset Autogenerated datasets But I don’t know much about these datasets Applications Auto drive Campo medico (per segmentazione medica o riconoscimento immagini). reidentificazione. Key posse extimations. U-net Il primo skip connection ci permette di capire bene quali siano i bordi, perché sappiamo che la convoluzione riesce a prendere bene ...

3 min · Xuanqiang 'Angelo' Huang