Fisher's Linear Discriminant

A simple motivation Fisher’s Linear Discriminant is a simple idea used to linearly classify our data. The image above, taken from (Bishop 2006), summarizes the idea. We clearly see that if we first project along the direction of maximum variance (see Principal Component Analysis), the data is not linearly separable, but if we take other notions into consideration, the separation becomes much cleaner. A first approach We want to maximize the distance between the class means while minimizing the within-class variance....

2 min · Xuanqiang 'Angelo' Huang
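The criterion above can be sketched in a few lines of NumPy: the Fisher direction is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The two-Gaussian toy dataset below is illustrative, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two toy Gaussian classes (illustrative data only)
X1 = rng.normal([0, 0], 0.5, size=(50, 2))
X2 = rng.normal([2, 1], 0.5, size=(50, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter matrix S_W
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Fisher direction: w ∝ S_W^{-1} (m2 - m1)
w = np.linalg.solve(S_W, m2 - m1)
w /= np.linalg.norm(w)

# The projected class means are well separated along w
print(m1 @ w, m2 @ w)
```

Projecting onto `w` trades off mean separation against within-class spread, which is exactly where it differs from projecting onto the top PCA direction.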

Support Vector Machines

This is quite a good resource on this part of Support Vector Machines (a step-by-step derivation); (Bishop 2006), chapter 7, is another good reference. The main idea behind this supervised method is to separate the classes with a large gap. We find a hyperplane in a high-dimensional feature space; when this plane is mapped back to the lower-dimensional input space, it can look like a non-linear separator. After we have found this separator, we can intuitively derive a measure of confidence from the distance to the separator....

12 min · Xuanqiang 'Angelo' Huang
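As a minimal sketch of the large-gap idea, the snippet below trains a linear soft-margin SVM by sub-gradient descent on the hinge-loss objective (a simplification of the full derivation; the toy data and hyperparameters are assumptions for illustration). The signed distance to the plane then serves as the confidence mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Linearly separable toy data, labels in {-1, +1}
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

# Sub-gradient descent on: (lam/2)||w||^2 + mean(max(0, 1 - y(w·x + b)))
w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(200):
    margins = y * (X @ w + b)
    mask = margins < 1  # points inside or violating the margin
    gw = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / len(X)
    gb = -y[mask].sum() / len(X)
    w -= lr * gw
    b -= lr * gb

# Signed distance to the hyperplane acts as a confidence score
confidence = (X @ w + b) / np.linalg.norm(w)
print(np.mean(np.sign(X @ w + b) == y))
```

In practice one would use a solver such as the dual QP formulation (which also enables kernels), but the primal sub-gradient view keeps the margin trade-off visible.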

Tokenization

Introduction to tokenization Tokenization is the process of converting normal strings into small pieces (tokens) that can be fed into one of our models. The practice comes from a tradition in programming languages, as we can see in Automi e Regexp, where we define a specific token to have a known pattern, usually recognized by regular expressions. There have historically been many approaches to tokenization; let’s see a few:...

3 min · Xuanqiang 'Angelo' Huang
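The programming-language tradition mentioned above can be sketched with a minimal regex-based tokenizer: each alternative in the pattern is one known token class, in the spirit of a lexer (the pattern here is a toy assumption, not the post's tokenizer).

```python
import re

# One alternative per token class: numbers, words, single punctuation marks
TOKEN_RE = re.compile(r"\d+|\w+|[^\w\s]")

def tokenize(text):
    """Split text into tokens by scanning for known patterns."""
    return TOKEN_RE.findall(text)

print(tokenize("Tokenizers split text into 3 kinds of pieces!"))
```

Modern subword schemes (e.g. BPE) replace the hand-written patterns with classes learned from corpus statistics, but the interface stays the same: string in, token list out.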

Gaussian Processes

Gaussian processes can be viewed through a Bayesian lens on function space: rather than placing a distribution over individual parameters, we place one over entire functions. They extend the idea of Bayesian linear regression by introducing an infinite number of feature functions for the input X. In geostatistics, Gaussian processes are referred to as kriging, and many other models, such as Kalman filters or radial basis function networks, can be understood as special cases of Gaussian processes....

9 min · Xuanqiang 'Angelo' Huang
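"Sampling entire functions" can be made concrete: evaluate a covariance kernel on a grid of inputs and draw from the resulting multivariate Gaussian, so each draw is a whole function evaluated at those points. This sketch assumes a squared-exponential (RBF) kernel with a hypothetical lengthscale.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0, 5, 50)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability

# Each row of f is one function sampled from the GP prior, evaluated at x
rng = np.random.default_rng(0)
f = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(f.shape)
```

Conditioning this prior on observed pairs (x, y) gives the GP posterior used for regression; the prior draws alone already show why the kernel choice controls smoothness.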

Principal Component Analysis

Principal Component Analysis is a technique used to reduce the dimensionality of a dataset. We can view it as an algorithm that compresses the data while retaining the most important information. It is one of the earliest and simplest techniques in machine learning. Single Layer Autoencoders learn to approximate this kind of transformation. The main idea is to find the directions with the most variance in a dataset....

6 min · Xuanqiang 'Angelo' Huang
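Finding the directions of most variance can be sketched via the SVD of the centered data: the right singular vectors are the principal directions, ordered by explained variance. The anisotropic toy data below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data stretched so most variance lies along the first axis
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

Xc = X - X.mean(axis=0)                  # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                          # rows: directions of decreasing variance
explained_var = S**2 / (len(X) - 1)

Z = Xc @ components[:1].T                # project onto the first component
print(explained_var)
```

Keeping only the top rows of `components` is the compression step the excerpt describes; a single-layer linear autoencoder trained with squared error recovers the same subspace.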