Document Stores

Document stores provide a native database management system for semi-structured data. They scale to gigabytes or terabytes of data, and typically to millions or billions of records (a record being a JSON object or an XML document). A document store, unlike a data lake, manages the data directly, and users do not see the physical layout. Unlike with data lakes, using a document store also keeps us from breaking data independence by reading the data files directly: it offers an automatic management service for semi-structured data that we need to store and read quickly. ...
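The idea can be sketched with a toy store in plain Python; the class and method names here are hypothetical, not any real document-store API:

```python
import json

# A toy "document store": records are schemaless JSON objects managed
# by the store itself, so callers never touch the physical file layout.
class ToyDocumentStore:
    def __init__(self):
        self._docs = []

    def insert(self, doc: dict) -> None:
        # Check the record is JSON-serializable before accepting it.
        json.dumps(doc)
        self._docs.append(doc)

    def find(self, **conditions):
        # Return every document matching all key/value conditions;
        # documents may have heterogeneous fields (semi-structured data).
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in conditions.items())]

store = ToyDocumentStore()
store.insert({"title": "Document Stores", "tags": ["nosql"]})
store.insert({"title": "Ensemble Methods", "minutes": 6})
print(store.find(title="Ensemble Methods"))
```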

6 min · Xuanqiang 'Angelo' Huang

Ensemble Methods

The idea of ensemble methods goes back to Sir Francis Galton. In 1906, he noted that although no single person guessed the right value, the average estimate of a crowd of people predicted it quite well. The main idea of ensemble methods is to combine relatively weak classifiers into a highly accurate predictor. The motivation for boosting was a procedure that combines the outputs of many “weak” classifiers to produce a powerful “committee.” ...
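Galton's observation can be reproduced numerically; the noise level below is an illustrative assumption:

```python
import random

random.seed(0)
true_value = 100.0

# Each "weak" estimator is individually noisy...
estimates = [true_value + random.gauss(0, 20) for _ in range(1000)]

# ...but the committee average is far more accurate than a typical member.
committee = sum(estimates) / len(estimates)
individual_error = sum(abs(e - true_value) for e in estimates) / len(estimates)
print(abs(committee - true_value), individual_error)
```

The averaging cancels independent errors, which is exactly the effect boosting exploits in a more deliberate way.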

6 min · Xuanqiang 'Angelo' Huang

Fisher's Linear Discriminant

A simple motivation. Fisher’s Linear Discriminant is a simple idea used to linearly classify our data. The image above, taken from (Bishop 2006), summarizes the idea. We clearly see that if we first project along the direction of maximum variance (see Principal Component Analysis), the data is not linearly separable; but if we take other notions into consideration, the idea becomes much cleaner. ...
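A small sketch of that contrast, assuming the standard two-class Fisher direction $w \propto S_W^{-1}(m_2 - m_1)$ (the synthetic data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two elongated Gaussian classes: the max-variance direction (x-axis)
# is NOT the direction that separates them (y-axis).
cov = np.array([[5.0, 0.0], [0.0, 0.2]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([0.0, 1.5], cov, size=200)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter matrix S_W = S_1 + S_2.
Sw = np.cov(X1.T) * (len(X1) - 1) + np.cov(X2.T) * (len(X2) - 1)
# Fisher direction: w proportional to S_W^{-1} (m2 - m1).
w = np.linalg.solve(Sw, m2 - m1)
w /= np.linalg.norm(w)
print(w)  # nearly aligned with the low-variance, class-separating axis
```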

4 min · Xuanqiang 'Angelo' Huang

Gaussian Processes

Gaussian processes can be viewed through a Bayesian lens of the function space: rather than sampling over individual data points, we are now sampling over entire functions. They extend the idea of Bayesian linear regression by introducing an infinite number of feature functions for the input $X$. In geostatistics, Gaussian processes are referred to as kriging regressions, and many other models, such as Kalman Filters or radial basis function networks, can be understood as special cases of Gaussian processes. In this framework, certain functions are more likely than others, and we aim to model this probability distribution. ...
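"Sampling over entire functions" can be made concrete: draw from a zero-mean GP prior on a grid (the squared-exponential kernel and length scale below are illustrative choices):

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Squared-exponential kernel: nearby inputs -> highly correlated outputs.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

xs = np.linspace(-3, 3, 50)
K = rbf_kernel(xs, xs) + 1e-8 * np.eye(len(xs))  # jitter for stability

rng = np.random.default_rng(0)
# Each draw is a whole function evaluated on the grid,
# distributed as N(0, K): a sample from the function space.
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
print(samples.shape)
```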

8 min · Xuanqiang 'Angelo' Huang

Graph Databases

We first cited the graph data model in the Introduction to Big Data note. Until now we have explored many aspects of relational databases, but now we change the data model completely. The main reason driving this discussion is the limitations of classical relational databases: queries such as traversing a large number of relationships, reverse traversal (which also requires indexing the foreign keys — an index only works in one direction for relationship traversal, so if you need both directions you must build an index for the forward key and another for the backward key), and looking for patterns in the relationships are especially expensive in ordinary databases. We improved over the join problem of relational databases using Document Stores and their tree data structures, but trees cannot have cycles. Graph databases use what we call index-free adjacency: physical memory pointers store the graph. ...
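Index-free adjacency can be sketched with direct object references standing in for physical pointers (a toy model, not any real graph database engine):

```python
class Node:
    # Index-free adjacency: each node keeps direct references
    # ("pointers") to its neighbours, in both directions, so
    # traversal needs no index lookup and no join.
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # forward traversal
        self.in_edges = []   # reverse traversal, no second index required

def link(src, dst):
    src.out_edges.append(dst)
    dst.in_edges.append(src)

alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
link(alice, bob)
link(bob, carol)

# Two-hop traversal by pointer chasing, not by joining tables.
friends_of_friends = [n2.name for n1 in alice.out_edges for n2 in n1.out_edges]
print(friends_of_friends)
```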

7 min · Xuanqiang 'Angelo' Huang

Introduction to Advanced Machine Learning

Introduction to the course Machine learning offers a new way of thinking about reality: rather than attempting to directly capture a fragment of reality, as many traditional sciences have done, we elevate to the meta-level and strive to create an automated method for capturing it. This first lesson will be more philosophical in nature. We are witnessing a paradigm shift in the sense described by Thomas Kuhn in his theory of scientific revolutions. But what drives such a shift, and how does it unfold? ...

13 min · Xuanqiang 'Angelo' Huang

Introduction to Big Data

Data Science is similar to physics: it attempts to create theories of reality based on a formalism that another science brings. For physics it was mathematics; for data science it is computer science. Data has grown prodigiously in these last years and has reached a size that, measured in metres, is the distance to Jupiter. The galaxy is on the order of magnitude of 400 yottametres, which has $3 \cdot 8 = 24$ zeros following it. So quite a lot. We don’t know if the magnitude of the data will grow this fast, but we certainly need to be able to face this case. ...

10 min · Xuanqiang 'Angelo' Huang

Introduction to Natural Language Processing

The landscape of NLP was very different at the beginning of the field. “But it must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.” (Noam Chomsky, 1968, p. 53). Probability was not viewed favorably (Chomsky has indeed said many wrong things), and linguists were considered useless. Recently, deep learning and computational papers have become ubiquitous in major conferences in linguistics, e.g. ACL. ...

2 min · Xuanqiang 'Angelo' Huang

Kalman Filters

Here is a historical treatment of the topic: https://jwmi.github.io/ASM/6-KalmanFilter.pdf. Kalman Filters are defined as follows: we start with a variable $X_{0} \sim \mathcal{N}(\mu, \Sigma)$, then we have a motion model and a sensor model: $$ \begin{cases} X_{t + 1} = FX_{t} + \varepsilon_{t} & F \in \mathbb{R}^{d\times d}, \varepsilon_{t} \sim \mathcal{N}(0, \Sigma_{x})\\ Y_{t} = HX_{t} + \eta_{t} & H \in \mathbb{R}^{m \times d}, \eta_{t} \sim \mathcal{N}(0, \Sigma_{y}) \end{cases} $$ Inference is just a matter of manipulating Gaussians. One can interpret the $Y$ to be the observations and $X$ to be the underlying beliefs about a certain state. We see that the Kalman Filters satisfy the Markov Property, see Markov Chains. These independence properties allow an easy characterization of the joint distribution for Kalman Filters: ...
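A minimal 1-D instance of this model, with the standard predict/update recursion; the matrices $F$, $H$ and the noise covariances below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D case of the model: X_{t+1} = F X_t + eps_t,  Y_t = H X_t + eta_t.
F, H = np.array([[1.0]]), np.array([[1.0]])
Sx, Sy = np.array([[0.1]]), np.array([[1.0]])

# Simulate a latent trajectory and its noisy observations.
T = 50
x = np.zeros((T, 1)); y = np.zeros((T, 1))
x[0] = rng.normal()
for t in range(1, T):
    x[t] = F @ x[t - 1] + rng.normal(0, np.sqrt(Sx[0, 0]))
    y[t] = H @ x[t] + rng.normal(0, np.sqrt(Sy[0, 0]))

# "Manipulating Gaussians": closed-form predict/update of the
# belief mean mu and covariance P.
mu, P = np.zeros((1, 1)), np.eye(1)
estimates = []
for t in range(1, T):
    # Predict step (motion model).
    mu, P = F @ mu, F @ P @ F.T + Sx
    # Update step (sensor model) via the Kalman gain K.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Sy)
    mu = mu + K @ (y[t] - H @ mu)
    P = (np.eye(1) - K @ H) @ P
    estimates.append(mu[0, 0])

filt_err = np.mean([(e - xi) ** 2 for e, xi in zip(estimates, x[1:, 0])])
raw_err = np.mean((y[1:, 0] - x[1:, 0]) ** 2)
print(filt_err, raw_err)
```

With these noise levels, the filtered estimate tracks the latent state more closely than the raw observations do.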

3 min · Xuanqiang 'Angelo' Huang

Kernel Methods

As we will briefly see, Kernels play an important role in many machine learning applications. In this note we will learn what Kernels are and why they are useful. Intuitively, a kernel measures the similarity between two input points: if the points are close, the kernel value should be large; otherwise it should be small. We briefly state the requirements of a Kernel, then argue with a simple example why Kernels are useful. ...
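The "close means large, far means small" intuition is easy to check with one concrete kernel; the Gaussian (RBF) kernel and the $\gamma$ value here are illustrative choices:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2):
    # close points -> value near 1, distant points -> value near 0.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

near = rbf_kernel((0.0, 0.0), (0.1, 0.1))
far = rbf_kernel((0.0, 0.0), (3.0, 3.0))
print(near, far)
```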

9 min · Xuanqiang 'Angelo' Huang