Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very well. An important observation is that this algorithm is linear in the size of the computation: its time complexity is the same as that of the forward pass. Derivatives are unexpectedly cheap to calculate, a fact that took a long time to discover. See colah’s blog. Karpathy has a nice resource on this topic too! ...
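The claim that the backward pass costs about as much as the forward pass can be illustrated with a minimal reverse-mode sketch. The names `Value` and `backward` are hypothetical and not taken from the post; the sketch only supports scalar addition and multiplication.

```python
class Value:
    """Scalar node in a computation graph (hypothetical minimal class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))


def backward(output):
    """Accumulate d(output)/d(node) in every node; return the node count."""
    order, seen = [], set()

    def visit(node):
        # Depth-first topological sort of the graph below `node`.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Each node is processed once and each edge once: the same order of
    # work as the forward pass that built the graph.
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += local * node.grad
    return len(order)


x, y = Value(2.0), Value(3.0)
z = x * y + x            # dz/dx = y + 1, dz/dy = x
n_nodes = backward(z)
print(x.grad, y.grad, n_nodes)
```

The backward loop touches every edge of the graph exactly once, which is where the linear cost comes from in this sketch.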

7 min · Xuanqiang 'Angelo' Huang

Dependency Parsing

This set of notes is still a TODO. Dependency grammar has been much more influential in Europe than in the USA, where Chomsky’s grammars ruled. One of the main developers of this theory is Lucien Tesnière (1959): “The sentence is an organized whole, the constituent elements of which are words. Every word that belongs to a sentence ceases by itself to be isolated as in the dictionary. Between the word and its neighbors, the mind perceives connections, the totality of which forms the structure of the sentence. The structural connections establish dependency relations between the words. Each connection in principle unites a superior term and an inferior term. The superior term receives the name governor (head). The inferior term receives the name subordinate (dependent).” ~Lucien Tesnière ...

4 min · Xuanqiang 'Angelo' Huang

Language Models

To understand language models we need to understand structured prediction. If you are familiar with Sentiment Analysis, where an input text is classified into one of two classes, the difference here is that the output space usually grows exponentially. The output has some structure: for example, it could be a tree, a set of words, etc. This usually requires combining statistics and computer science. ...

2 min · Xuanqiang 'Angelo' Huang

Log Linear Models

Log Linear Models can be considered the most basic models used in natural language processing. The main idea is to model the correlations in our data, or how the posterior $p(y \mid x)$ varies, where $x$ denotes the features of a single data point and $y$ the labels of interest. This is a form of generalization because contextualized events $(x, y)$ with similar descriptions tend to have similar probabilities. These kinds of models are so common that they have been discovered independently in many fields (and thus go by different names): some of the most famous are Gibbs distributions, undirected graphical models, Markov Random Fields or Conditional Random Fields, exponential models, and (regularized) maximum entropy models. Special cases include logistic regression and Boltzmann machines. ...
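As a rough illustration of the posterior $p(y \mid x)$, the sketch below uses the standard log-linear form $p(y \mid x) \propto \exp(\theta \cdot f(x, y))$. The excerpt does not spell this formula out, so the form, the function `log_linear_posterior`, the toy feature function, and the weights are all assumptions made for the example.

```python
import math


def log_linear_posterior(theta, features, x, labels):
    """Return p(y | x) for every label under a log-linear model (sketch)."""
    # Score of each label: dot product between weights and features f(x, y).
    scores = [
        sum(t * f for t, f in zip(theta, features(x, y)))
        for y in labels
    ]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    normalizer = sum(exps)
    return {y: e / normalizer for y, e in zip(labels, exps)}


def toy_features(x, y):
    # Hypothetical indicator features pairing a word with a label.
    return [
        1.0 if "good" in x and y == "pos" else 0.0,
        1.0 if "bad" in x and y == "neg" else 0.0,
    ]


theta = [2.0, 2.0]  # illustrative weights, not learned
probs = log_linear_posterior(theta, toy_features, "a good movie", ["pos", "neg"])
print(probs)
```

With two labels this reduces to logistic regression, one of the special cases mentioned above.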

5 min · Xuanqiang 'Angelo' Huang

Part of Speech Tagging

What is a part of speech? A part of speech (POS) is a category of words that display similar syntactic behavior, i.e., they play similar roles within the grammatical structure of sentences. It has been known since the Latin era that some categories of words behave similarly (verbs, for example, inflect in similar ways). The intuitive take is that knowing a word's part of speech can help in understanding the meaning of the sentence. ...

5 min · Xuanqiang 'Angelo' Huang

Recurrent Neural Networks

Recurrent Neural Networks allow us to model arbitrarily long sequence dependencies, at least in theory. This is very handy and has many interesting theoretical implications. But here we are also interested in practical applicability, so we need to analyze the common architectures used to implement these models, their main limitations and drawbacks, their nice properties, and some applications. These networks can be seen as chaotic systems (non-linear dynamical systems), see Introduction to Chaos Theory. ...

4 min · Xuanqiang 'Angelo' Huang

Semirings

Semirings allow us to generalize many common operations. One of their most powerful uses is the algebraic view of dynamic programming. Definition of a semiring: a semiring is a 5-tuple $R = (A, \oplus, \otimes, \bar{0}, \bar{1})$ such that $(A, \oplus, \bar{0})$ is a commutative monoid, $(A, \otimes, \bar{1})$ is a monoid, $\otimes$ distributes over $\oplus$, and $\bar{0}$ is an annihilator for $\otimes$. Monoid: let $K$ be a set and $\oplus$ an operation on it, then: ...
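The algebraic view of dynamic programming can be sketched by writing one path-aggregation routine and swapping the semiring. The `Semiring` class, the `path_sum` function, and the toy graph are hypothetical. The tropical (min, +) semiring is a standard example and is an assumption here, since the excerpt does not name it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # the (+) operation
    times: Callable[[Any, Any], Any]  # the (x) operation
    zero: Any                         # identity of plus, annihilator of times
    one: Any                          # identity of times


REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)


def path_sum(edges, n_nodes, source, target, sr):
    """Aggregate the weights of all source-to-target paths in a DAG.

    `edges` maps (u, v) to a weight and assumes u < v, so node indices
    already form a topological order.
    """
    alpha = [sr.zero] * n_nodes
    alpha[source] = sr.one
    for v in range(n_nodes):
        for (u, w), weight in edges.items():
            if w == v:
                # Extend every path ending at u by the edge (u, v).
                alpha[v] = sr.plus(alpha[v], sr.times(alpha[u], weight))
    return alpha[target]


edges = {(0, 1): 2.0, (0, 2): 5.0, (1, 2): 1.0, (1, 3): 4.0, (2, 3): 1.0}
total = path_sum(edges, 4, 0, 3, REAL)         # sum over paths of edge products
shortest = path_sum(edges, 4, 0, 3, TROPICAL)  # minimum over paths of edge sums
print(total, shortest)
```

The same loop computes a sum of path products under `REAL` and a shortest path length under `TROPICAL`; only the semiring changes.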

3 min · Xuanqiang 'Angelo' Huang

Sentiment Analysis

Sentiment analysis is one of the oldest tasks in natural language processing. In this note we introduce some examples and terminology, some key problems in the field, and a simple model that can be understood knowing only Backpropagation, Log Linear Models, and the Softmax Function. We say: Polarity: the orientation of the sentiment. Subjectivity: whether the text expresses personal feelings. See demo. Some applications: businesses use sentiment analysis to understand whether users are happy with their product. It’s linked to revenue: if the reviews are good, you usually make more money. But companies can’t read every review, so they want automatic methods. ...

2 min · Xuanqiang 'Angelo' Huang

The Exponential Family

This is a generalization of the family of functions to which the Softmax Function belongs. Many functions are part of this family: most of the distributions used in science, e.g. the Gaussian, Bernoulli, Categorical, Gamma, Beta, and Poisson distributions, belong to the exponential family. The useful thing is the generalization power of this family: if you prove something about it, you prove it for every distribution it contains. This family is also closely linked to Generalized Linear Models (GLMs). ...
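As a small check of what membership in the family means, the sketch below writes the Bernoulli distribution in the usual exponential-family form $p(x \mid \eta) = h(x) \exp(\eta \, T(x) - A(\eta))$ with $h(x) = 1$, $T(x) = x$, $\eta = \log \frac{p}{1 - p}$, and $A(\eta) = \log(1 + e^{\eta})$. The excerpt does not state this parameterization, so it is an assumption, and the function names are hypothetical.

```python
import math


def bernoulli_pmf(x, p):
    """Textbook Bernoulli probability of x in {0, 1}."""
    return p if x == 1 else 1.0 - p


def bernoulli_exp_family(x, p):
    """Same probability written as exp(eta * T(x) - A(eta)), with h(x) = 1."""
    eta = math.log(p / (1.0 - p))              # natural parameter (log-odds)
    log_partition = math.log1p(math.exp(eta))  # A(eta) = log(1 + e^eta)
    return math.exp(eta * x - log_partition)


# Compare both forms on a few parameter values.
checks = [
    (bernoulli_pmf(x, p), bernoulli_exp_family(x, p))
    for p in (0.1, 0.5, 0.9)
    for x in (0, 1)
]
print(all(math.isclose(a, b) for a, b in checks))
```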

6 min · Xuanqiang 'Angelo' Huang

Transliteration systems

This note is still a TODO. Transliteration is learning a function that maps strings in one character set to strings in another character set. The basic example is in multilingual applications, where the same string must be written in different languages. The goal is to develop a probabilistic model that can map strings from an input vocabulary $\Sigma$ to an output vocabulary $\Omega$. We will extend the concepts presented in Automi e Regexp for finite state automata to a weighted version. You will also need knowledge from Descrizione linguaggio for the definitions of alphabets, strings, and Kleene star operations. ...

4 min · Xuanqiang 'Angelo' Huang