Notes

Recurrent Neural Networks

Recurrent Neural Networks allow us to model arbitrarily long sequence dependencies, at least in theory (which is also why they seem a very natural choice for time series). This is very handy and has many interesting theoretical implications. But here we are also…

Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very…

Probabilistic Parsing

Language Constituents: A constituent is a word or a group of words that functions as a single unit within a hierarchical structure. This is because there is a lot of evidence pointing towards a hierarchical organization of human language. Example of constituents: Let's have…

Semirings

Semirings allow us to generalize many common operations. One of the most powerful usages is the algebraic view of dynamic programming. Definition of a semiring: A semiring is a 5-tuple R = (A, ⊕, ⊗, 0̄, 1̄) such that: (A, ⊕, 0̄) is a commutative monoid; (A, ⊗…

Transliteration systems

This note is still a TODO. Transliteration is learning a function to map strings in one character set to strings in another character set. The basic example is in multilingual applications, where the same string must be written in different languages. The…

Sentiment Analysis

Sentiment analysis is one of the oldest tasks in natural language processing. In this note we will introduce some examples and terminology, some key problems in the field, and a simple model that we can understand by just knowing Backpropagation, Log Linear Models and the Softmax…

Log Linear Models

Log Linear Models can be considered the most basic model used in natural language processing. The main idea is to try to model the correlations of our data, or how the posterior p(y ∣ x) varies, where x denotes the features of a single data point and y the labels of interest. This is a form…

Language Models

In order to understand language models we need to understand structured prediction. If you are familiar with Sentiment Analysis, where an input text must be classified in a binary manner, note that in this case the output space usually scales exponentially. The…

Part of Speech Tagging

What is a part of speech? A part of speech (POS) is a category of words that display similar syntactic behavior, i.e., they play similar roles within the grammatical structure of sentences. It has been known since the Latin era that some categories of words behave similarly…

Softmax Function

Softmax is one of the most important functions for neural networks. It also has some interesting properties that we list here. This function is part of The Exponential Family, and one can also see that the sigmoid function is a particular case of softmax with just two variables.…
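
As a quick numerical check of the claim that the sigmoid is a two-variable special case of softmax, here is a minimal Python sketch. The helper names `softmax` and `sigmoid` and the input value are illustrative choices, not taken from the note itself, and the sketch assumes the second logit is fixed at 0.

```python
import math


def softmax(logits):
    """Softmax over a list of real-valued logits (illustrative helper)."""
    # Subtracting the maximum does not change the result but avoids overflow.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic sigmoid of a single real number (illustrative helper)."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumption: two classes, with the second logit pinned to 0.
# Then softmax([z, 0])[0] = e^z / (e^z + 1) = 1 / (1 + e^{-z}) = sigmoid(z).
z = 1.5
two_class = softmax([z, 0.0])
print(two_class[0], sigmoid(z))
```

The two printed values should agree up to floating-point error, since fixing one logit at 0 leaves a single free variable, which is exactly the sigmoid's input.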