Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very well. An important observation is that the algorithm is linear in the size of the computation: the backward pass has the same asymptotic time complexity as the forward pass. Derivatives are therefore unexpectedly cheap to calculate, a fact that took a long time to discover. See colah’s blog. Karpathy has a nice resource on this topic too! ...
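To make the cost claim concrete, here is a minimal sketch of a hand-written forward and backward pass. The function `forward_backward` and the toy loss `tanh(w*x + b)^2` are illustrative assumptions, not taken from the post. The point is only that the backward pass computes one local derivative per forward operation, so its cost grows in step with the forward pass.

```python
import math

def forward_backward(x, w, b):
    """Toy example (hypothetical): loss = tanh(w*x + b)**2."""
    # Forward pass: three elementary operations.
    z = w * x + b
    a = math.tanh(z)
    loss = a * a

    # Backward pass: one local derivative per forward operation,
    # chained in reverse order.
    dloss_da = 2.0 * a            # d(a^2)/da
    da_dz = 1.0 - a * a           # d(tanh z)/dz
    dloss_dz = dloss_da * da_dz   # chain rule
    grads = {
        "w": dloss_dz * x,        # dz/dw = x
        "b": dloss_dz,            # dz/db = 1
        "x": dloss_dz * w,        # dz/dx = w
    }
    return loss, grads

loss, grads = forward_backward(x=0.5, w=-1.2, b=0.3)
print(loss, grads)
```

A single backward sweep yields the gradient with respect to every input at once, whereas finite differences would need a separate forward pass per input.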

7 min · Xuanqiang 'Angelo' Huang

Bag of words

Bag of words only takes into account the count of each word inside a document, ignoring all syntax and boundaries. This method is very common in email classification techniques. Bag of words can be seen as a form of pooling, analogous to pooling in computer vision. It is difficult to say which method is best (also a reason why people say NLP is difficult to teach). Introduction to bag of words We give an introduction to applying Naïve Bayes to document classification. ...
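As a small illustration of the idea, the sketch below builds a bag of words with the standard library. The helper name `bag_of_words` and the simple regex tokenizer are assumptions made for this example, not the post's own code.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Map a document to word counts, discarding order and syntax."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

doc_a = bag_of_words("The dog bites the man")
doc_b = bag_of_words("The man bites the dog")

# Word order is lost, so the two documents become indistinguishable.
print(doc_a == doc_b)
print(doc_a["the"])
```

Because the two sentences contain the same words, they map to the same representation even though their meanings differ, which is exactly the information that bag of words throws away.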

2 min · Xuanqiang 'Angelo' Huang

Dependency Parsing

This set of notes is still a TODO. Dependency grammar has been much more influential in Europe than in the USA, where Chomsky’s grammars ruled. One of the main developers of this theory is Lucien Tesnière (1959): “The sentence is an organized whole, the constituent elements of which are words. Every word that belongs to a sentence ceases by itself to be isolated as in the dictionary. Between the word and its neighbors, the mind perceives connections, the totality of which forms the structure of the sentence. The structural connections establish dependency relations between the words. Each connection in principle unites a superior term and an inferior term. The superior term receives the name governor (head). The inferior term receives the name subordinate (dependent).” ~Lucien Tesnière ...

4 min · Xuanqiang 'Angelo' Huang

Introduction to Natural Language Processing

The landscape of NLP was very different at the beginning of the field. “But it must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.” (Noam Chomsky, 1968, p. 53). Probabilistic approaches were not well regarded (Chomsky has said many wrong things indeed), and linguists were considered useless. Recently, deep learning and computational papers have become ubiquitous in major linguistics conferences, e.g. ACL. ...

2 min · Xuanqiang 'Angelo' Huang

Language Models

In order to understand language models we need to understand structured prediction. You may be familiar with sentiment analysis, where an input text is classified into one of two classes. In structured prediction, by contrast, the output space usually grows exponentially. The output has some structure: for example, it could be a tree, a set of words, etc. This usually requires combining statistics and computer science. ...

2 min · Xuanqiang 'Angelo' Huang