Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very well. An important observation is that this algorithm is linear: the time complexity is the same as the forward pass. Derivatives are unexpectedly cheap to calculate. This took a lot of time to discover....

8 min Â· Xuanqiang 'Angelo' Huang

Semirings

Semirings allow us to generalize many many common operations. One of the most powerful usages is the algebraic view of dynamic programming. Definition of a semiring A semiring is a 5-tuple $R = (A, \oplus, \otimes, \bar{0}, \bar{1})$ such that. $(A, \oplus, \bar{0})$ is a commutative monoid $(A, \otimes, \bar{1})$ is a monoid $\otimes$ distributes over $\oplus$. $\bar{0}$ is annihilator for $\otimes$. Monoid Let $K, \oplus$ be a set and a operation, then:...

4 min Â· Xuanqiang 'Angelo' Huang

The Exponential Family

This is the generalization of the family of function where Softmax Function belongs. Many many functions are part of this family, most of the distributions that are used in science are part of the exponential family, e.g. beta, Gaussian, Bernoulli, Categorical distribution, Gamma, Beta, Poisson, are all part of the exponential family. The useful thing is the generalization power of this set of functions: if you prove something about this family, you prove it for every distribution that is part of this family....

6 min Â· Xuanqiang 'Angelo' Huang

Language Models

In order to understand language models we need to understand structured prediction. If you are familiar with Sentiment Analysis, where given an input text we need to classify it in a binary manner, in this case the output space usually scales in an exponential manner. The output has some structure, for example it could be a tree, it could be a set of words etc… This usually needs an intersection between statistics and computer science....

2 min Â· Xuanqiang 'Angelo' Huang

Bag of words

Bag of words only takes into account the count of the words inside a document, ignoring all the syntax and boundaries. This method is very common for email classifications techniques. We can say bag of words can be some sort of pooling, it’s similar to the computer vision analogue. It’s difficult to say what is the best method (also a reason why people say NLP is difficult to teach). Introduction to bag of words Faremo una introduzione di applicazione di Naïve Bayes applicato alla classificazione di documenti....

2 min Â· Xuanqiang 'Angelo' Huang