Probabilistic Parsing

Language Constituents A constituent is a word or a group of words that functions as a single unit within a hierarchical structure. This matters because there is a lot of evidence pointing towards a hierarchical organization of human language. Examples of constituents Let’s look at some examples: John speaks [Spanish] fluently. John speaks [Spanish and French] fluently. Mary programs the homework [in the ETH computer laboratory]. Mary programs the homework [in the laboratory]. ...
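A minimal sketch of the idea that a constituent behaves as a single unit: below, a constituency tree is encoded as plain nested Python tuples (an illustrative encoding, not the note’s notation), and the bracketed phrase “Spanish and French” is one NP node that could be swapped out wholesale.

```python
# Constituency tree as nested (label, children...) tuples; the constituent
# "Spanish and French" is a single NP node, replaceable as a unit.
tree = ("S",
        ("NP", "John"),
        ("VP",
         ("V", "speaks"),
         ("NP", ("NP", "Spanish"), ("CC", "and"), ("NP", "French")),
         ("ADV", "fluently")))

def leaves(node):
    """Collect the words at the leaves of a (label, children...) tree."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

print(" ".join(leaves(tree)))  # John speaks Spanish and French fluently
```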

5 min · Xuanqiang 'Angelo' Huang

Probably Approximately Correct Learning

PAC Learning is one of the most famous theories in learning theory. Learning theory is concerned with answering questions like: What is learnable? (Somewhat akin to the Turing machine for computability theory.) How well can you learn something? PAC is a framework that allows us to answer these questions formally. There is also a Bayesian version of PAC, which is an active area of research. Some definitions Empirical Risk Minimizer and Errors $$ \arg \min_{\hat{c} \in \mathcal{H}} \hat{R}_{n}(\hat{c}) $$ where $\hat{R}_{n}(\hat{c})$ is the empirical error. ...
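To make the $\arg\min$ concrete, here is a hedged sketch of empirical risk minimization over a finite hypothesis class of threshold classifiers; the data, the class, and all names are invented for illustration.

```python
# Minimal ERM sketch: pick the hypothesis in a finite class H that
# minimizes the empirical 0-1 risk on n labeled samples.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = (x > 0.6).astype(int)                # true concept: threshold at 0.6

# Finite hypothesis class H: threshold classifiers h_t(x) = 1[x > t]
thresholds = np.linspace(0, 1, 101)

def empirical_risk(t):
    """Fraction of the n samples misclassified by h_t (empirical error R_n)."""
    return np.mean((x > t).astype(int) != y)

# ERM: argmin over H of the empirical risk
best_t = min(thresholds, key=empirical_risk)
print(best_t, empirical_risk(best_t))
```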

11 min · Xuanqiang 'Angelo' Huang

Querying Denormalized Data

TODO: write the introduction to the note. JSONiq presents itself as an easy query language that can run everywhere. It attempts to solve common shortcomings of SQL, namely the lack of support for nested data structures and for JSON data types. A nice thing about JSONiq is that it is functional, which makes its queries quite powerful and flexible. It is also declarative and set-based, which it has in common with SQL. ...
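The note itself is about JSONiq; as a hedged stand-in, here is a plain-Python sketch of the kind of nested-data query (iterate, filter, and project inside arrays of objects) that flat SQL tables make awkward. The sample document and all names are invented.

```python
# Nested documents: each order holds an array of item objects, which a flat
# relational table cannot represent without joins or unnesting.
orders = [
    {"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"id": 2, "items": [{"sku": "a", "qty": 5}]},
]

# Declarative-style comprehension: for each order, for each item, filter, project.
result = [
    {"order": o["id"], "sku": it["sku"]}
    for o in orders
    for it in o["items"]
    if it["qty"] > 1
]
print(result)  # [{'order': 1, 'sku': 'a'}, {'order': 2, 'sku': 'a'}]
```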

6 min · Xuanqiang 'Angelo' Huang

Rademacher Complexity

This note uses the definitions from Probably Approximately Correct Learning, so go there (or search online) when you encounter a word you don’t know. Rademacher Complexity $$ \mathcal{G} = \left\{ g : (x, y) \mapsto L(h(x), y) : h \in \mathcal{H} \right\} $$ where $L : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is a generic loss function. The Rademacher complexity captures the richness of a family of functions by measuring the degree to which a hypothesis set can fit random noise. From (Mohri et al. 2012). ...
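As a hedged illustration of “fitting random noise”, this sketch Monte Carlo estimates the empirical Rademacher complexity $\hat{\mathfrak{R}}_{S}(\mathcal{G}) = \mathbb{E}_{\sigma}\left[ \sup_{g \in \mathcal{G}} \frac{1}{n} \sum_{i} \sigma_{i}\, g(z_{i}) \right]$ for a small finite family of sign functions; the sample and the family are invented.

```python
# Monte Carlo estimate of the empirical Rademacher complexity of a finite
# function family G on a fixed sample, by averaging the sup correlation
# between G and random sign vectors.
import numpy as np

rng = np.random.default_rng(0)
n = 50
z = rng.uniform(-1, 1, size=n)

# Finite family G: sign functions with different thresholds, values in {-1, +1}
G = [np.where(z > t, 1.0, -1.0) for t in np.linspace(-1, 1, 21)]

def empirical_rademacher(G, n_draws=2000):
    """E_sigma [ sup_{g in G} (1/n) sum_i sigma_i g(z_i) ] by sampling sigma."""
    sups = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)      # Rademacher variables
        sups.append(max(float(np.mean(sigma * g)) for g in G))
    return float(np.mean(sups))

print(empirical_rademacher(G))
```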

1 min · Xuanqiang 'Angelo' Huang

RL Function Approximation

These algorithms are good for scaling to large state spaces, but not to large action spaces. The Gradient Idea Recall Temporal difference learning and Q-Learning, two model-free policy evaluation techniques explored in Tabular Reinforcement Learning. A simple parametrization 🟩 The idea here is to parametrize the value estimation function so that similar inputs get similar values, akin to the Parametric Modeling estimation we have done in other courses. In this manner, we don’t need to explicitly visit every single state in the state space. ...
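A minimal sketch of the parametrization idea under stated assumptions: a linear value function $V_{w}(s) = w^{\top} \phi(s)$ updated with a semi-gradient TD(0) step; the feature map and the transition are invented for illustration.

```python
# Linear value-function approximation: nearby states share features, so one
# update generalizes beyond the state actually visited.
import numpy as np

def phi(s):
    """Simple polynomial feature map so that similar states get similar values."""
    return np.array([1.0, s, s * s])

alpha, gamma = 0.1, 0.9

def td_update(w, s, r, s_next):
    """One semi-gradient TD(0) step for an observed transition (s, r, s')."""
    target = r + gamma * phi(s_next) @ w      # bootstrapped TD target
    delta = target - phi(s) @ w               # TD error
    return w + alpha * delta * phi(s)         # semi-gradient step

w = np.zeros(3)
w = td_update(w, s=0.5, r=1.0, s_next=0.6)
print(w)
```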

14 min · Xuanqiang 'Angelo' Huang

Semirings

Semirings allow us to generalize many common operations. One of the most powerful uses is the algebraic view of dynamic programming. Definition of a semiring A semiring is a 5-tuple $R = (A, \oplus, \otimes, \bar{0}, \bar{1})$ such that: $(A, \oplus, \bar{0})$ is a commutative monoid; $(A, \otimes, \bar{1})$ is a monoid; $\otimes$ distributes over $\oplus$; $\bar{0}$ is an annihilator for $\otimes$. Monoid Let $(K, \oplus)$ be a set and an operation, then: ...
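A hedged sketch of the algebraic-DP view: one generic routine combines paths with $\oplus$ and edge weights with $\otimes$, then gets instantiated with the probability semiring and the tropical $(\min, +)$ semiring. The small dataclass encoding is an illustrative choice, not the note’s formalism.

```python
# A semiring as (oplus, otimes, zero, one); swapping the semiring changes
# what the same generic routine computes.
from dataclasses import dataclass
from typing import Callable
import math

@dataclass
class Semiring:
    oplus: Callable[[float, float], float]   # commutative monoid with zero
    otimes: Callable[[float, float], float]  # monoid with one
    zero: float                              # annihilator for otimes
    one: float

prob = Semiring(oplus=lambda a, b: a + b, otimes=lambda a, b: a * b,
                zero=0.0, one=1.0)
tropical = Semiring(oplus=min, otimes=lambda a, b: a + b,
                    zero=math.inf, one=0.0)

def total(paths, sr):
    """oplus over paths of (otimes over edge weights): the generic DP quantity."""
    acc = sr.zero
    for path in paths:
        w = sr.one
        for edge in path:
            w = sr.otimes(w, edge)
        acc = sr.oplus(acc, w)
    return acc

paths = [[0.5, 0.2], [0.3, 0.3]]
print(total(paths, prob))      # sum of products: 0.19
print(total(paths, tropical))  # min of sums: 0.6
```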

3 min · Xuanqiang 'Angelo' Huang

Sentiment Analysis

Sentiment analysis is one of the oldest tasks in natural language processing. In this note we introduce some examples and terminology, some key problems in the field, and a simple model that we can understand by just knowing Backpropagation, Log Linear Models, and the Softmax Function. We say: Polarity: the orientation of the sentiment. Subjectivity: whether it expresses personal feelings. See demo. Some applications: businesses use sentiment analysis to understand whether users are happy with their product. It is linked to revenue: if the reviews are good, you usually make more money. But companies can’t read every review, so they want automatic methods. ...
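A minimal log-linear sketch of the kind of simple model the note refers to: bag-of-words counts fed through a weight matrix and a softmax over two sentiment classes. The vocabulary and weights are invented, not trained.

```python
# Log-linear sentiment classifier: count features, linear scores, softmax.
import numpy as np

vocab = {"good": 0, "bad": 1, "great": 2}
W = np.array([[-1.0,  2.0, -1.5],        # weights for class "negative"
              [ 1.0, -2.0,  1.5]])       # weights for class "positive"

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(text):
    """Bag-of-words counts -> linear scores -> class probabilities."""
    x = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            x[vocab[tok]] += 1.0
    return softmax(W @ x)                # [P(negative), P(positive)]

print(classify("great product , really good"))
```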

2 min · Xuanqiang 'Angelo' Huang

Softmax Function

Softmax is one of the most important functions for neural networks, and it has some interesting properties that we list here. This function is part of The Exponential Family; one can also see that the sigmoid function is the particular two-variable case of softmax. It can be seen as a relaxation of the action potential from neuroscience (see The Neuron for a little more about neurons): we need differentiability for gradient descent, whereas the action potential is all-or-nothing. ...
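A quick numerical check of the claim that the sigmoid is the two-variable case of softmax: $\mathrm{softmax}([z, 0])_{1} = \sigma(z)$.

```python
# softmax([z, 0])[0] = e^z / (e^z + 1) = 1 / (1 + e^{-z}) = sigmoid(z)
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))            # shift for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 1.7
print(softmax(np.array([z, 0.0]))[0])    # 0.8455...
print(sigmoid(z))                        # same value
```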

3 min · Xuanqiang 'Angelo' Huang

Support Vector Machines

This is a quite good resource about this part of Support Vector Machines (a step-by-step derivation); (Bishop 2006), chapter 7, is another good resource. The main idea of this supervised method is separating the data with a large gap. The point is that we have a hyperplane: when this plane is projected back onto lower-dimensional data, it can look like a non-linear separator. After we have found this separator, we can intuitively get an idea of confidence from the distance of a point to the separator. ...
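A small sketch of the distance-as-confidence intuition, assuming an already-found hyperplane $w^{\top} x + b = 0$ (the weights here are invented): the signed distance of a point from the plane.

```python
# Signed distance from the separating hyperplane; larger |distance| means a
# point sits further from the boundary, i.e. a more confident classification.
import numpy as np

w = np.array([2.0, -1.0])
b = -0.5

def signed_distance(x):
    """Signed Euclidean distance of x from the hyperplane w.x + b = 0."""
    return (w @ x + b) / np.linalg.norm(w)

for x in [np.array([1.0, 0.0]), np.array([0.3, 0.1])]:
    print(x, signed_distance(x))
```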

9 min · Xuanqiang 'Angelo' Huang

Tabular Reinforcement Learning

This note extends the content of Markov Processes to this specific context. Standard notions Explore-exploit dilemma 🟩 We have seen something similar in Active Learning, when we tried to model whether to look elsewhere or go for the maximum value found so far. The dilemma under analysis is the explore-exploit dilemma: whether we should just go for the best solution we have found at the moment, or look for a better one. This trade-off also appears in many other fields, and even everyday life is full of balances of this kind. ...
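A minimal sketch of one standard answer to the explore-exploit dilemma: $\varepsilon$-greedy action selection over a tabular Q function. The action values are invented for illustration.

```python
# Epsilon-greedy: with small probability explore a random action, otherwise
# exploit the action with the highest current Q value.
import numpy as np

rng = np.random.default_rng(0)
Q = {"left": 0.4, "right": 0.7}          # tabular action values for one state

def epsilon_greedy(Q, eps=0.1):
    """With prob. eps explore a random action, otherwise exploit the argmax."""
    if rng.random() < eps:
        return rng.choice(list(Q))       # explore
    return max(Q, key=Q.get)             # exploit

print([epsilon_greedy(Q) for _ in range(10)])
```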

12 min · Xuanqiang 'Angelo' Huang