Massive Parallel Processing

We have a group of mappers that partition the keys among some reducers, which then work on that same group of data. The bottleneck is the assignment step: when the mappers finish, they need to hand the data over to the reducers. Introduction Common input formats 🟨 You need to know these well. Shards Textual input, binary (Parquet and similar), CSV and similar formats. Sharding It is common practice to divide a big dataset into chunks (or shards): smaller parts which, recomposed, give back the original dataset....

8 min · Xuanqiang 'Angelo' Huang
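A minimal sketch of the map → shuffle → reduce flow described in the excerpt above, with the shuffle step playing the role of the hand-off bottleneck (all function names and the toy shards are illustrative, not from the note itself):

```python
from collections import defaultdict

def mapper(chunk):
    # Emit (key, value) pairs; here: word counts for one shard of text.
    for word in chunk.split():
        yield word.lower(), 1

def shuffle(mapped_streams, num_reducers):
    # The bottleneck step: group every mapper's output by key and
    # route each key to the reducer responsible for it (hash partitioning).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for stream in mapped_streams:
        for key, value in stream:
            partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reducer(partition):
    # Aggregate all values that share a key.
    return {key: sum(values) for key, values in partition.items()}

shards = ["the quick brown fox", "the lazy dog", "the fox"]
partitions = shuffle((mapper(s) for s in shards), num_reducers=2)
print([reducer(p) for p in partitions])
```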

Network Address Translation

NAT Network address translation Introduction With NAT we can have all the IP address space we need, without those addresses being exposed. Only the NAT's IP is exposed to the outside. Classic NAT scheme In short: only the router's address is exposed externally; based on the port, the router forwards each response to the right computer, so inside our network we know all the correct IP addresses. Addr translation table 🟩 For each request there seems to be a translation table inside the router that maps port → correct local address!...

2 min · Xuanqiang 'Angelo' Huang
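A minimal sketch of the translation-table idea from the excerpt above: the router exposes a single public IP and maps each public port back to the private host that opened the connection. Class and method names, IPs, and ports are all illustrative assumptions, not a real router API:

```python
class NatTable:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.table = {}          # public port -> (private ip, private port)
        self.next_port = 40000   # arbitrary starting port for translations

    def outbound(self, private_ip, private_port):
        # Allocate a public port and record the mapping for this connection.
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def inbound(self, public_port):
        # An incoming packet on a public port is forwarded to the private
        # host recorded in the table (or dropped if no entry exists).
        return self.table.get(public_port)

nat = NatTable("203.0.113.7")
print(nat.outbound("192.168.1.10", 51512))   # ('203.0.113.7', 40000)
print(nat.inbound(40000))                    # ('192.168.1.10', 51512)
```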

Neural Networks

Introduction: a neuron I am lazy, so I’m skipping the introduction for this set of notes. Look at Andrew Ng’s Coursera course for this part. Historical notes: (Rosenblatt 1958). One can view a perceptron as a Log Linear Model with the softmax temperature going to 0 (so that it becomes an argmax), trained with stochastic gradient descent with a batch size of 1 (this is called the perceptron update rule)....

6 min · Xuanqiang 'Angelo' Huang
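A minimal sketch of the perceptron update rule mentioned above: stochastic gradient descent with a batch of one, updating only on misclassified points, where the prediction is a sign/argmax (the zero-temperature limit of the softmax). The data is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # linearly separable labels

w = np.zeros(2)
b = 0.0
for epoch in range(10):
    for x_i, y_i in zip(X, y):
        # Prediction is sign(w.x + b), i.e. an argmax over two classes.
        if y_i * (w @ x_i + b) <= 0:          # misclassified point
            w += y_i * x_i                    # perceptron update
            b += y_i

accuracy = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {accuracy:.2f}")
```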

Tabular Reinforcement Learning

This note extends the content of Markov Processes to this specific context. Standard notions Explore-exploit dilemma 🟩 We have seen something similar in Active Learning, when we modeled whether to look elsewhere or go for the maximum value found so far. The dilemma under analysis is the explore-exploit dilemma: whether we should just go for the best solution we have found at the moment, or keep looking for a better one....

12 min · Xuanqiang 'Angelo' Huang
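A minimal sketch of one standard answer to the explore-exploit dilemma described above: epsilon-greedy action selection on a toy bandit. The arms and their reward probabilities are illustrative assumptions, not taken from the note:

```python
import random

true_probs = [0.2, 0.5, 0.8]          # unknown to the agent
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]              # running average reward per arm
epsilon = 0.1                         # fraction of steps spent exploring

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_probs))                  # explore
    else:
        arm = max(range(len(values)), key=lambda a: values[a])   # exploit
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print("estimated values:", [round(v, 2) for v in values])
```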

Counterfactual Invariance

Machine learning cannot distinguish between causal and environment features. Shortcut learning Often we observe shortcut learning: the model learns dataset-dependent shortcuts (e.g. which machine was used to take the X-ray) to make inferences, but this is very brittle and usually fails to generalize. Shortcut learning happens when there are correlations between causal and non-causal features in the training data that do not hold at test time. In most cases, our object of interest should be the main focus, not the surrounding environment....

10 min · Xuanqiang 'Angelo' Huang
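A minimal sketch of shortcut learning on synthetic data, assuming scikit-learn is available: a spurious feature (like the X-ray machine ID) is perfectly correlated with the label at training time, but that correlation breaks at test time, so the model that leans on the shortcut collapses. All data and names here are fabricated for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_split(n, spurious_matches_label):
    y = rng.integers(0, 2, size=n)
    causal = y + rng.normal(scale=1.0, size=n)          # weak causal signal
    spurious = y if spurious_matches_label else 1 - y   # shortcut feature
    return np.column_stack([causal, spurious]), y

X_train, y_train = make_split(1000, spurious_matches_label=True)
X_test, y_test = make_split(1000, spurious_matches_label=False)

model = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # near 1.0 via the shortcut
print("test accuracy:", model.score(X_test, y_test))     # collapses once the shortcut flips
```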