Neural Networks
Introduction: a neuron I am lazy, so I’m skipping the introduction for this set of notes. Look at Andrew Ng’s Coursera course for this part. Historical notes are (Rosenblatt 1958). One can view a perceptron to be a Log Linear Models with the temperature of the softmax that goes to 0 (so that it is an argmax). Trained with a stochastic gradient descent with a batch of 1 (this is called the perceptron update rule)....