Notes

Bayesian Information Criterion

This note is one of the few that were generated with the help of ChatGPT. Bayesian Information Criterion (BIC): The Bayesian Information Criterion (BIC) is a model selection criterion that helps compare different statistical models while penalizing model complexity. It is…

February 2, 2025 · Reading Time: 4 minutes · By Xuanqiang Angelo Huang

Beta and Dirichlet Distributions

The beta distribution: The beta distribution is a powerful tool for modeling probabilities and proportions between 0 and 1. Here's a structured intuition to grasp its essence. Core Concept: The beta distribution, defined on [0, 1], is parameterized by two shape parameters:…

February 1, 2025 · Reading Time: 6 minutes · By Xuanqiang Angelo Huang

Counterfactual Invariance

Machine learning cannot distinguish between causal and environment features. Shortcut learning: Often we observe shortcut learning: the model learns some dataset-dependent shortcuts (e.g. the machine that was used to take the X-ray) to make inferences, but this is very brittle,…

January 18, 2025 · Reading Time: 13 minutes · By Xuanqiang Angelo Huang

Variational Inference

With variational inference we want to find a good approximation of the posterior distribution from which it is easy to sample. The objective is to approximate the posterior with a simpler one, because sometimes the prior or the likelihood is difficult to compute. p(θ ∣ x_{1:n}…

Parametric Modeling

In this note we first briefly discuss some of the main differences between the three main approaches to statistics: the Bayesian, the frequentist, and the statistical learning methods. We then present the concept of the estimator and compare how the approaches differ from…

January 13, 2025 · Reading Time: 18 minutes · By Xuanqiang Angelo Huang

Bayesian neural networks

Robbins-Monro Algorithm: The algorithm is very simple. We do the following until convergence: set some learning rates that satisfy the Robbins-Monro conditions, choose a w_0, then update in the following way: w_{n+1} = w_n − α_n Δw_n. For example with α_0…
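The update rule in the excerpt above can be sketched in a few lines. This is only an illustration under stated assumptions: the step size α_n = 1 / (n + 1) is one common choice that satisfies the Robbins-Monro conditions (the step sizes sum to infinity while their squares have a finite sum), and the toy problem, estimating a mean from noisy samples, is hypothetical rather than taken from the note. The names `robbins_monro` and `noisy_update` are made up for this sketch.

```python
import random


def robbins_monro(noisy_update, w0, n_steps):
    """Iterate w_{n+1} = w_n - alpha_n * noisy_update(w_n, n), alpha_n = 1 / (n + 1)."""
    w = w0
    for n in range(n_steps):
        # Assumed schedule: sum(alpha_n) diverges, sum(alpha_n ** 2) converges.
        alpha = 1.0 / (n + 1)
        w = w - alpha * noisy_update(w, n)
    return w


# Hypothetical toy problem: find the mean of a distribution from samples.
rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(5000)]

# Noisy gradient of 0.5 * (w - x)^2, evaluated on one sample per step.
estimate = robbins_monro(lambda w, n: w - samples[n], w0=0.0, n_steps=len(samples))
print(estimate)
```

With this particular schedule and update, the iterate coincides with the running average of the samples seen so far, which makes the convergence easy to check by hand.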

Diffusion Models

Diffusion is a physical process that models random motion, first analyzed by Brown when studying pollen grains in water. In this section, we will first analyze a simplified 1-dimensional version, and then delve into diffusion models for images, the ones closest to (Ho et al.…

January 5, 2025 · Reading Time: 18 minutes · By Xuanqiang Angelo Huang

Ensemble Methods

The idea of ensemble methods goes back to Sir Francis Galton. In 1907, he noted that although not every single person got the right value, the average estimate of a crowd of people predicted it quite well. The main idea of ensemble methods is to combine relatively weak classifiers…

January 1, 2025 · Reading Time: 10 minutes · By Xuanqiang Angelo Huang
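The crowd-averaging idea in the Ensemble Methods excerpt can be illustrated with a small simulation. The numbers below are hypothetical (they are not Galton's data), and the sketch assumes each guess is the true value plus independent zero-mean Gaussian noise.

```python
import random
import statistics

rng = random.Random(42)

true_value = 1200.0  # hypothetical quantity the crowd tries to guess
# Assumption: individual guesses are unbiased with independent noise.
guesses = [true_value + rng.gauss(0.0, 100.0) for _ in range(800)]

# Error of the averaged guess versus the average error of single guessers.
crowd_error = abs(statistics.fmean(guesses) - true_value)
mean_individual_error = statistics.fmean([abs(g - true_value) for g in guesses])

print(f"crowd error: {crowd_error:.2f}")
print(f"mean individual error: {mean_individual_error:.2f}")
```

The error of the average can never exceed the average individual error (by the triangle inequality), and under the independence assumption it is typically far smaller. Combining weak predictors relies on the same effect.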

Linear Regression methods

We will present some regression methods for data analysis. Some of the work here is from (Hastie et al. 2009). This note does not treat the Bayesian case; see Bayesian Linear Regression for that. Problem setting: In usual regression problems we…

December 30, 2024 · Reading Time: 16 minutes · By Xuanqiang Angelo Huang

Fisher's Linear Discriminant

A simple motivation: Fisher's Linear Discriminant is a simple idea used to linearly classify our data. The image above, taken from (Bishop 2006), is the summary of the idea. We clearly see that if we first project using the direction of maximum variance (see Principal…

December 28, 2024 · Reading Time: 4 minutes · By Xuanqiang Angelo Huang