Parametric Modeling

In this note we first briefly discuss some of the main differences among the three main approaches to statistics (the Bayesian, the frequentist, and the statistical learning methods), and then present the concept of the estimator, compare how the approaches differ from…

January 13, 2025 · Reading Time: 18 minutes · By Xuanqiang Angelo Huang

Gaussians

Gaussians are one of the most important families of probability distributions. They arise naturally in the central limit theorem and have some nice properties that we will briefly present and prove in this note. They are also quite common for Gaussian Processes and the…

Bayesian neural networks

Robbins-Monro Algorithm # The Algorithm # The algorithm is very simple. We do the following until convergence: set some learning rates that satisfy the Robbins-Monro conditions, choose a $w_0$, then update in the following way: $w_{n+1} = w_n - \alpha_n \Delta w_n$. For example with $\alpha_0$…
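
The update rule in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the post's own code. It assumes that $\Delta w_n$ is a noisy evaluation of the function whose root we are looking for, and that $\alpha_n = 1/(n+1)$, one common choice that satisfies the Robbins-Monro conditions ($\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$). The names `robbins_monro`, `noisy_g` and `target` are hypothetical.

```python
import random


def robbins_monro(noisy_g, w0, n_steps):
    """Iterate w_{n+1} = w_n - alpha_n * noisy_g(w_n) for n_steps steps."""
    w = w0
    for n in range(n_steps):
        # Assumed schedule: alpha_n = 1 / (n + 1). The sum of alpha_n diverges
        # while the sum of alpha_n^2 converges, as the conditions require.
        alpha_n = 1.0 / (n + 1)
        w = w - alpha_n * noisy_g(w)
    return w


# Toy problem (an assumption for illustration): find the root of
# g(w) = w - target when only noisy evaluations of g are available.
rng = random.Random(0)
target = 2.0


def noisy_g(w):
    return (w - target) + rng.gauss(0.0, 1.0)


w_hat = robbins_monro(noisy_g, w0=0.0, n_steps=10_000)
print(w_hat)  # close to target
```

With this schedule and this toy function, the iterate is the target minus the running average of the noise, which is why it settles near the root as the number of steps grows.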

Database Normalization

Introduction to normalization # Why do we normalize? # We try to increase the quality of our database: in practice we resolve anomalies that may occur inside it, and this helps its quality. Usually these anomalies are of interest for…

Diffusion Models

Diffusion is a physical process that models random motion, first analyzed by Brown when studying pollen grains in water. In this section, we will first analyze a simplified 1-dimensional version, and then delve into diffusion models for images, the ones closest to (Ho et al.…

January 5, 2025 · Reading Time: 18 minutes · By Xuanqiang Angelo Huang

Document Stores

Document stores provide a native database management system for semi-structured data. Document stores also scale to gigabytes or terabytes of data, and typically millions or billions of records (a record being a JSON object or an XML document). Introduction to Document Stores…

January 2, 2025 · Reading Time: 6 minutes · By Xuanqiang Angelo Huang

Ensemble Methods

The idea of ensemble methods goes back to Sir Francis Galton. In 1906, he noted that although no single person guessed the right value, the average estimate of a crowd of people predicted it quite well. The main idea of ensemble methods is to combine relatively weak classifiers…

January 1, 2025 · Reading Time: 10 minutes · By Xuanqiang Angelo Huang

Linear Regression methods

We present some regression methods for data analysis. Some of the work here is from (Hastie et al. 2009). This note does not treat the Bayesian case; see Bayesian Linear Regression for that. Problem setting # In usual regression problems we…

December 30, 2024 · Reading Time: 16 minutes · By Xuanqiang Angelo Huang

Fisher's Linear Discriminant

A simple motivation # Fisher's Linear Discriminant is a simple idea used to linearly classify our data. The image above, taken from (Bishop 2006), is the summary of the idea. We clearly see that if we first project using the direction of maximum variance (see Principal…

December 28, 2024 · Reading Time: 4 minutes · By Xuanqiang Angelo Huang

Wide Column Storage

Introduction to Wide Column Storages # One of the bottlenecks of traditional relational databases is the speed of joins, which can be done in $O(n)$ using a merge join, assuming some indexes are present that keep the keys already sorted. The other solution, of just…

December 28, 2024 · Reading Time: 9 minutes · By Xuanqiang Angelo Huang