
Advanced 3D Representations

3D representations. In this section, we present some of the most common 3D representations used in computer graphics and computer vision. Each representation has its own advantages and disadvantages, and the choice of representation often depends on the specific application. Voxels. With voxels we discretize 3D space into a 3D grid. This is an intuitive way to represent the data, but it has limited resolution and needs $O(n^3)$ memory for an $n \times n \times n$ grid. Points and volumetric primitives. We can discretize surfaces into 3D points. However, this does not model connectivity, and in a video the points might vary from frame to frame. ...
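A small Python sketch of the two voxel drawbacks mentioned above: cubic memory growth and the resolution limit. It assumes a dense occupancy grid with one byte per voxel and points normalized to the unit cube; both assumptions and the helper names are illustrative only.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def dense_grid_bytes(n: int, bytes_per_voxel: int = 1) -> int:
    """Memory of a dense n x n x n occupancy grid: grows as O(n^3)."""
    return n ** 3 * bytes_per_voxel


def voxelize(points: Iterable[Point], n: int) -> Set[Voxel]:
    """Map points in the unit cube [0, 1]^3 to the indices of occupied voxels.

    Points falling into the same cell collapse into a single voxel,
    which is where the limited resolution comes from.
    """
    occupied: Set[Voxel] = set()
    for p in points:
        # floor(coord * n), clamped so that a coordinate of exactly 1.0 lands in the last cell
        i, j, k = (min(int(math.floor(c * n)), n - 1) for c in p)
        occupied.add((i, j, k))
    return occupied


print(dense_grid_bytes(256))                            # 16777216 (16 MiB)
print(dense_grid_bytes(512) // dense_grid_bytes(256))   # 8: doubling resolution costs 8x memory
print(sorted(voxelize([(0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.9, 0.9, 0.9)], n=4)))
# [(0, 0, 0), (3, 3, 3)]: the two nearby points merge into one voxel
```

Note that the occupancy set also carries no connectivity information, just like a raw point cloud.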

12 min · Xuanqiang 'Angelo' Huang

Convolutional Neural Network

Introduction to Convolutional NN. Design goals: we want to be invariant to some transformations while, at the same time, remaining selective to specific patterns. Convolutional Neural Networks (CNNs) are a class of deep neural networks that are particularly effective for image processing tasks. They are designed to automatically and adaptively learn spatial hierarchies of features from images. Compared to standard fully connected neural networks, they reuse weights, so they need far fewer parameters. ...
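To make the weight-sharing argument concrete, here is a Python sketch comparing parameter counts under assumed sizes (a 32×32 RGB input mapped to 16 channels at the same resolution, 3×3 kernel), plus a toy 1D convolution. Function names and sizes are hypothetical, chosen only for illustration.

```python
from typing import List


def dense_params(h: int, w: int, c_in: int, c_out: int) -> int:
    """Fully connected layer from an h*w*c_in image to an h*w*c_out one (weights + biases)."""
    n_in, n_out = h * w * c_in, h * w * c_out
    return n_in * n_out + n_out


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """k x k convolution: the same kernel is reused at every spatial position."""
    return k * k * c_in * c_out + c_out


def conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1D cross-correlation (what deep learning libraries call convolution)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


print(dense_params(32, 32, 3, 16))  # 50348032
print(conv_params(3, 3, 16))        # 448
print(conv1d([0, 0, 1, 2, 0, 0], [1, -1]))  # [0, -1, -1, 2, 0]
print(conv1d([0, 0, 0, 1, 2, 0], [1, -1]))  # [0, 0, -1, -1, 2]: shifted input, shifted output
```

The last two lines show that convolution is translation *equivariant*: shifting the input shifts the output. Invariance is typically obtained by stacking pooling on top of such layers.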

13 min · Xuanqiang 'Angelo' Huang

Neural Networks

Introduction: a neuron. I am lazy, so I'm skipping the introduction for this set of notes; see Andrew Ng's Coursera course for this part (here are the notes). The historical paper is (Rosenblatt 1958). One can view a perceptron as one of the Log Linear Models in which the softmax temperature goes to 0 (so that the softmax becomes an argmax), trained with stochastic gradient descent with a batch size of 1 (this is called the perceptron update rule, see The Perceptron Model). ...
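A minimal sketch of the perceptron update rule in plain Python, assuming binary labels in {-1, +1} and a bias folded into the weight vector. With zero temperature the prediction is just the higher-scoring class (the sign of the score); examples are visited one at a time (batch size 1) and the weights change only on mistakes. Function names are illustrative, not taken from the original notes.

```python
from typing import List, Sequence


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Zero-temperature softmax = argmax: for two classes, the sign of w.x (ties -> -1)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else -1


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int], epochs: int = 100) -> List[float]:
    """Perceptron update rule: SGD with batch size 1, updating only on mistakes."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias weight
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            x_aug = list(xi) + [1.0]  # constant input 1 for the bias
            if predict(w, x_aug) != yi:
                # w <- w + y_i * x_i moves the boundary toward classifying x_i correctly
                w = [wj + yi * xj for wj, xj in zip(w, x_aug)]
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w


X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [-1, -1, -1, 1]  # logical AND, which is linearly separable
w = train_perceptron(X, y)
print(w)                                          # [2.0, 1.0, -2.0]
print([predict(w, list(x) + [1.0]) for x in X])   # [-1, -1, -1, 1]
```

On linearly separable data like this, the loop stops after a finite number of updates; on non-separable data it would just run for `epochs` passes.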

8 min · Xuanqiang 'Angelo' Huang

Normalizing Flows

Normalizing flows have a latent space and also produce tractable, explicit probability distributions (Autoregressive Modelling is close in this respect: it also has tractable distributions, but no latent space). This means we can compute the likelihood of a given sample. This approach to modelling a flexible distribution is called a normalizing flow because the transformation of a probability distribution through a sequence of mappings is somewhat analogous to the flow of a fluid. From (Bishop & Bishop 2024) ...
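The tractable likelihood comes from the change-of-variables formula, $\log p_x(x) = \log p_z(f^{-1}(x)) + \log \left| \frac{d f^{-1}}{dx} \right|$, accumulated over the layers. Below is a one-dimensional toy flow in plain Python, assuming a standard normal base distribution and two hypothetical layer types (an affine map and an elementwise $\sinh$). Practical flows use richer invertible layers (e.g. coupling layers), which are not shown here.

```python
import math


def std_normal_logpdf(z: float) -> float:
    """Log-density of the base distribution p_z = N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class Affine:
    """Invertible layer x = scale * z + shift (scale must be non-zero)."""

    def __init__(self, scale: float, shift: float) -> None:
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.scale, self.shift = scale, shift

    def forward(self, z: float) -> float:
        return self.scale * z + self.shift

    def inverse(self, x: float) -> float:
        return (x - self.shift) / self.scale

    def log_abs_det_inv(self, x: float) -> float:
        return -math.log(abs(self.scale))  # derivative of the inverse is 1 / scale


class Sinh:
    """Non-linear invertible layer x = sinh(z)."""

    def forward(self, z: float) -> float:
        return math.sinh(z)

    def inverse(self, x: float) -> float:
        return math.asinh(x)

    def log_abs_det_inv(self, x: float) -> float:
        return -0.5 * math.log1p(x * x)  # d/dx asinh(x) = 1 / sqrt(1 + x^2)


def sample(layers, z: float) -> float:
    """Push a base sample z through the flow f = f_K o ... o f_1."""
    for layer in layers:
        z = layer.forward(z)
    return z


def log_prob(layers, x: float) -> float:
    """Exact log-likelihood: invert the layers one by one, summing log|d f_k^{-1}/dx|."""
    log_det = 0.0
    for layer in reversed(layers):  # undo the last layer first
        log_det += layer.log_abs_det_inv(x)
        x = layer.inverse(x)
    return std_normal_logpdf(x) + log_det


flow = [Sinh(), Affine(0.5, 0.0)]
x = sample(flow, 0.3)
print(log_prob(flow, x))  # likelihood of a sample, computed exactly
```

With only affine layers the flow collapses to a Gaussian, which gives an easy sanity check; the $\sinh$ layer is what makes the resulting density non-Gaussian.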

10 min · Xuanqiang 'Angelo' Huang

Parametric Human Body Models

A historical perspective. The origins of motion capture: one of the earliest examples of motion capture is the famous 1878 "horse in motion" photo sequence, a precursor of modern motion-picture cameras. One of the earliest captures of human body motion was done in a military context in 1883, to study movement efficiency. This website has many historical resources on the topic. The problem is still open today. If we want to create models that mimic humans, it helps to understand how humans move and think. This is the general line of thought of this research area. ...

12 min · Xuanqiang 'Angelo' Huang

Generative Adversarial Networks

Generative Adversarial Networks were introduced in 2014 by Ian Goodfellow (at that time the generated images were still black and white). Image quality has since been improved further by Diffusion Models, which can be considered the new paradigm. Yann LeCun has considered this idea one of the most important ones. Nowadays (2025), GANs are still used for super-resolution and other applications, but they still have some limitations (mainly training stability) and face strong competition from other models. GANs produce much higher-resolution samples than VAEs (see Autoencoders#Variational Autoencoders). The adversarial loss is also an easy plug-in to improve the results of other models (VAEs, flows, diffusion). ChatGPT, for example, also uses some form of adversarial learning, although not in the manner explained here. ...

14 min · Xuanqiang 'Angelo' Huang

Autoregressive Modelling

On Autoregressivity. The main idea of autoregressivity is to use previous predictions to predict the next state. The Autoregressive property. Autoregressive models model the joint distribution of random variables through the chain-rule decomposition: $p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathbf{x}_{1:i-1})$. If we assume independence between the variables, we don't need many parameters to model it ($2T$), but this assumption is too strong. If we just use a tabular approach, we get a combinatorial explosion: about $2^T - 1$ possible states (assuming the random variables are binary and we create a table for each intermediate conditional). ...
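A Python sketch of the counting argument and of the chain-rule factorization, assuming $T$ binary variables and counting *free* parameters (so the independent model needs $T$, or $2T$ table entries if both $p(x_i = 0)$ and $p(x_i = 1)$ are stored). The logistic conditional `toy_cond` is a hypothetical example of replacing lookup tables with a parametric model; it is not from the original notes.

```python
import itertools
import math
from typing import Callable, Sequence


def independent_params(T: int) -> int:
    """Fully factorized binary model: one Bernoulli parameter per variable."""
    return T


def tabular_chain_rule_params(T: int) -> int:
    """One table per conditional p(x_i | x_{1:i-1}): 2^(i-1) free entries for binary x."""
    return sum(2 ** (i - 1) for i in range(1, T + 1))  # equals 2^T - 1


def joint_prob(x: Sequence[int], cond: Callable[[int, Sequence[int]], float]) -> float:
    """p(x) = prod_i p(x_i | x_{1:i-1}), where cond(i, prefix) = p(x_i = 1 | prefix)."""
    p = 1.0
    for i, xi in enumerate(x):
        p1 = cond(i, x[:i])
        p *= p1 if xi == 1 else 1.0 - p1
    return p


def toy_cond(i: int, prefix: Sequence[int]) -> float:
    # Hypothetical parametric conditional: logistic in the number of previous ones.
    return 1.0 / (1.0 + math.exp(-(0.5 * sum(prefix) - 0.3 * i)))


print(independent_params(20), tabular_chain_rule_params(20))  # 20 1048575
total = sum(joint_prob(x, toy_cond) for x in itertools.product([0, 1], repeat=4))
print(round(total, 10))  # 1.0: the chain rule always yields a normalized joint
```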

2 min · Xuanqiang 'Angelo' Huang

Recurrent Neural Networks

Recurrent Neural Networks allow us to model arbitrarily long sequence dependencies, at least in theory (which is also why they seem a natural choice for time series). This is very handy and has many interesting theoretical implications. Here, however, we are also interested in practical applicability, so we analyze the common architectures used to implement these models, their main limitations and drawbacks, their nice properties, and some applications. ...
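A minimal plain-Python sketch of a vanilla (Elman-style) RNN, $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b)$, assuming a zero initial state. The weights below are arbitrary illustration values. Only the first input is non-zero, so everything the network remembers about it must travel through the hidden state.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(xs: Sequence[Sequence[float]], W_hh: Matrix, W_xh: Matrix,
                b: Sequence[float]) -> List[Vector]:
    """Return the hidden states h_1..h_T, starting from h_0 = 0."""
    h: Vector = [0.0] * len(b)
    states: List[Vector] = []
    for x in xs:  # the same weights are reused at every time step
        pre = [a + c + bi for a, c, bi in zip(matvec(W_hh, h), matvec(W_xh, x), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


W_hh = [[0.5, -0.2], [0.1, 0.4]]
W_xh = [[1.0], [-0.5]]
bias = [0.0, 0.0]
states = rnn_forward([[1.0]] + [[0.0]] * 9, W_hh, W_xh, bias)
norms = [math.sqrt(sum(v * v for v in h)) for h in states]
print([round(n, 4) for n in norms])  # the trace of the first input shrinks at every step
```

With these weights the recurrent matrix is contractive, so the information about the first input fades geometrically. This is one intuition for why long-range dependencies are hard for plain RNNs in practice, despite the "arbitrarily long" dependencies they allow in theory.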

6 min · Xuanqiang 'Angelo' Huang

Transformers

Transformers, introduced for machine translation in NLP in (Vaswani et al. 2017), are one of the cornerstones of modern deep learning, so it is important to understand how they work. Introduction to Transformers. Transformers are so called because they transform the input data space into another space with the same dimensionality. The goal of the transformation is that the new space will have a richer internal representation that is better suited to solving downstream tasks. (Bishop & Bishop 2024) ...
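The "same dimensionality" property can be checked on the core building block, scaled dot-product self-attention. This plain-Python sketch assumes a single head with $D \times D$ projection matrices and omits the residual connections, layer normalization and MLP of a full transformer layer; an $N \times D$ input maps to an $N \times D$ output.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def self_attention(X: Matrix, W_q: Matrix, W_k: Matrix, W_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head attention: Y = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(W_k[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # N x N similarities between tokens
    A = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(A, V), A   # N x D output, N x N attention weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 tokens, D = 2
Id = [[1.0, 0.0], [0.0, 1.0]]            # identity projections, for illustration only
Y, A = self_attention(X, Id, Id, Id)
print(len(Y), len(Y[0]))  # 3 2: same shape as the input
```

Each output row is a weighted mix of all value vectors, so every token's new representation depends on the whole sequence while the shape stays unchanged.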

9 min · Xuanqiang 'Angelo' Huang

Egocentric Vision

Egocentric vision is a sub-field of computer vision that studies visual understanding from a first-person point of view, the one typical of animals. Historically, at MIT in 1997, researchers had to carry around very heavy cameras; now we have camera glasses. Other examples of egocentric vision are cars with cameras that see their surroundings, or robots equipped with cameras mimicking human vision. What distinguishes egocentric vision from standard vision techniques is the high variability and instability of the video, and the presence of movement and interactions inside the image. Standard computer vision, by contrast, is disembodied and has a controlled field of view. ...

7 min · Xuanqiang 'Angelo' Huang