Autoregressive Modelling

On Autoregressivity The main idea of autoregressivity is to use previous predictions to predict the next state. The Autoregressive property 🟩 Autoregressive models model a joint distribution of random variables by assuming a chain-rule-like decomposition: $$ p(x) = \prod_{i=1}^{n} p(x_i \mid x_{1:i-1}) $$ If we assume independence between the variables, we only need about $2T$ parameters to model the joint, but this assumption is too strong. If we instead use a tabular approach, we get a combinatorial explosion: about $2^{T - 1}$ possible states (assuming the random variables are binary and we build a table for each intermediate conditional). ...
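As a toy illustration of the decomposition above, here is a minimal sketch (the tables and names are hypothetical, not from the article) that evaluates an exact log-likelihood with one conditional table per binary variable; the table for position $i$ has $2^{i}$ entries, which is exactly the combinatorial explosion of the tabular approach:

```python
import numpy as np

T = 4
rng = np.random.default_rng(0)

# One conditional table per position i: entry [prefix] stores
# p(x_i = 1 | x_{1:i-1} = prefix). Sizes grow as 1, 2, 4, ..., 2^(T-1).
tables = [rng.uniform(size=2**i) for i in range(T)]

def log_prob(x):
    """Exact log p(x) via the chain-rule factorization."""
    lp = 0.0
    prefix = 0  # integer encoding of the prefix x_{1:i-1}
    for i, xi in enumerate(x):
        p1 = tables[i][prefix]
        lp += np.log(p1 if xi == 1 else 1.0 - p1)
        prefix = prefix * 2 + xi
    return lp

print(log_prob([1, 0, 1, 1]))
```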

2 min · Xuanqiang 'Angelo' Huang

Backpropagation

Backpropagation is perhaps the most important algorithm of the 21st century. It is used everywhere in machine learning and is also connected to computing marginal distributions. This is why all machine learning scientists and data scientists should understand this algorithm very well. An important observation is that the algorithm is linear: the backward pass has the same time complexity as the forward pass. Derivatives are unexpectedly cheap to compute, which took a long time to discover. See colah’s blog. Karpathy has a nice resource on this topic too! ...
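To make the linearity claim concrete, here is a minimal reverse-mode autodiff sketch in the spirit of Karpathy's micrograd (hypothetical code, not his actual API): every operation records a local backward rule, and the backward sweep visits each node once, so computing all gradients costs about the same as the forward pass:

```python
class Value:
    """A scalar that remembers how it was computed."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents, self._backward = parents, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        order, seen = [], set()
        def visit(v):  # topological sort of the computation graph
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):  # one local rule per node: linear time
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x            # z = xy + x
z.backward()
print(x.grad, y.grad)    # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```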

7 min · Xuanqiang 'Angelo' Huang

Generative Adversarial Networks

Generative Adversarial Networks were introduced in 2014 by Ian Goodfellow (at that time the generated images were still grayscale). Image quality has since improved enormously with Diffusion Models. Yann LeCun has described this as one of the most important ideas. Nowadays (2025) GANs are still used for super-resolution and other applications, but they retain some limitations (mainly training stability) and now face strong competition from other models. The resolution achieved by GANs is much higher than that of VAEs (see Autoencoders#Variational Autoencoders). The adversarial objective is an easy plugin to improve the results of other models (VAE, flow, Diffusion). ChatGPT, for example, also uses some sort of adversarial learning, though not in the same manner explained here. ...

8 min · Xuanqiang 'Angelo' Huang

Recurrent Neural Networks

Recurrent Neural Networks allow us to model arbitrarily long sequence dependencies, at least in theory. This is very handy and has many interesting theoretical implications. But here we are also interested in practical applicability, so we need to analyze the common architectures used to implement these models, their main limitations and drawbacks, their nice properties, and some applications. These networks can be seen as chaotic systems (non-linear dynamical systems); see Introduction to Chaos Theory. ...
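As a hint of the dynamical-systems view, here is a minimal sketch (all names and sizes are hypothetical) of the vanilla RNN recurrence, where the same weights are applied at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_input = 8, 3
W = rng.normal(scale=1.0 / np.sqrt(d_hidden), size=(d_hidden, d_hidden))
U = rng.normal(scale=1.0 / np.sqrt(d_input), size=(d_hidden, d_input))
b = np.zeros(d_hidden)

def run(xs):
    """Unroll h_t = tanh(W h_{t-1} + U x_t + b) over a sequence."""
    h = np.zeros(d_hidden)  # initial hidden state
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)  # a non-linear dynamical system
    return h

print(run([rng.normal(size=d_input) for _ in range(20)]))
```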

4 min · Xuanqiang 'Angelo' Huang

Normalizing Flows

Normalizing flows have both a latent space and tractable, explicit probability distributions (compare Autoregressive Modelling, which has tractable distributions but no latent space). This means we are able to compute the likelihood of a given sample. “This approach to modelling a flexible distribution is called a normalizing flow because the transformation of a probability distribution through a sequence of mappings is somewhat analogous to the flow of a fluid.” From (Bishop & Bishop 2024) ...
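To see why the likelihood is tractable, here is a minimal sketch (parameters are hypothetical) of the change-of-variables formula behind flows: push $x$ through an invertible map $z = f(x)$ and correct the base log-density by the log-determinant of the Jacobian. For an elementwise affine map $z_i = a_i x_i + c_i$ the correction is simply $\sum_i \log|a_i|$:

```python
import numpy as np

a = np.array([2.0, 0.5])   # scales; must be non-zero so f is invertible
c = np.array([1.0, -1.0])  # shifts

def log_prob(x):
    z = a * x + c  # forward map f(x)
    # log N(z; 0, I), the standard normal base density
    base = -0.5 * np.sum(z**2) - len(z) / 2 * np.log(2 * np.pi)
    return base + np.sum(np.log(np.abs(a)))  # + log|det df/dx|

print(log_prob(np.array([0.3, -0.7])))  # exact log-likelihood of a sample
```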

5 min · Xuanqiang 'Angelo' Huang

Convolutional Neural Network

Introduction to Convolutional NN Design Goals We want to be invariant to some transformations but at the same time be sensitive to specific patterns. The convolution operator 🟩 $$ \sum_{i} \sum_{j} h(x - i, y - j) f(i, j) $$ The convolution product is mathematically quite convoluted, even though in practice it is a very simple thing. In practice, I want to compute the value of a pixel as a function of certain of its neighbours, multiplied by a filter, which in practice is a matrix of weights defining a linear pattern that I would be interested in searching for in the image. ...
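Here is a direct (naive) implementation of the sum above, purely as an illustration with hypothetical names, with no padding or striding: the output pixel at $(x, y)$ is a weighted combination of the neighbours of $f$ under the filter $h$:

```python
import numpy as np

def conv2d(f, h):
    """'Valid' 2D convolution of image f with filter h, straight from
    the definition (note the index flip that distinguishes convolution
    from cross-correlation)."""
    kH, kW = h.shape
    out = np.zeros((f.shape[0] - kH + 1, f.shape[1] - kW + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for i in range(kH):
                for j in range(kW):
                    out[x, y] += h[kH - 1 - i, kW - 1 - j] * f[x + i, y + j]
    return out

img = np.arange(25.0).reshape(5, 5)
edge = np.array([[1.0, -1.0]])  # simple horizontal-difference filter
print(conv2d(img, edge))
```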

8 min · Xuanqiang 'Angelo' Huang

The Perceptron Model

The perceptron is a fundamental binary linear classifier introduced by (Rosenblatt 1958). It maps an input vector $\mathbf{x} \in \mathbb{R}^n$ to an output $y \in \{0,1\}$ using a weighted sum followed by a threshold function. The Mathematical Definition Given an input vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and a weight vector $\mathbf{w} = (w_1, w_2, \dots, w_n)$, the perceptron computes: $$ z = \mathbf{w}^\top \mathbf{x} + b = \sum_{i=1}^{n} w_i x_i + b $$ where $b$ is the bias term. The output is determined by the Heaviside step function: ...
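A minimal sketch of the model above, plus the classic Rosenblatt update rule (the training loop is a standard addition, not spelled out in the excerpt), learning the AND function as a toy example:

```python
import numpy as np

def predict(w, b, x):
    return 1 if w @ x + b > 0 else 0  # Heaviside step on z = w^T x + b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])  # AND truth table

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):  # a few passes over the data suffice here
    for xi, yi in zip(X, y):
        err = yi - predict(w, b, xi)
        w += lr * err * xi  # Rosenblatt update: w <- w + lr * err * x
        b += lr * err

print([predict(w, b, xi) for xi in X])  # [0, 0, 0, 1]
```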

3 min · Xuanqiang 'Angelo' Huang