Monte Carlo Methods

We discuss the Law of Large Numbers and the Central Limit Theorem in Central Limit Theorem and Law of Large Numbers. Usually these methods are useful when you need to compute something similar to Bayes' rule but don't know how to calculate the denominator, which is often an infeasible integral. We estimate this value without explicitly calculating it. We are interested in $\mathbb{P}(x) = \frac{1}{Z} \mathbb{P}^{*}(x) = \frac{1}{Z} e^{-E(x)}$ and can evaluate $E(x)$ at any $x$. Problem 1: draw samples $x^{(r)} \sim \mathbb{P}$. Problem 2: estimate expectations $\Phi = \sum_{x}\phi(x)\mathbb{P}(x)$. What we're not trying to do: we're not trying to find the most probable state, and we're not trying to visit all typical states. Law of large numbers: $$ S_{n} = \sum^n_{i=1} x_{i}, \quad \bar{x}_{n} = \frac{S_{n}}{n} $$$$ \bar{x}_{n} \to \mu $$ That is, the sample mean converges to the expected value of the random variables. ...
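As a minimal illustration of the expectation-estimation problem above (not from the original post), the sketch below averages a hypothetical test function $\phi(x) = x^2$ over samples from a standard Gaussian; the target density, test function, and sample size are assumptions chosen so the true answer is known.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # test function whose expectation Phi = E[phi(X)] we want
    return x ** 2

# Draw samples x^(r) from P (here a standard Gaussian, so the true
# answer E[X^2] = 1 is known) and average: the law of large numbers
# says the sample mean converges to the expectation as n grows.
samples = rng.standard_normal(100_000)
estimate = phi(samples).mean()
print(estimate)  # close to 1.0
```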

7 min · Xuanqiang 'Angelo' Huang

Analysis of Neural Codes

Methods for recording information. We are asking how we can record brain activity and then try to decode the information it contains. First we discuss some non-invasive techniques that let us observe some of the activity present in the brain. Macroscopic methods. Functional Magnetic Resonance Imaging: one method is fMRI (TODO: understand how it works). Electro-Encephalo-Gram: EEG, which records directly from the electrical signals. But the drawback of both is that they do not record the activity of single neurons. ...

2 min · Xuanqiang 'Angelo' Huang

Firing-rate based Network models

The potassium exchange values. We use the measurement by Cole and Curtis: 40 mS/cm² was their measured conductance for potassium ions leaving the membrane. $$ \Delta Q = I\,dt = GA\,\Delta E\,dt $$ where $G$ is the conductance per unit area, $A$ the membrane surface area, and $\Delta E$ the voltage deflection. The potassium concentration is 0.155 moles per litre. Remember that the conductance is the reciprocal of the resistance, and $V = IR \implies I = \frac{V}{R} = GV$ ...
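A back-of-the-envelope sketch of the $\Delta Q = GA\,\Delta E\,dt$ relation (not from the original post): the conductance is the 40 mS/cm² value quoted above, while the membrane area, voltage deflection, and time step are hypothetical illustration values.

```python
# Back-of-the-envelope check of dQ = G * A * dE * dt.
G = 40e-3    # conductance per unit area, S/cm^2 (the 40 mS/cm^2 measurement)
A = 1e-6     # membrane surface area, cm^2 (hypothetical)
dE = 0.1     # voltage deflection, V (hypothetical)
dt = 1e-3    # time interval, s (hypothetical)

dQ = G * A * dE * dt          # charge crossing the membrane, coulombs
print(f"dQ = {dQ:.1e} C")     # 4.0e-12 C with the values above
```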

5 min · Xuanqiang 'Angelo' Huang

Introduction to Neural Systems

What is a neural system? A neural system is an intricately organized network of specialized cells—primarily neurons, along with a variety of supportive glial cells—that processes and transmits information via electrical and chemical signals. In biological organisms, such systems underpin the entire nervous system, coordinating functions that range from basic reflexes to the complex interplay of perception, thought, and behavior. Early studies in neurobiology revealed that even simple neural circuits can generate coordinated responses, while modern neuroscience has shown that vast, hierarchically structured networks (such as the central and peripheral nervous systems) are responsible for the rich tapestry of animal behavior and cognition ...

2 min · Xuanqiang 'Angelo' Huang

The Neuron

Some history: Reticular Theory vs Neuron Doctrine The late 19th century witnessed a debate in neuroscience between Camillo Golgi and Santiago Ramón y Cajal, two pioneers whose opposing views shaped our understanding of the nervous system. This debate centered on the structural and functional organization of neurons, culminating in their joint reception of the 1906 Nobel Prize in Physiology or Medicine. Golgi’s Reticular Theory Golgi proposed the Reticular Theory based on his staining techniques (see #Staining methods), which held that: ...

9 min · Xuanqiang 'Angelo' Huang

Apache Spark

This is a new framework that is faster than MapReduce (See Massive Parallel Processing). It is written in Scala and has a more functional approach to programming. Spark extends the previous MapReduce framework to a generic distributed dataflow, properly modeled as a DAG. There are other benefits of using Spark instead of the MapReduce framework: Spark processes data in memory, avoiding the disk I/O overhead of MapReduce, making it significantly faster. Spark uses a DAG to optimize the entire workflow, reducing data shuffling and stage count. But MapReduce sometimes has its advantages: ...
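As a rough sketch of the dataflow idea (assuming a local PySpark installation; the word-count data and operations are illustrative, not taken from the post), the chained transformations below build a DAG that Spark only executes once an action is called:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["to be or not to be", "that is the question"])

# Each transformation only extends the DAG; nothing executes until the
# collect() action triggers the whole in-memory pipeline.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

print(counts.collect())
spark.stop()
```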

9 min · Xuanqiang 'Angelo' Huang

Bayesian Information Criterion

Bayesian Information Criterion (BIC) The Bayesian Information Criterion (BIC) is a model selection criterion that helps compare different statistical models while penalizing model complexity. It is rooted in Bayesian probability theory but is commonly used even in frequentist settings. Mathematically Precise Definition For a statistical model $M$ with $k$ parameters fitted to a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_n\}$, the BIC is defined as: $$ \text{BIC} = -2 \cdot \ln \hat{L} + k \cdot \ln(n) $$ where: ...
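A small sketch of how the formula is applied in practice; the log-likelihoods, parameter counts, and sample size below are hypothetical illustration values.

```python
import numpy as np

def bic(log_likelihood: float, k: int, n: int) -> float:
    """BIC = -2 * ln(L_hat) + k * ln(n); lower values are preferred."""
    return -2.0 * log_likelihood + k * np.log(n)

# Hypothetical comparison of two models fitted to the same n = 500 points.
print(bic(log_likelihood=-1210.4, k=3, n=500))  # simpler model
print(bic(log_likelihood=-1205.9, k=6, n=500))  # better fit, larger penalty
```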

3 min · Xuanqiang 'Angelo' Huang

Bayesian Optimization

While Active Learning looks for the most informative points to recover a true underlying function, Bayesian Optimization is only interested in finding the maximum of that function. In Bayesian Optimization, we ask for the best way to sequentially choose a set of points $x_{1}, \dots, x_{n}$ in order to find $\max_{x \in \mathcal{X}} f(x)$ for some unknown function $f$. This is what the whole thing is about. Definitions First we will introduce some useful definitions in this context. These were also somewhat introduced in N-Bandit Problem, which is one of the classical optimization problems we can find in the literature. ...
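A toy sketch of the sequential setting (not the post's algorithm): a Gaussian-process surrogate from scikit-learn plus an upper-confidence-bound rule picks the next query point; the objective `f`, the candidate grid, and the exploration weight are all assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):
    # hypothetical unknown objective we can only observe point-wise
    return -(x - 0.3) ** 2 + 0.1 * np.sin(15 * x)

rng = np.random.default_rng(0)
candidates = np.linspace(0, 1, 200).reshape(-1, 1)
X = [[rng.uniform(0, 1)]]        # one random initial query
y = [f(X[0][0])]

for _ in range(10):
    gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 2.0 * sigma       # upper confidence bound acquisition
    x_next = float(candidates[np.argmax(ucb)][0])
    X.append([x_next])
    y.append(f(x_next))

print(max(y))                    # best observed value so far
```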

8 min · Xuanqiang 'Angelo' Huang

Beta and Dirichlet Distributions

The beta distribution The beta distribution is a powerful tool for modeling probabilities and proportions between 0 and 1. Here’s a structured intuition to grasp its essence: Core Concept The beta distribution, defined on $[0, 1]$, is parameterized by two shape parameters: α (alpha) and β (beta). These parameters dictate the distribution’s shape, allowing it to flexibly represent beliefs about probabilities, rates, or proportions. Key Intuitions a. “Pseudo-Counts” Interpretation α acts like “successes” and β like “failures” in a hypothetical experiment. Example: If you use Beta(5, 3), it’s as if you’ve observed 5 successes and 3 failures before seeing actual data. After observing x real successes and y real failures, the posterior becomes Beta(α+x, β+y). This makes the beta distribution the conjugate prior for the binomial distribution (Bernoulli process). b. Shape Flexibility Uniform distribution: When α = β = 1, all values in [0, 1] are equally likely. Bell-shaped: When α, β > 1, the distribution peaks at mode = (α-1)/(α+β-2). Symmetric if α = β (e.g., Beta(5, 5) is centered at 0.5). U-shaped: When α, β < 1, density spikes at 0 and 1 (useful for modeling polarization, i.e. we believe the process produces values only near 0 or 1, not in the middle). Skewed: If α > β, skewed toward 1; if β > α, skewed toward 0. c. Moments Mean: $α/(α+β)$ – your “expected” probability of success. Variance: $αβ / [(α+β)²(α+β+1)]$ – decreases as α and β grow (more confidence). $$ \text{Mode} = \frac{\alpha - 1}{\alpha + \beta - 2} $$ The mathematical model $$ \text{Beta} (x \mid a, b) = \frac{1}{B(a, b)} \cdot x^{a -1 }(1 - x)^{b - 1} $$ where $B(a, b) = \Gamma(a) \Gamma(b) / \Gamma(a + b)$ and $\Gamma(t) = \int_{0}^{\infty}e^{-x}x^{t - 1} \, dx$ ...
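A tiny sketch of the pseudo-count update described above, with hypothetical prior and observation counts:

```python
# Conjugate "pseudo-count" update: Beta(a, b) prior plus observed counts.
a, b = 5, 3                     # prior: as if 5 successes and 3 failures were seen
successes, failures = 10, 2     # hypothetical new observations

a_post, b_post = a + successes, b + failures   # posterior is Beta(a+x, b+y)
mean = a_post / (a_post + b_post)
mode = (a_post - 1) / (a_post + b_post - 2)
var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
print(mean, mode, var)          # posterior mean, mode, and variance
```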

4 min · Xuanqiang 'Angelo' Huang

Counterfactual Invariance

Machine learning cannot distinguish between causal and environment features. Shortcut learning Often we observe shortcut learning: the model learns some dataset-dependent shortcuts (e.g. the machine that was used to take the X-ray) to make inference, but this is very brittle and usually does not generalize. Shortcut learning happens when the training set contains correlations between causal and non-causal features. Our object of interest should be the main focus, not the environment around it, in most cases. For example, a camel on grassland should still be recognized as a camel, not a cow. One solution could be to engineer invariant representations that are independent of the environment, i.e. a kind of encoder that creates these representations. ...

9 min · Xuanqiang 'Angelo' Huang