Firing-rate based Network models

The Potassium Exchange values We use the measurement by Cole and Curthis 40mS/cm squared was their measure of Potassium ions leaving the membrane $$ \Delta Q = Idt = GA \Delta E dt $$The potassium concentration is 0.155 moles per litre. Where $G$ is the conductance per unit area, $A$ the membrane surface, $E$ voltage deflection Remember that the conductance is the reciprocal of the resistance, and $V = IR \implies I = \frac{V}{R} = GV$ ...

5 min · Xuanqiang 'Angelo' Huang

Introduction to Neural Sytems

What is a neural system? A neural system is an intricately organized network of specialized cells—primarily neurons, along with a variety of supportive glial cells—that processes and transmits information via electrical and chemical signals. In biological organisms, such systems underpin the entire nervous system, coordinating functions that range from basic reflexes to the complex interplay of perception, thought, and behavior. Early studies in neurobiology revealed that even simple neural circuits can generate coordinated responses, while modern neuroscience has shown that vast, hierarchically structured networks (such as the central and peripheral nervous systems) are responsible for the rich tapestry of animal behavior and cognition ...

2 min · Xuanqiang 'Angelo' Huang

The Neuron

Some history: Reticular Theory vs Neuron Doctrine The late 19th century witnessed a debate in neuroscience between Camillo Golgi and Santiago Ramón y Cajal, two pioneers whose opposing views shaped our understanding of the nervous system. This debate centered on the structural and functional organization of neurons, culminating in their joint reception of the 1906 Nobel Prize in Physiology or Medicine. Golgi’s Reticular Theory Golgi proposed the Reticular Theory based on his staining techniques (see #Staining methods), which held that: ...

9 min · Xuanqiang 'Angelo' Huang

Apache Spark

This is a new framework that is faster than MapReduce (See Massive Parallel Processing). It is written in Scala and has a more functional approach to programming. Spark extends the previous MapReduce framework to a generic distributed dataflow, properly modeled as a DAG. There are other benefits of using Spark instead of the Map reduce Framework: Spark processes data in memory, avoiding the disk I/O overhead of MapReduce, making it significantly faster. Spark uses a DAG to optimize the entire workflow, reducing data shuffling and stage count. But MapReduce sometimes has its advantages: ...

9 min · Xuanqiang 'Angelo' Huang

Bayesian Information Criterion

Bayesian Information Criterion (BIC) The Bayesian Information Criterion (BIC) is a model selection criterion that helps compare different statistical models while penalizing model complexity. It is rooted in Bayesian probability theory but is commonly used even in frequentist settings. Mathematically Precise Definition For a statistical model $M$ with $k$ parameters fitted to a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_n\}$, the BIC is defined as: $$ \text{BIC} = -2 \cdot \ln \hat{L} + k \cdot \ln(n) $$where: ...

3 min · Xuanqiang 'Angelo' Huang

Beta and Dirichlet Distributions

The beta distribution The beta distribution is a powerful tool for modeling probabilities and proportions between 0 and 1. Here’s a structured intuition to grasp its essence: Core Concept The beta distribution, defined on $[0, 1]$, is parameterized by two shape parameters: α (alpha) and β (beta). These parameters dictate the distribution’s shape, allowing it to flexibly represent beliefs about probabilities, rates, or proportions. Key Intuitions a. “Pseudo-Counts” Interpretation α acts like “successes” and β like “failures” in a hypothetical experiment. Example: If you use Beta(5, 3), it’s as if you’ve observed 5 successes and 3 failures before seeing actual data. After observing x real successes and y real failures, the posterior becomes Beta(α+x, β+y). This makes beta the conjugate prior for the binomial distribution (bernoulli process). b. Shape Flexibility Uniform distribution: When α = β = 1, all values in [0, 1] are equally likely. Bell-shaped: When α, β > 1, the distribution peaks at mode = (α-1)/(α+β-2). Symmetric if α = β (e.g., Beta(5, 5) is centered at 0.5). U-shaped: When α, β < 1, density spikes at 0 and 1 (useful for modeling polarization, meaning we believe the model to only produce values at 0 or 1, not in the middle.). Skewed: If α > β, skewed toward 1; if β > α, skewed toward 0. c. Moments Mean: $α/(α+β)$ – your “expected” probability of success. Variance: $αβ / [(α+β)²(α+β+1)]$ – decreases as α and β grow (more confidence). $$ \text{Mode} = \frac{\alpha - 1}{\alpha + \beta - 2} $$The mathematical model $$ \text{Beta} (x \mid a, b) = \frac{1}{B(a, b)} \cdot x^{a -1 }(1 - x)^{b - 1} $$ Where $B(a, b) = \Gamma(a) \Gamma(b) / \Gamma( + b)$ And $\Gamma(t) = \int_{0}^{\infty}e^{-x}x^{t - 1} \, dx$ ...

4 min · Xuanqiang 'Angelo' Huang

Counterfactual Invariance

Machine learning cannot distinguish between causal and environment features. Shortcut learning Often we observe shortcut learning: the model learns some dataset dependent shortcuts (e.g. the machine that was used to take the X-ray) to make inference, but this is very brittle, and is not usually able to generalize. Shortcut learning happens when there are correlations in the test set between causal and non-causal features. Our object of interest should be the main focus, not the environment around, in most of the cases. For example, a camel in a grass land should still be recognized as a camel, not a cow. One solution could be engineering invariant representations which are independent of the environment. So having a kind of encoder that creates these representations. ...

9 min · Xuanqiang 'Angelo' Huang

Cross Validation and Model Selection

There is a big difference between the empirical score and the expected score; in the beginning, we had said something about this in Introduction to Advanced Machine Learning. We will develop more methods to better comprehend this fundamental principles. How can we estimate the expected risk of a particular estimator or algorithm? We can use the cross-validation method. This method is used to estimate the expected risk of a model, and it is a fundamental method in machine learning. ...

5 min · Xuanqiang 'Angelo' Huang

Data Cubes

Data Cubes is a data format especially useful for heavy reads. It has been popularized in business environments where the main use for data was to make reports (many reads). This also links with the OLAP (Online Analytical Processing) vs OLTP (Online Transaction Processing) concepts, where one is optimized for reads and the other for writes. The main driver behind data cubes was business intelligence. While traditional relational database systems are focused on the day-to-day business of a company and record keeping (with customers placing or- ders, inventories kept up to date, etc), business intelligence is focused on the production of high-level reports for supporting C-level executives in making informed decisions. ...

4 min · Xuanqiang 'Angelo' Huang

Data Models and Validation

A data model is an abstract view over the data that hides the way it is stored physically. The same idea from (Codd 1970) This is why we should not modify data directly, but pass though some abstraction that maintain the properties of that specific data model. Data Models Tree view We can view all JSON and XML data, as presented in Markup, as trees. This structure is usually quite evident, as it is inherent in their design. Converting from the tree structure to a memory model is known as serialization, while the reverse process is called parsing. ...

10 min · Xuanqiang 'Angelo' Huang