How are sensory inputs turned into movement? These notes look at auditory feedback in the vocal learning of songbirds.
Motivation
Birds are very good at producing and reproducing songs through complex, finely controlled movements of their vocal organ, the syrinx (sensorimotor learning). We want to understand how they learn and then extrapolate to human speech learning or other kinds of vocal learning. Birds and humans are only distantly related: their last common ancestor was an early reptile-like animal that lived roughly 300 million years ago. In about 71% of songbird species both females and males sing; in the zebra finch, song is a mating behaviour, so only males sing.
Parallelism between humans and Birds
If a baby is not exposed to speech from parents or other caregivers during a critical period, it is no longer able to learn to speak properly afterwards.

Humans and monkeys share common ancestors. Assuming life emerged only once on Earth, all living beings have a common ancestor. A phylogenetic tree (or evolutionary tree) is a branching diagram showing the evolutionary relationships among biological species or other entities, based on similarities and differences in their physical or genetic characteristics. Only humans are believed to have cumulative culture. A prerequisite of cumulative culture seems to be the ability to imitate actions, and vocal learning is an example of action imitation. Both songbirds and some mammals are capable of vocal learning, but their common ancestor (a reptile) apparently was not. A homoplasy is a character shared by a set of species but not present in their common ancestor. A good example is the eye, which has originated independently in many different lineages. In evolutionary biology, convergent evolution is the process whereby organisms that are not closely related (not monophyletic) independently evolve similar traits as a result of having to adapt to similar environments or ecological niches.
Zebra Finches
Zebra Finch’s Song
Neither zebra finches nor humans can learn to sing or speak properly if, during the critical period, they have no tutor or nobody who speaks to them. Birds sing naturally (neuroscientists working with mice have to train them to perform a specific task) and learn from their environment. Scientists have tracked how a zebra finch's song develops with age using spectrograms, and have seen that after some months it becomes quite similar to the tutor's song. Only male zebra finches learn to sing, and they do so for mating.
A zebra finch song has the following structure:
- Introductory notes
- Motif (repeated in the same way over and over again)
- Each motif consists of syllables with silent gaps in between.
Learning to Sing
During a first sensory learning phase, they store a memory (template) of tutor song, and during a subsequent sensorimotor learning phase they practice their own songs to arrive at a good copy of the template.
The main steps of song learning in zebra finches can be summarized as follows.
- Sensory acquisition: Young bird hears tutor song → encodes auditory features.
- Template formation: Neural circuits in auditory forebrain (e.g., NCM, HVC) store a representation.
- Motor practice: Bird produces its own song via motor pathway (HVC → RA → nXIIts → syrinx).
- Error detection: Auditory feedback compares self-generated song with the template.
- Adjustment: Synaptic plasticity updates motor commands to reduce mismatch.
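The five steps above can be sketched as a toy trial-and-error loop. This is only an illustration under strong assumptions: the template is reduced to one pitch value per syllable, "synaptic plasticity" is replaced by simply keeping a noisy attempt when it sounds closer to the template, and all names and numbers (`learn_song`, `noise`, the example pitches) are hypothetical rather than taken from any experiment.

```python
import random

def learn_song(template, n_trials=2000, noise=0.1, seed=0):
    """Toy trial-and-error song learning against a stored template.

    template: list of target pitch values, one per syllable (arbitrary units).
    Returns the learned motor command and the error after every trial.
    """
    rng = random.Random(seed)
    motor = [0.0] * len(template)  # naive initial motor command

    def mismatch(song):
        # Error detection: squared distance between own song and template.
        return sum((s - t) ** 2 for s, t in zip(song, template))

    errors = [mismatch(motor)]
    for _ in range(n_trials):
        # Motor practice: sing a noisy variant of the current command.
        attempt = [m + rng.gauss(0.0, noise) for m in motor]
        # Adjustment: keep the variant only if it sounds closer to the template.
        if mismatch(attempt) < errors[-1]:
            motor = attempt
        errors.append(mismatch(motor))
    return motor, errors

template = [1.0, 0.5, 2.0]  # hypothetical tutor syllable pitches
motor, errors = learn_song(template)
print(f"initial error {errors[0]:.3f}, final error {errors[-1]:.3f}")
```

Because an attempt is kept only when it reduces the mismatch, the error can never increase from one trial to the next; the variability in the attempts is what makes improvement possible at all.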
Singing Patterns
Subsong:
- Like babbling in human babies.
- The bird makes lots of random sounds and different note sequences.
- Nothing is organized yet, it has a very high variability during this stage.
Plastic Song (practice)
- The bird hears itself and compares to the tutor song (that has been memorized with sensory acquisition).
- It starts adjusting notes and sequences, making them more consistent.
- Still not perfect: some variability remains.
Crystallized song (adult)
- By adulthood, the song becomes stable and stereotyped.
- Each note and sequence is reproducible every time.
- This final song is usually a good copy of the tutor song.
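- Once crystallized, the song changes very little, although adults can still shift the pitch of individual syllables under targeted feedback (see Adult Birdsong Learning).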
Zebra Finch’s necessary conditions
If a bird has no tutor early in life, it does not learn to sing properly: its song is not stereotyped and contains unusual elements. Birds also need auditory feedback: if they cannot hear themselves (e.g. after deafening), they cannot learn properly. Something similar holds for humans; this window is called the critical period of vocal learning. When isolated birds serve as tutors, their descendants recover the stereotyped zebra finch song within 3-4 generations. So there seems to be an innate bias toward the species-typical song, but an individual bird cannot reach it without good tutoring.
- Isolated birds: they learn to sing, but their songs often contain non-stereotyped parts or odd syllables.
- Deafened birds: they produce odd syllables that are very different from the template.
Song Learning Theory
Types of errors
- Syntax errors: the order of the syllables is wrong, for example ABC instead of ACB.
- Pitch errors: a syllable is sung at the wrong pitch.
Serial Tutoring Paradigm
How can you define a measure of vocal performance error that respects the meaning of what is being produced? For example, if a baby is trying to say "water" but says "baba", how do you quantify the error? The measure should also work for birds. See (Lipkind et al. 2017).
This is where serial tutoring comes in. Birds learn in steps (they need to learn both the source-to-target and the target-to-source transitions): for example, they first learn AAA, then ABABAB. They learn each syllable independently before being able to switch between them.
Learning singing patterns

In the image, you see a species of bird (Zebra Finch) that sings in regular patterns.
Suppose that initially, it knows how to sing in the pattern ABCABC, and you want to teach it ACBACB.
What happens is that it first learns the transition AC, then CB, and once it can also perform BA, it is able to sing ACBACB.
Something very similar happens in human infants: they first produce single syllables and then combine them. Repetitions of the same syllable decrease over time, and new variations are integrated as the child develops.
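To make the ABCABC to ACBACB example concrete, the sketch below lists which syllable-to-syllable transitions a bird would have to acquire and which it would have to give up. The helper name `transitions` is hypothetical, and the motif is assumed to repeat cyclically, so the last syllable also transitions back to the first.

```python
def transitions(motif):
    """Return the set of syllable-to-syllable transitions in a repeated motif."""
    # The motif repeats, so the last syllable also transitions to the first.
    return {(motif[i], motif[(i + 1) % len(motif)]) for i in range(len(motif))}

source = "ABC"  # the bird currently sings ABCABC...
target = "ACB"  # the new tutor song is ACBACB...

to_learn = transitions(target) - transitions(source)
to_drop = transitions(source) - transitions(target)
print(sorted(to_learn))  # [('A', 'C'), ('B', 'A'), ('C', 'B')]
print(sorted(to_drop))   # [('A', 'B'), ('B', 'C'), ('C', 'A')]
```

None of the old transitions survives in the new motif, which is why the bird has to acquire AC, CB and BA one after the other even though the syllables themselves stay the same.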
Songbirds do not Bifurcate
If a syllable would have to split into two pitch variants to match the target, the bird commits to only one pitch in the song and gives up the other.
They work around this by using a song syllable for one pitch and a call for the other. When they have to choose, they pick the variant that is closer to the syllable's current pitch.
When they need to correct both pitch errors and sequence errors, they first fix the pitch and then fix the sequence.
- Calls = simple, innate, general communication
- Songs = complex, learned, reproductive communication
What can they learn
Juveniles match each syllable to the most spectrally similar sound in the target, regardless of its temporal position, resulting in unnecessary sequence errors that they later try to correct (Lipkind et al. 2017).
They can learn to correct sequence errors and pitch errors. They do not bifurcate syllables: they cannot learn both b+ and b- as song syllables, only one of them; the other one ends up in a call.
Matching syllables to targets while also respecting their order is a quadratic assignment problem, which is NP-hard. So which do birds fix first, pitch or sequence? If the target differs from the current song in both sequence and pitch, the bird will:
- First change the pitch to match.
- Then change the position of the syllable.
- Pitch assignment is greedy (they choose the version that is closer).
- Every syllable greedily goes to the closest target.
Greedy vocabulary learning problem: correcting errors one syllable at a time, without planning for later syllables, can lead to suboptimal final sequences.
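The greedy matching described above can be sketched as nearest-pitch assignment. This is not a solver for the quadratic assignment problem and not the analysis used by Lipkind et al.; it only illustrates how matching by similarity alone, ignoring position, produces sequence errors. The function name and the pitch values are hypothetical.

```python
def greedy_match(own_pitches, target_pitches):
    """Match each of the bird's syllables to the closest target syllable by pitch.

    Position in the song is ignored, and several syllables may pick the same
    target. Returns a list of target indices, one per own syllable.
    """
    return [
        min(range(len(target_pitches)), key=lambda j: abs(p - target_pitches[j]))
        for p in own_pitches
    ]

# Hypothetical pitches in Hz, chosen only for illustration.
own = [600.0, 900.0, 750.0]     # the bird's current syllables, in sung order
target = [610.0, 740.0, 910.0]  # tutor syllables, in tutor order

match = greedy_match(own, target)
print(match)  # [0, 2, 1]

# Positions where the matched target sits somewhere else in the tutor song.
sequence_errors = [i for i, j in enumerate(match) if i != j]
print(sequence_errors)  # [1, 2]
```

Each syllable ends up with a small pitch error, but the second and third syllables are now in the wrong order relative to the tutor song and have to be rearranged in a later step.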
Adult Birdsong Learning
Song can be operantly conditioned by delivering auditory white noise (WN) bursts contingent on pitch of a targeted syllable.
- Adult birds also try to compensate for errors and can learn to change their song (shown with tiny headphones that alter what the bird hears; there is a cute image of birds wearing headphones).
- The song really does change in response, usually through avoidance of the aversive feedback.
- You can make them learn by trial and error (e.g. with a short annoying sound whenever the pitch is too low). By moving the threshold step by step, you can push the pitch further and further away from its original value.
- They also learn from visual feedback, but the direction differs between deaf and hearing birds: hearing birds do not like the light being switched off, whereas deaf birds seem to like it.
Main takeaway:
- They take very little time to change the pitch.
- They take very little time to return to the original song after the noise is removed.
- The same feedback can be given in a visual format (see next section).
Visual feedback
The question now is whether birds can also learn from visual feedback. In the experiment, the light in the singing bird's sound-isolation chamber was briefly switched off whenever the pitch of a targeted syllable was below (or above) a threshold. This was enough to make the birds change the pitch of that syllable.
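Both the white-noise and the light-off experiments rely on the same contingency: an aversive stimulus is delivered whenever the pitch of the targeted syllable falls on the wrong side of a threshold. The toy simulation below sketches this rule. It rests on assumptions that are not in the notes: renditions are drawn from a Gaussian around the current mean pitch, the threshold simply tracks that mean, and the bird moves its mean a small step toward every rendition that escaped the stimulus. All names and parameter values are hypothetical.

```python
import random

def feedback(pitch, threshold, punish_below=True):
    """Return True if this rendition triggers the aversive stimulus."""
    return pitch < threshold if punish_below else pitch > threshold

def condition_pitch(baseline=700.0, n_songs=2000, spread=10.0, step=0.02, seed=1):
    """Simulate pitch drift under threshold-contingent aversive feedback.

    Returns the mean pitch (Hz) after every rendition.
    """
    rng = random.Random(seed)
    mean = baseline
    history = [mean]
    for _ in range(n_songs):
        pitch = rng.gauss(mean, spread)          # natural rendition variability
        if not feedback(pitch, threshold=mean):  # assumed: threshold tracks the mean
            mean += step * (pitch - mean)        # move toward unpunished renditions
        history.append(mean)
    return history

history = condition_pitch()
print(f"pitch moved from {history[0]:.1f} Hz to {history[-1]:.1f} Hz")
```

Setting `punish_below=False` in `feedback` gives the opposite contingency, in which renditions above the threshold trigger the stimulus.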
Brain structures in song learning
In this section we study the neural level of song learning in Zebra finches.
Abbreviations: HVC, High Vocal Center; DLM, medial portion of the dorsolateral thalamic nucleus; LMAN, lateral magnocellular nucleus of the anterior nidopallium; nXIIts, tracheosyringeal portion of the twelfth nerve; RA, robust nucleus of the arcopallium; Uva, nucleus uvaeformis; VTA/SNc, ventral tegmental area/substantia nigra pars compacta.
Song Pathways
There are two pathways:
- The anterior forebrain pathway (AFP)
- The motor pathway
Recordings rely on the techniques covered in The Neuron (glass electrodes). They show that some neurons spike at specific points in time during the song.
The motor pathway is necessary for song production and the anterior forebrain pathway (AFP) is necessary for song learning. Both of these pathways are important.
The AFP is a cortical-basal ganglia loop that consists of LMAN, basal ganglia homologue Area X, and the pallido-recipient thalamic nucleus DLM.
Note that RA, HVC, and LMAN are in the avian pallium, which is evolutionarily and developmentally related to mammalian cortex.
Bird’s Thalamus: DLM
- Where it sits: it is the thalamic relay nucleus within the bird song system, part of the anterior forebrain pathway (AFP), which is the basal ganglia–thalamocortical loop specialized for song learning.
- Inputs:
  - Gets inhibitory input from Area X (bird basal ganglia, analogous to striatum + pallidum).
- Outputs:
  - Projects excitatory output to LMAN (lateral magnocellular nucleus of the anterior nidopallium), which is like premotor cortex.
- What it does functionally:
  - Relays basal ganglia–processed signals back to cortex (LMAN).
  - Maintains and transmits variability in song output (important for trial-and-error exploration during learning).
  - Provides the thalamic link that allows reinforcement signals (dopamine-driven) to influence motor exploration.
In short, this is the zone that produces the signal to the cortex (LMAN) for learning.
Role of HVC(RA) neurons
HVC neurons can be identified antidromically by electrical stimulation in RA and Area X. Both RA-projecting HVC neurons and putative interneurons can be activated from RA but not from Area X. Stimulation in RA, triggered by spontaneous spikes, results in spike collision for RA-projecting neurons but not for interneurons.
This made it possible to discover the following:
- HVC(RA) neurons burst reliably at a single precise time in the song or call.
- This is not the case for the RA neurons they project to.
- HVC interneurons spike or burst densely throughout these vocalizations.
Song learning is then simply the process of modulating the effective connectivity from sparsely active HVC neurons to downstream motor neurons such that, at each moment, the correct motor output is produced: RA serves as a switchboard on which HVC(RA) neurons are wired up to the correct motor output.
Song production of RA neurons
Birds pack more neurons into the same brain weight (about 2.2 times as many per unit weight). Recordings of neural firing during song production show some clustering of activity: in young birds firing is quite random, whereas in adult birds it becomes more structured and stereotyped. RA neurons have the following properties:
- Their activation is sparse and very precisely timed to the specific syllable the bird is currently producing.

HVC and RA for learning
HVC(RA) neurons burst only once per song (they tell you when to play a note), while RA neurons burst multiple times in a sequence (they tell you which note to play). Without HVC, birds can no longer produce structured song: subsong is unaffected, but plastic song and adult song are disrupted.

In this view, the pattern of activity in RA, and thus the vocal output, is determined by the pattern of HVC(RA)-to-RA synapses, and vocal learning is controlled by synaptic plasticity at these synapses.
Song-aligned firing patterns in RA neurons change gradually during learning. As the song becomes less variable and more similar to the tutor, the firing patterns in RA become more reproducible, sparse, and bursty.
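The switchboard picture of HVC(RA)-to-RA synapses can be sketched with a small weight matrix. The sketch assumes, for illustration only, that each HVC(RA) unit bursts at exactly one time step (a one-hot code), that RA output is a linear readout of HVC activity, and that plasticity follows a simple delta rule; the function name, learning rate and target pattern are hypothetical.

```python
def train_switchboard(target, lr=0.5, n_epochs=20):
    """Learn HVC(RA)-to-RA weights so that RA reproduces a target motor pattern.

    target[t][k] is the desired drive of RA unit k at time step t.
    HVC unit t is assumed to burst only at time step t (sparse, one-hot code),
    so time step t only ever touches row t of the weight matrix.
    """
    n_steps, n_ra = len(target), len(target[0])
    weights = [[0.0] * n_ra for _ in range(n_steps)]  # weights[hvc_unit][ra_unit]
    for _ in range(n_epochs):
        for t in range(n_steps):
            # With one-hot HVC activity, the RA output at time t is row t.
            output = weights[t]
            for k in range(n_ra):
                # Delta rule: move the active synapse toward the target output.
                weights[t][k] += lr * (target[t][k] - output[k])
    return weights

# Hypothetical target: which of two RA units should be driven at each time step.
target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = train_switchboard(target)
print([[round(w, 3) for w in row] for row in weights])
```

Because only one HVC unit is active at a time, changing the output at one moment does not interfere with the output at any other moment, which is one way to read the appeal of a sparse code for learning.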
HVC during Lesions

Singing in the absence of HVC is highly similar to normal subsong. (A) Distributions of syllable durations for three birds of various ages (blue) and distributions for the same birds in the absence of HVC (red). (B) Average syllable duration distributions for normal subsong-producing birds (blue) and birds of different ages in the absence of HVC
HVC is fundamental for producing structured song: without it, birds produce vocalizations very similar to juvenile subsong.
Antidromic stimulation
Antidromic stimulation is a technique used for neuron identification in freely behaving animals. It makes it possible to distinguish different types of projection neurons from each other and from interneurons via collision testing.
Antidromic stimulation means you stimulate an axon at a point downstream from the cell body, and the action potential travels backwards (towards the soma) along the axon.
If your electrode is in HVC, and you stimulate RA, you can test whether the neuron you’re recording from in HVC actually projects to RA.
To prove it’s truly the same neuron (not a polysynaptic response), you use a collision test. If you trigger an orthodromic spike just before the antidromic stimulus, the two action potentials collide in the axon and cancel. The inability to evoke an antidromic spike within the refractory period (~2 ms) confirms the recorded neuron is indeed an HVC→RA projection cell.
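The logic of the collision test can be written down as a small decision rule. The sketch assumes the commonly used criterion that the antidromic spike is blocked whenever the stimulus follows a spontaneous spike by less than the antidromic latency plus the refractory period; the function names, the 3 ms latency and the trial data are hypothetical.

```python
def antidromic_spike_expected(delay_ms, latency_ms, refractory_ms=2.0):
    """Predict whether a stimulus in the target area evokes an antidromic spike.

    delay_ms: time from the spontaneous (orthodromic) spike to the stimulus.
    latency_ms: antidromic conduction latency from stimulation site to soma.
    Assumption: the antidromic spike is blocked when the delay is shorter than
    latency + refractory period (collision, or axon still refractory).
    """
    return delay_ms >= latency_ms + refractory_ms

def looks_like_projection_neuron(trials, latency_ms, refractory_ms=2.0):
    """trials: list of (delay_ms, spike_observed) pairs for one recorded cell."""
    return all(
        observed == antidromic_spike_expected(delay, latency_ms, refractory_ms)
        for delay, observed in trials
    )

# Hypothetical cell with 3 ms latency: the evoked spike vanishes at short delays.
projection_cell = [(1.0, False), (4.0, False), (8.0, True)]
# Hypothetical interneuron: it responds at every delay, so there is no collision.
interneuron = [(1.0, True), (8.0, True)]

print(looks_like_projection_neuron(projection_cell, latency_ms=3.0))  # True
print(looks_like_projection_neuron(interneuron, latency_ms=3.0))      # False
```

A cell that is only driven synaptically keeps responding at short delays, because its own axon is not the one carrying the stimulated spike.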
Circuit Experiments
Thermal Cooling of HVC
HVC is responsible for stereotypy and for producing the song. Cooling a brain area slows down the processes in that area. When HVC is cooled, the following is observed:
- The song gets longer (it is slowed down).
- When the cooling is removed, the song speeds up again. The same approach has been applied to human speech articulation and timing: cooling the speech motor area makes speech hard to understand, while cooling Broca's area makes speech slower.
LMAN injects variability and bias
LMAN generates variability in the learned song; we can view it as the explorer node in the brain. This was observed by lesioning LMAN:
- LMAN lesions prevent context-dependent changes in song variability.
- If you deafen a bird, its song degrades.
- If you lesion LMAN before deafening, the song stays the same (no plasticity in learned songs).
- Juvenile birds with LMAN lesions cannot learn new songs anymore.
- LMAN biases the output to avoid punishment: if you inactivate it, the bird loses what it has just learned.
  - Something similar has been seen with tetrodotoxin injected into LMAN.

**(A)** Schematic: Shows that when LMAN is active (gray dot), the bird's pitch deviates from baseline toward the learned direction (up or down). When LMAN is silenced (red dot), the pitch shifts back closer to baseline. **(B)** Time series: For one bird, across training days, you see that the pitch with LMAN active moves away from baseline (as learning progresses), but when you inactivate LMAN, the pitch immediately drops back toward the baseline. **(C)** Scatter plot: Across all inactivation tests, the same pattern holds: **LMAN+ pitch** is consistently shifted toward the learned direction, while **LMAN− pitch** is closer to baseline.
Area X lesions
If you lesion Area X, even adult birds can no longer learn (for example, the aversive conditioning with light described earlier). As seen in Conditioning Theory, dopamine neurons are needed to integrate feedback from the environment. The song variability that LMAN injects is still present, however.
LMAN’s variability is for plasticity
The AFP has been shown to actively inject variability into the motor pathway: LMAN lesions slightly reduce trial-by-trial variability of diverse spectral features, even in adult birds. LMAN neurons also show variable firing in their input to RA.

Dopamine in Zebra Finches
Dopamine encodes the quality of syllables in some songbirds. This is as expected from Conditioning Theory: bad quality means low VTA activation.
The VTA contains dopaminergic neurons; VTAx neurons are the ones that project to Area X.
- VTAx neurons fire like a reward prediction error: more if the outcome is better than expected, less if it is worse than expected.
- An optogenetic experiment tests this hypothesis by experimentally manipulating the firing of specific neurons. See Birdsong and Song System.
- Basically, a virus is injected that makes the neurons light-sensitive, so that light can activate or deactivate them to increase or decrease their firing.
- Exciting the area is enough to make the bird actively seek the change (e.g. an increase in pitch).
- With inhibition, birds seem to move away instead, which is consistent with the theory.
- VTAx neurons encode relative syllable quality compared to a tutor's song.
Delayed rewards are discounted.
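The two ideas in this section, reward prediction error and discounting of delayed rewards, can be written as a few lines of arithmetic. The sketch uses the textbook forms (a simple difference between outcome and expectation, and exponential discounting with a factor `gamma`); the notes do not say which functional form fits the bird data, so the functions and parameter values here are assumptions for illustration.

```python
def prediction_error(outcome, expected):
    """Positive if the outcome is better than expected, negative if worse."""
    return outcome - expected

def update_expectation(expected, outcome, learning_rate=0.1):
    """Move the expectation a small step toward the observed outcome."""
    return expected + learning_rate * prediction_error(outcome, expected)

def discounted(reward, delay, gamma=0.9):
    """Value of a reward that arrives `delay` steps later (assumes 0 < gamma < 1)."""
    return reward * gamma ** delay

# A syllable that sounds better than expected gives a positive error signal.
print(prediction_error(outcome=1.0, expected=0.4) > 0)  # True
# The same reward is worth less the later it arrives.
print(discounted(1.0, delay=5) < discounted(1.0, delay=1))  # True
```

In this reading, a rendition that sounds better than the running expectation produces a positive signal and raises the expectation for the next rendition, so the same quality later yields a smaller error.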
Valence reversal caused by deafening
Hearing birds already get sensory feedback from their own song. For deaf birds, the light-off event is welcome feedback: it shows them that their singing has an impact on the environment, and they can manipulate the environment to obtain sensory feedback. In a sense they are just playing, which is why the stimulus is no longer punishing for them and does not reduce the playfulness value. See Intrinsic Motivation and Playfulness for an explanation based on empowerment theory.
VTA pathway
VTA is a dopaminergic region that projects to Area X. If the bird hears the distorting sound, dopamine activity is lower (worse than expected); if the distortion does not happen, activity is a little higher (better than expected), which suggests an internal reward signal.

Many dopaminergic neurons in the ventral tegmental area (VTA) project to Area X. Left: Strategy for antidromic identification of VTAx dopamine neurons. Top to bottom: spectrograms, spiking activity during undistorted and distorted trials, corresponding spike raster plots and rate histograms, and z-scored difference between undistorted and distorted rate histograms (plots aligned to target onset). Horizontal bars in histograms indicate significant deviations from baseline.
The motor system is attempting to predict its own sensory feedback.
Dopamine Neurons Encode Syllable Quality

Left bottom: An example syllable (inside red box) on six different days as it is refined across development. Left top: Schematic showing computation of relative distance to adult syllable (circles, juvenile syllable renditions; black square, median location of adult (day > 90) renditions; arrows, distance to adult syllable). Relative distance was calculated as the difference between the current distance to adult syllable and the mean distance of the previous 14 renditions (see Methods). Right: ΔF/F signals averaged across all 25 syllables and days 61−100, plotted for each decile (10% of syllable renditions) of relative distance.
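The relative distance defined in the caption (current distance to the adult syllable minus the mean distance of the previous 14 renditions) is simple to compute. The sketch below follows that definition literally; how the distance to the adult syllable itself is measured is left to the paper's Methods, and the example numbers are made up.

```python
def relative_distances(distances, window=14):
    """Relative distance of each rendition to the adult syllable.

    distances[i] is the distance of rendition i to the adult syllable.
    For every rendition with a full window of predecessors, return the current
    distance minus the mean distance of the previous `window` renditions.
    Negative values mean the rendition was better than the recent average.
    """
    result = []
    for i in range(window, len(distances)):
        previous_mean = sum(distances[i - window:i]) / window
        result.append(distances[i] - previous_mean)
    return result

# Made-up distances: 14 renditions at distance 10, then a better and a worse one.
example = [10.0] * 14 + [8.0, 12.0]
print(relative_distances(example))
```

The first value is -2.0 (8 compared with a recent mean of 10); the second is positive because the rendition at distance 12 is worse than the recent mean, which now includes the good rendition.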
Optogenetic Manipulation of VTAx Neurons
We can inhibit and excite VTAx neurons using optogenetics (viral expression of light-sensitive opsins in dopamine neurons) to steer learning in either direction (aversive vs. preference-based learning).
References
[1] Lipkind et al. "Songbirds Work around Computational Complexity by Learning Song Vocabulary Independently of Sequence". Nature Communications 8(1), 1247, 2017.