Associative Conditioning
Classical Conditioning
We first start with associative conditioning, i.e. learning an association between a cue and a reward signal (and similar pairings).
Pavlov’s experiment
Pavlov was originally interested in the digestive system of dogs and discovered conditioning by chance. He noticed that when food is shown to a dog, it starts to salivate (a response to the food stimulus). If the food is repeatedly paired with a sound (a tuning fork), the dog starts to salivate even when it only hears the sound. He distinguishes three phases:
- Before conditioning
- During conditioning
- After conditioning

The key terms are conditioned stimulus and conditioned response, and their opposites (unconditioned stimulus and unconditioned response). It is important that the pairing is quite consistent: conditioning associates the unconditioned stimulus with the conditioned stimulus.

Importance of reliability
If the unconditioned stimulus is not reliably paired with the conditioned stimulus, then the conditioning is not well strengthened.

If the correlation is reliable, the association is well learned. We will see what happens if there is a delay.
Rescorla-Wagner model
The expected reward is modeled as a linear function of the stimulus: $V = wu$, where $u$ is a binary variable that indicates whether the conditioned stimulus is present, and $w$ is the associative strength (initially 0).
The prediction error is $\delta = R - V$, and the weight is updated as $w \leftarrow w + \epsilon \delta u$, similarly to the perceptron learning rule.
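A minimal sketch of this update rule (the function name and learning rate are my own choices):

```python
# Rescorla-Wagner update, a minimal sketch.
# u:   1 if the conditioned stimulus (CS) is present on this trial, else 0
# r:   reward actually received (the unconditioned stimulus, US)
# w:   associative strength, eps: learning rate
def rw_update(w, u, r, eps=0.1):
    v = w * u           # predicted reward V = w * u
    delta = r - v       # prediction error
    return w + eps * delta * u

w = 0.0
for _ in range(50):     # CS always followed by reward
    w = rw_update(w, u=1, r=1.0)
print(round(w, 3))      # → 0.995, approaching the asymptote of 1.0
```

Note that when `u = 0` (no CS) nothing is learned, which is exactly what the $\delta u$ term in the update encodes.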
Phenomena for RW
The model already explains many phenomena:
- Blocking: a second tone added later is not learned, because the first stimulus is already enough to explain away the prediction error (even though both are presented together).
- Acquisition: associative strength increases quickly at first, then levels off.
- Extinction: if the US stops following the CS, associative strength decreases.
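Blocking falls out of the shared prediction error in the vector form of the model, $V = \sum_i w_i u_i$. A small sketch (trial counts and learning rate are illustrative):

```python
# Blocking with the vector form of Rescorla-Wagner:
# V = w1*u1 + w2*u2, with a single shared prediction error delta = r - V.
eps = 0.2
w1, w2 = 0.0, 0.0

# Phase 1: stimulus 1 alone predicts the reward.
for _ in range(100):
    delta = 1.0 - w1
    w1 += eps * delta

# Phase 2: both stimuli present, same reward.
for _ in range(100):
    v = w1 + w2
    delta = 1.0 - v      # already ~0: stimulus 1 explains the reward
    w1 += eps * delta
    w2 += eps * delta

print(round(w1, 2), round(w2, 2))   # → 1.0 0.0  (stimulus 2 is blocked)
```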
Time Conditioning
Experiment with flies, an electric shock, and an odor:
- If the odor is presented after the shock, it becomes a relief signal: the flies like it and approach it.
- If the odor is presented before (or together with) the shock, they dislike it and learn to avoid it. The valence is thus reversed depending on timing (an odor that follows the shock signals relief; the opposite order produces avoidance behaviour).
Is the galvanic response conditioned or innate? The professor thinks it is innate, but this is difficult to test.
Generalization of galvanic response to electric shock
The study titled “The Irradiation of a Tactile Conditioned Reflex in Man” by Milton J. Bass and Clark L. Hull, published in 1934 in the Journal of Comparative Psychology, investigates the phenomenon of irradiation in conditioned reflexes. Irradiation refers to the spread or generalization of a conditioned response from the original stimulus to other stimuli that are similar but not identical.
- Generalization of Reflexes: The study demonstrated that a conditioned reflex, initially established to a specific tactile stimulus, could spread to other tactile stimuli that were not directly associated with the original conditioning.
- Influence of Stimulus Similarity: The extent of irradiation was found to be influenced by the similarity between the conditioned stimulus and the new, untrained stimuli. The more similar the new stimulus was to the original, the greater the likelihood of a conditioned response.
- Implications for Learning and Behavior: These findings suggest that the process of learning and the establishment of conditioned reflexes are not limited to specific, isolated stimuli but can involve a broader range of stimuli, leading to generalized responses. This research contributes to our understanding of how conditioned behaviors can generalize beyond the original learning context, highlighting the adaptability and complexity of human learning processes.
Operant Conditioning
Operant means that we want to teach behaviours (actions), not just responses to stimuli.
History of Operant Conditioning
Animals associate actions with rewards or aversive outcomes. The classical example is the one with rats in a box.
Operant conditioning was first described by Edward Thorndike (1874-1949), but was studied systematically by B.F. Skinner and others later. In the typical experimental setup, an animal (for example a rat or a pigeon) is isolated in a box and is rewarded or punished for a specific action. This box is sometimes referred to as a Skinner box, because B.F. Skinner first used this experimental setup; he is often referred to as the father of operant conditioning.
Vocabulary of Operant Conditioning
Let us fix some vocabulary, since there has been some confusion around these terms over the years.
- Reinforcement (increases a behaviour)
    - Positive reinforcement: add an appetitive stimulus (reward, e.g. food).
    - Negative reinforcement: remove an aversive stimulus (escape), or perform a behaviour that prevents the aversive stimulus from occurring at all (avoidance).
- Punishment (decreases a behaviour)
    - Positive punishment: add an aversive stimulus (e.g. shock).
    - Negative punishment: remove an appetitive stimulus.
Ambiguity of The Vocabulary
This vocabulary has often been criticized: it is frequently unclear which category a behaviour falls into, and the categories can be interchangeable. For example, with a heat lamp in a cold environment (Baron & Galizio 2005), do the animals learn to escape the aversive cold (negative reinforcement) or to obtain the appetitive heat of the lamp (positive reinforcement)?
Dopaminergic neurons
Neuromodulators are produced in specific places and act on the whole brain.
Neuromodulator’s pathways
Each neuromodulator is synthesized in specific nuclei of the brain, but its projections reach and influence the whole brain.
Synthesis pathways
- Dopamine: synthesized in three steps from the amino acid tyrosine.
- Norepinephrine: synthesized in four steps from tyrosine, i.e. directly from dopamine (closely related to dopamine, but slightly different); it acts in similar areas and for similar tasks.

Dopaminergic System
Dopaminergic system: substantia nigra and ventral tegmental area (VTA). Dopamine is synthesized in three steps from the amino acid tyrosine. It is associated with reward mechanisms in the brain and is generally involved in regulating motor activity, mood, motivation and attention.
Serotonergic System
Serotonergic system: rostral raphe nuclei (labelled on the figure), caudal raphe nuclei (not labelled, but highlighted in green in the figure). Serotonin (5-HT) is synthesized in two steps from the amino acid tryptophan. It regulates attention and other complex cognitive functions, such as sleep (dreaming), eating, mood, and pain. Too little serotonin has been linked to depression, problems with anger control, etc.
Noradrenergic System
Noradrenergic system: locus coeruleus. There are also other nuclei with noradrenergic neurons: the caudal ventrolateral part of the medulla (controls body fluid metabolism) and the solitary nucleus (control of food intake and responses to stress). Norepinephrine (noradrenaline) is synthesized directly from dopamine (i.e. in four steps from tyrosine) within vesicles. Norepinephrine is associated with bringing our nervous system into "high alert": it increases our heart rate and blood pressure, and it is also important for forming memories.
Cholinergic System
Cholinergic system: nucleus basalis, medial septal nucleus and nucleus of the diagonal band, pedunculopontine nucleus and laterodorsal tegmental nucleus. Acetylcholine is responsible for the stimulation of muscles, including the muscles of the gastro-intestinal system. It is used everywhere in the brain.
All four major neuromodulators tend to be excitatory in their effect.
General Actions of Neuromodulators
- The global presence or absence of a neuromodulator is equivalent to a specific behavioral state.
- However, this view appears to contradict studies at the cellular level, which show that multiple neuromodulators can act simultaneously on any single neuron, and that intrinsic excitability and synaptic efficacy are always under neuromodulatory influence.
Reconfiguration of neural circuits by neuromodulators is an intricately balanced process that involves multiple synergistic or antagonistic pathways.
Neuromodulators’ influence
- Modulation can facilitate or even invert neuronal responses (for example, instead of depression, a synapse can show the opposing change in synaptic strength).
- Neuromodulators affect brain plasticity (a kind of push-pull behaviour):
    - PLC promotes LTD.
    - PKA promotes LTP.
- They influence ion-channel excitability:
    - For example, the firing pattern and action-potential strength can change under the action of neuromodulators.
    - Different neuromodulators can have the same effect.
    - The effect can be linear (a higher firing rate) or non-linear (a changed firing pattern, such as bursting).

Theorists still need to explain many of these phenomena; this is an open field.

Dopamine
The Rescorla-Wagner model is a nice model for these kinds of predictions and rewards.
Dopamine systems encode timing of the Reward

Temporal sensitivity of prediction error response of dopamine neuron. From top to bottom: Reward delay by 0.5 s leads to considerable depression at the habitual time of reward and activation at the new time. Earlier reward leads to activation at new time but not to major depression at the habitual time. The habitual time of reward is at 1.0 s after touch of an operant key and simultaneous offset of conditioned stimulus (CS). Reward delivery is marked by a longer line; the slight jitter reflects the fact that the interval between the lever press (to which the traces are aligned) and the reward delivery was not controlled with absolute precision (timing varies ± 8 ms). ‘CS on’ indicates the various times of appearance of pictures, as indicated by small vertical line in each raster. In each panel, original trial sequence is from top to bottom.
Role of RW model for dopamine action
Dopamine neurons encode both the occurrence and the timing of the reward:
- A reward with no preceding stimulus (i.e. unpredicted) makes the dopamine neurons fire.
- After conditioning, the dopamine neurons instead start to fire right after the conditioned stimulus.
- If the predicted reward is omitted, dopamine activity dips below baseline (less reward than expected), which is nicely explained by the Rescorla-Wagner model.
- With learning, the neurons fire less and less at the reward itself: once the reward is fully predicted, there is no response at delivery anymore. Notably, the real system also predicts the interval at which the reward will arrive, an important temporal aspect that the basic model lacks.

The dopamine neuron is activated by an unpredicted occurrence of reward (juice in this experiment).
We know the system encodes timing, because we observe activations when the reward is given later or earlier than expected (or omitted).
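The temporal shift of the response is not captured by RW itself, but it is by its temporal-difference (TD) extension. A toy sketch, assuming a tapped-delay-line stimulus representation, $\gamma = 1$, and illustrative timings:

```python
import numpy as np

T, cs, rt = 12, 3, 9        # trial length, CS onset, reward time (illustrative)
r = np.zeros(T); r[rt] = 1.0
w = np.zeros(T - cs)        # one learned value per time step since CS onset

def value(t, w):
    # No prediction is possible before the CS appears.
    return w[t - cs] if cs <= t < T else 0.0

def trial(w, alpha=0.3):
    """Run one trial; return the prediction-error (dopamine-like) trace."""
    da = np.zeros(T)
    for t in range(T):
        da[t] = r[t] + value(t, w) - value(t - 1, w)  # TD error, gamma = 1
        if t - 1 >= cs:
            w[t - 1 - cs] += alpha * da[t]            # update the preceding state
    return da

first = trial(w)
for _ in range(500):
    last = trial(w)

# Before learning the error peaks at the reward time; after learning
# it has moved to the CS onset, mirroring the recorded dopamine responses.
print(np.argmax(first), np.argmax(last))  # → 9 3
```

Omitting the reward on a trained trial would produce a negative `da` at the habitual reward time, i.e. the dip described above.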
Magnitude of the Reward
The strength of the activation depends on the magnitude of the reward.
Probability of predicted reward
More reward -> higher response. A smaller probability that the stimulus predicts the reward means a lower activation at the stimulus (consistent with Rescorla-Wagner).
Difference between rewarded and unrewarded stimuli:
the activation reflects the probability of reward given the stimulus.
If the correlation is clear, the response appears only at the stimulus; a lower correlation means a lower response at the stimulus and a higher response at the reward itself.
Predicting long rewards
This is the idea behind discounted reward: the neurons activate less if the reward is far in the future. This temporal component is not present in the RW model.
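A short illustration of discounting (the value of $\gamma$ and the reward timings are illustrative):

```python
# Discounted return: rewards far in the future contribute less to the
# current value, matching the weaker response to distant rewards.
def discounted_value(rewards, gamma=0.9):
    """V = sum_t gamma^t * r_t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Same unit reward, delivered later -> smaller value now.
soon = discounted_value([0, 0, 1])        # reward after 2 steps: 0.9^2 = 0.81
late = discounted_value([0, 0, 0, 0, 1])  # reward after 4 steps: 0.9^4 ≈ 0.66
print(soon > late)  # → True
```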

Pathways of reward-based and avoidance learning
We have two dopamine pathways:
- Direct pathway: D1 dopamine receptors respond to increases (bursts) of dopamine.
- Indirect pathway: D2 receptors respond when dopamine is low (below a threshold, i.e. dips).

It is believed that the first pathway learns to take certain actions, while the second learns to avoid others.

Relation with Parkinson’s Disease
People with Parkinson's disease cannot learn much from trial and error. L-DOPA increases dopamine and diminishes the dips in the dopamine signal. Without medication, Parkinson's patients mainly learn from aversive outcomes (avoidance); with L-DOPA they learn from positive trials but lose the ability to learn avoidance.
- Dopamine bursts support learning active reward seeking.
- Dopamine dips support learning avoidance; blunting the dips (as L-DOPA does) impairs it.
We have seen similar behaviour in experiments done on mice.
Role of the striatum
We already mentioned the striatum as part of the basal ganglia-dopamine system, briefly in Architecture of the Brain.
- Role: Gating of actions — it decides which action gets facilitated vs suppressed.
- Two major pathways:
- Direct pathway (Go): facilitates actions.
- Indirect pathway (No-Go): suppresses competing actions.
- Dopamine from the VTA biases these pathways:
- D1 receptors (direct) → strengthened by dopamine → “Do it again.”
- D2 receptors (indirect) → weakened by dopamine → “Don’t stop this.”
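A toy sketch of this opponent update, not any specific published model; the names `go`/`nogo` and the thresholding of the error at zero are illustrative assumptions:

```python
# Opponent-pathway sketch: positive dopamine errors strengthen the direct
# ("Go", D1) weight for an action; negative errors (dips) strengthen the
# indirect ("No-Go", D2) weight that suppresses it.
def update_pathways(go, nogo, delta, eps=0.1):
    if delta > 0:
        go += eps * delta          # dopamine burst: reinforce the action
    else:
        nogo += eps * (-delta)     # dopamine dip: learn to suppress it
    return go, nogo

go, nogo = 0.0, 0.0
go, nogo = update_pathways(go, nogo, delta=+1.0)   # rewarded trial
go, nogo = update_pathways(go, nogo, delta=-1.0)   # punished trial
print(round(go, 2), round(nogo, 2))  # → 0.1 0.1
```

Blunting the negative `delta` values (as L-DOPA is thought to do with dopamine dips) would leave `nogo` untrained, matching the impaired avoidance learning described above.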
Dopaminergic Pathways
The striatum is an important hub that integrates the feedback information carried by the dopaminergic neurons.

For conscious learning we have the loop Cortex -> Basal Ganglia (Striatum and Pallidum) -> Thalamus.
After repetition the main pathway becomes BG -> Thalamus, without conscious cortical processing anymore.