Wolfram Schultz (2007), Scholarpedia, 2(6):2184. doi:10.4249/scholarpedia.2184, revision #145291
Reward information is processed by specific neurons in specific brain structures. Reward neurons produce internal reward signals and use them for influencing brain activity that controls our actions, decisions and choices.
A prime goal in the investigation of neural processes of reward is to identify an explicit neuronal reward signal, just as retinal responses to visual stimuli constitute starting points for investigating the neuronal processes underlying visual perception. The search for a "retina of the reward system" has located brain signals related purely to reward value irrespective of sensory and motor attributes in midbrain dopamine neurons and in select neurons of orbitofrontal cortex, dorsal and ventral striatum, and possibly amygdala. Reward signals influence neural processes in cortical and subcortical structures underlying behavioral actions and thereby contribute to economic choices.
Pure Reward Signals in Dopamine Neurons
Midbrain dopamine neurons show phasic excitatory responses (activations) following primary food and liquid rewards, and visual, auditory and somatosensory reward-predicting stimuli. As in sensory systems, the reward-related activation can be preceded by a brief detection component before the stimulus has been identified and properly valued. The reward-related activations occur in 65-80% of dopamine neurons in cell groups A9 (pars compacta of substantia nigra), A10 (ventral tegmental area, VTA) and A8 (dorsolateral substantia nigra). The activations have latencies of < 100 ms and durations of < 200 ms. The same neurons are briefly depressed in their activity by reward omission and by stimuli predicting the absence of reward; they are not affected by known neutral stimuli unless these have substantial intensity (Figure 1). The particular characteristics of these phasic dopamine responses are compatible with the notion of a teaching signal according to reinforcement learning theories, as further described below. Dopamine neurons in groups A8-A10 project their axons to the dorsal and ventral striatum, dorsolateral and orbital prefrontal cortex and some other cortical and subcortical structures. The subsecond dopamine reward response may be responsible for the reward-induced dopamine release seen with voltammetry (Roitman et al. 2004) but would not easily explain the 300-9,000 times slower dopamine fluctuations with rewards and punishers seen in microdialysis (Datla et al. 2002, Young 2004).
Reward prediction error
The dopamine reward response appears to code the discrepancy between the reward and its prediction (‘prediction error’), such that an unpredicted reward elicits an activation (positive prediction error), a fully predicted reward elicits no response, and the omission of a predicted reward induces a depression (negative error, Figure 1).
The hypothesis that dopamine neurons report reward prediction errors can be tested formally by paradigms developed by animal learning theory, using the Rescorla-Wagner learning rule. In the blocking paradigm (Fig. 3a), a stimulus is not learned when it is paired with an already fully predicted reward, indicating the importance of prediction errors for learning. After pairing with a fully predicted reward, the blocked stimulus does not come to predict a reward. Accordingly, the absence of a reward following the blocked stimulus does not produce a response in dopamine neurons, as no prediction error is elicited, and the delivery of a reward does produce a positive prediction error response ( Figure 3a left). By contrast, after a well trained reward-predicting stimulus, reward omission produces a depressant neural response, and reward delivery does not lead to a response in the same dopamine neuron ( Figure 3a right).
In the conditioned inhibition paradigm (Fig. 3b), a test stimulus is presented simultaneously with an established reward-predicting stimulus but no reward is given after the compound, making the test stimulus a conditioned inhibitor which predicts the absence of reward. Reward omission after a conditioned inhibitor does not produce a prediction error response in dopamine neurons, even when the established reward-predicting stimulus is added (Figure 3b left). By contrast, the occurrence of reward after the inhibitor produces an enhanced prediction error response, as the prediction error represents the difference between the actual reward and the negative prediction from the inhibitor (Figure 3b left bottom). Following a neutral control stimulus, however, there is no depression when no reward occurs, there is the usual depression with reward omission when another, otherwise reward-predicting stimulus is added, and there is the usual activation with surprising reward (Figure 3b right). Taken together, the data from these paradigms suggest that dopamine neurons show bidirectional coding of reward prediction errors, following the equation
\[\text{DopamineResponse} = \text{RewardOccurred} - \text{RewardPredicted}\]
Thus the dopamine response seems to convey the crucial learning term \((\lambda-V)\) of the Rescorla-Wagner learning rule and complies with the principal characteristics of teaching signals of efficient reinforcement models (Sutton & Barto 1998).
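The role of the error term \((\lambda-V)\) in the blocking paradigm can be illustrated with a minimal simulation. This is only a sketch: the stimulus labels (A, X, B, Y), the learning rate, the trial counts and the function names `rescorla_wagner` and `probe` are assumptions chosen for the example and are not taken from the experiments cited above.

```python
def rescorla_wagner(trials, n_stimuli, learning_rate=0.2):
    """Apply the Rescorla-Wagner rule to a list of (present_stimuli, reward) trials.

    Returns the final associative strengths V and the prediction error of each trial.
    """
    V = [0.0] * n_stimuli
    errors = []
    for present, reward in trials:
        prediction = sum(V[i] for i in present)  # summed prediction of all stimuli shown
        error = reward - prediction              # the (lambda - V) term
        for i in present:
            V[i] += learning_rate * error        # only stimuli that are present are updated
        errors.append(error)
    return V, errors


# Hypothetical stimuli: A is pretrained, X is the blocked stimulus,
# B and Y form a control compound without pretraining.
A, X, B, Y = range(4)
trials = (
    [((A,), 1.0)] * 100      # pretraining: A alone comes to fully predict the reward
    + [((A, X), 1.0)] * 100  # compound AX: reward already predicted, so X is blocked
    + [((B, Y), 1.0)] * 100  # control compound BY: both stimuli share the prediction
)
V, errors = rescorla_wagner(trials, n_stimuli=4)


def probe(stimulus, reward):
    """Prediction error when a single stimulus is followed by the given reward."""
    return reward - V[stimulus]


print("V(blocked X) =", round(V[X], 3), " V(control Y) =", round(V[Y], 3))
print("error, reward after X:   ", round(probe(X, 1.0), 3))
print("error, no reward after X:", round(probe(X, 0.0), 3))
print("error, no reward after A:", round(probe(A, 0.0), 3))
```

In this sketch the blocked stimulus acquires almost no associative strength, so reward omission after it yields an error near zero and reward delivery yields a positive error, whereas omission after the trained stimulus yields a negative error. This is the pattern described for the dopamine responses in the blocking paradigm.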
The response to unpredicted primary reward varies in a monotonic positive fashion with reward magnitude ( Figure 3a). The positive and negative reward prediction error response is also graded, such that a partial prediction error induces a smaller error response. Prediction errors covary with reward probability ( Figure 3b, c) and reflect the discrepancy of the experienced and predicted reward or, more precisely, the difference between the mean of the probability distribution of received reward magnitudes and the expected value of the predicted distribution (Fiorillo et al. 2003, Satoh et al. 2003, Morris et al. 2004, Nakahara et al. 2004, Bayer & Glimcher 2005, Pan et al. 2005).
The reward prediction error response appears to normalize to the standard deviation of the prediction error provided that appropriate advance information is available. When three visual stimuli predict different binary distributions of equiprobable reward magnitudes, the larger magnitude always elicits the same positive prediction error-related activation, even with a 10-fold difference in prediction error ( Figure 1a, b), although the same neurons are sensitive to unpredicted magnitudes ( Figure 3a). As a result of this gain adaptation, the neural response discriminates between the two likely outcomes equally well, regardless of their absolute magnitude difference.
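The effect of such scaling can be sketched numerically. The reward volumes below are hypothetical values spanning a 10-fold range, and division by the standard deviation is one simple way of expressing the normalization; neither is claimed to be the exact computation performed by the neurons, and the function name `scaled_errors` is an assumption of this example.

```python
import statistics


def scaled_errors(magnitudes):
    """Return (raw_error, error_in_sd_units) for each equiprobable reward magnitude."""
    mean = statistics.mean(magnitudes)   # expected value of the predicted distribution
    sd = statistics.pstdev(magnitudes)   # population SD of the equiprobable outcomes
    return [(m - mean, (m - mean) / sd) for m in magnitudes]


# Hypothetical binary distributions (volumes in ml); each stimulus predicts
# the smaller or the larger volume with equal probability.
distributions = {
    "small": (0.0, 0.05),
    "medium": (0.0, 0.15),
    "large": (0.0, 0.5),
}
results = {name: scaled_errors(mags) for name, mags in distributions.items()}

for name, pairs in results.items():
    raw_larger, scaled_larger = pairs[1]  # entry for the larger of the two magnitudes
    print(f"{name}: raw error {raw_larger:+.3f} ml, scaled error {scaled_larger:+.2f} SD")
```

The raw positive errors differ 10-fold between the smallest and largest distribution, while the scaled errors are identical. This corresponds to the observation that the larger magnitude elicits the same activation for all three stimuli.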
The prediction error response is sensitive to both the occurrence and the time of the reward, as a delayed reward induces a depression at its original time and an activation at its new time (Figure 6).
Neuronal computations using prediction errors may contribute to the self-organization of behavior ( Figure 2). Brain mechanisms establish predictions, compare current inputs with predictions from previous experience, and emit a prediction error signal once a mismatch is detected. The error signal may act as an impulse for synaptic modifications that lead to subsequent changes in predictions and behavioral reactions. The process is reiterated until behavioral outcomes match the predictions and the prediction error becomes nil. In the absence of a prediction error, there would be no signal for modifying synapses, and synaptic transmission remains unchanged and stable.
Dopamine neurons acquire responses to reward-predicting visual and auditory conditioned stimuli (CS). The responses covary with the expected value of reward, irrespective of spatial position, sensory stimulus attributes and arm, mouth and eye movements ( Figure 6). The responses are modulated by the motivation of the animal, the time course of predictions and the animal’s choice among rewards (Satoh et al. 2003, Nakahara et al. 2004, Morris et al. 2006). Although discriminating between reward-predicting CSs and neutral stimuli, dopamine activations have a non-negligible propensity for generalization (Waelti et al. 2001).
During the course of learning, the dopamine response to the reward decreases gradually, and a response to the immediately preceding CS develops in parallel. The gradual, opposite changes in US and CS responses do not involve backpropagating waves of prediction error (Pan et al. 2005) assumed in earlier reinforcement models (Montague et al. 1996, Schultz et al. 1997) and are modelled in a biologically plausible manner as teaching signals for behavioral tasks, including Pavlovian conditioning, spatial delayed responding and sequential movements (Suri & Schultz 1999; Izhikevich 2007). These changes are compatible with Pavlovian response transfer and basic principles of temporal difference learning (TD) and suggest the presence of eligibility traces as an essential feature of reward learning.
Activations do not occur when the CS is predicted within a few seconds by another well trained stimulus. This observation conforms to basic assumptions of TD models. As it is often difficult to determine whether rewards are 'primary' or conditioned (Wise 2002), TD models do not make this distinction and assume that CSs can act as reinforcers and elicit prediction errors just as rewards do (Sutton & Barto 1998). Accordingly a dopamine CS response would reflect an error in the prediction of this CS (Suri & Schultz 1999).
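The transfer of the error signal from the reward to the CS can be sketched with a small TD model using eligibility traces. The representation of the CS-reward interval as a chain of time steps, the parameter values and the function name `run_trial` are assumptions chosen for illustration; this is not the specific model of any of the studies cited above.

```python
def run_trial(V, n_steps, reward=1.0, alpha=0.1, gamma=0.98, lam=0.9, learn=True):
    """Run one CS-reward trial and return the TD errors at CS onset and at reward time.

    V[t] is the predicted value t steps after CS onset. The CS itself arrives
    unpredictably, so the state before the CS has value 0 and is never updated.
    """
    delta_cs = gamma * V[0]              # error at CS onset: 0 + gamma * V[0] - 0
    trace = [0.0] * n_steps              # eligibility traces, reset on every trial
    delta = 0.0
    for t in range(n_steps):
        r = reward if t == n_steps - 1 else 0.0        # reward ends the interval
        v_next = V[t + 1] if t + 1 < n_steps else 0.0  # value after the trial is 0
        delta = r + gamma * v_next - V[t]              # TD prediction error
        trace = [gamma * lam * e for e in trace]       # decay all traces
        trace[t] += 1.0                                # mark the current step
        if learn:
            for s in range(n_steps):
                V[s] += alpha * delta * trace[s]
    return delta_cs, delta               # delta now holds the error at reward time


n_steps = 10
V = [0.0] * n_steps
history = [run_trial(V, n_steps) for _ in range(500)]
omission = run_trial(V, n_steps, reward=0.0, learn=False)

print("first trial (CS error, reward error):", history[0])
print("last trial  (CS error, reward error):", tuple(round(d, 3) for d in history[-1]))
print("error at the time of an omitted reward:", round(omission[1], 3))
```

In this sketch the error occurs at the reward on the first trial and at the CS after training, and omission of the predicted reward produces a negative error at the expected time of reward.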
Physically intense, unrewarded stimuli induce a short, initial activation in dopamine neurons (Fiorillo et al. 2013b), which is enhanced by stimulus novelty (Schultz & Romo 1987, Horvitz et al. 1997, Ljungberg et al. 1992), generalisation to physically similar rewarded stimuli (Mirenowicz & Schultz 1996, Waelti et al. 2001, Tobler et al. 2003), and reward context (Kobayashi & Schultz 2014). This initial component reflects the detection of the stimulus before identification of its properties and reward value (Nomoto et al. 2010, Fiorillo et al. 2013b). Its intensity is graded across the ventral midbrain without clear category boundaries (Fiorillo et al. 2013a). The initial component occurs sometimes also with aversive stimuli, such as air puffs, aversive liquids or footshocks (Mirenowicz & Schultz 1996, Brischoux et al. 2009, Matsumoto & Hikosaka 2009), but careful controls reveal relationships to physical rather than aversive stimulus properties (Fiorillo et al. 2013b). Thus, the activations of some dopamine neurons by noxious stimuli do not seem to reflect aversive but physical stimulus properties. The more common dopamine response to aversive stimuli is depression of activity. Thus, the dopamine activation consists of an early component reflecting stimulus detection, and a subsequent component coding reward prediction error.
Risk Signal in Dopamine Neurons
Rewards occur in most natural situations with some degree of uncertainty. The uncertainty of reward can be tested as risk using different well-trained probabilities for the all-or-none delivery of reward and allows researchers to separate expected reward value (linearly increasing from p=0 to p=1) from risk expressed as variance, standard deviation (SD) or entropy of the probability distribution of magnitudes (inverted U function with peak at p=0.5). More than one third of dopamine neurons show a relatively slow, sustained and moderate activation between the reward-predicting stimulus and the reward which covaries with the degree of risk ( Figure 2). This activation occurs in individual trials and does not propagate from reward back to the conditioned stimulus during learning, as assumed by some implementations of temporal difference reinforcement models (Schultz et al. 1997). The risk-related, more sustained activation ( Figure 2 right) contrasts with the more phasic response to reward-predicting stimuli covarying with expected value (left), and the two responses are uncorrelated in strength in individual neurons. When varying the variance (and SD) of the magnitudes of two equiprobable rewards while keeping entropy constant at 1 bit, the sustained activation increases monotonically with variance (or SD). Thus, variance (or SD) is an effective measure of risk for dopamine neurons.
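The dissociation between expected value and risk follows directly from the statistics of the reward distributions, as the following sketch shows. The function names and the example magnitudes are assumptions made for illustration.

```python
import math


def binary_reward_stats(p, magnitude=1.0):
    """Expected value, variance, SD and entropy (bits) of an all-or-none reward."""
    expected_value = p * magnitude
    variance = p * (1.0 - p) * magnitude ** 2
    sd = math.sqrt(variance)
    if p in (0.0, 1.0):
        entropy = 0.0                    # a certain outcome carries no uncertainty
    else:
        entropy = -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
    return expected_value, variance, sd, entropy


def equiprobable_pair_stats(low, high):
    """Variance and entropy (bits) of two reward magnitudes, each with p = 0.5."""
    variance = ((high - low) / 2.0) ** 2
    entropy = 1.0                        # two equiprobable outcomes: always 1 bit
    return variance, entropy


probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
table = [binary_reward_stats(p) for p in probabilities]
for p, (ev, var, sd, h) in zip(probabilities, table):
    print(f"p={p:.2f}  EV={ev:.2f}  variance={var:.4f}  SD={sd:.3f}  entropy={h:.3f} bits")

# Hypothetical magnitude pairs with increasing spread around the same mean.
for low, high in [(0.4, 0.6), (0.3, 0.7), (0.1, 0.9)]:
    var, h = equiprobable_pair_stats(low, high)
    print(f"magnitudes {low}/{high}: variance={var:.4f}, entropy={h:.1f} bit")
```

Expected value rises linearly with p, whereas variance, SD and entropy all peak at p=0.5 and vanish at p=0 and p=1. With two equiprobable magnitudes, entropy stays at 1 bit while variance grows with the spread, which is why that manipulation can separate the two risk measures.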
The distinct neural coding of reward value and uncertainty is consistent with the separation of expected utility into these two components suggested by the mean-variance approach in Financial Decision Theory (Huang & Litzenberger 1988) and found in human brain imaging (Preuschoff et al. 2006; Tobler et al. 2007). These activations do not rule out that other brain structures may code utility as a single (scalar) variable, as proposed by classic Expected Utility Theory (Von Neumann & Morgenstern 1944).
Pure Reward Signals in other brain areas
Neuronal activity in orbitofrontal cortex is substantially influenced by rewards. The neurons show activations following reward-predicting stimuli, during the expectation of reward and after reward reception.
Orbitofrontal responses to rewards and reward-predicting stimuli are related to the motivational value rather than the more sensory properties of reward objects, as satiation with specific rewards reduces the neuronal responses (Critchley & Rolls 1996). They constitute pure reward signals by reflecting only reward and not spatial or visual object features (Figure 5). Orbitofrontal reward signals distinguish between reward and punishment (Thorpe et al. 1983), change with reversal of stimulus-reward associations (Rolls et al. 1996), discriminate between different volumes of liquid reward and encode the economic value of rewards for decision-making irrespective of the actual reward objects (Padoa-Schioppa & Assad 2006). Different neurons in this structure show more sustained activations preceding the expected delivery of liquid or food reward (Schoenbaum et al. 1998, Tremblay & Schultz 1999, Hikosaka & Watanabe 2000). Besides these pure reward-related responses, a few other orbitofrontal neurons respond to visual object properties or are activated in relation to movements.
Orbitofrontal neurons do not appear to be specialized for particular reward objects but seem to discriminate between different rewards depending on their current availability (Figure 11). A reward that is effective for activating an orbitofrontal neuron (apple in Figure 11 top) may lose its efficacy when the reward distribution changes and the initially effective reward loses its highest preference (bottom). By encoding economic value rather than specific reward objects, these responses appear to adapt to the current probability distribution of reward values. A change in this distribution changes the neuronal responses. The apparent dependence of responsiveness on a set point corresponds to a basic tenet of Prospect Theory indicating that outcomes are valued relative to movable references rather than absolute physical characteristics (Kahneman & Tversky 1984).
Striatum and nucleus accumbens
The slowly firing medium spiny neurons in striatum and nucleus accumbens and the tonically active striatal neurons (TANs) respond to the reception of food and liquid rewards (Apicella et al. 1991a, b). Other striatal and accumbal neurons show phasic activations following visual reward-predicting stimuli and more sustained activations during the expectation of rewards (Hikosaka et al. 1989a, b, Schultz et al. 1992). Changes of existing reward expectation during learning lead to adaptations of reward expectation-related activity to the currently valid expectation in parallel with the animal’s behavioral differentiation. The TANs discriminate between rewards and air puff punishers (Ravel et al. 2003), and many slowly firing striatal neurons distinguish reward from no reward and discriminate between different rewards and reward magnitudes irrespective of visual object properties, spatial information and movements ( Figure 12; Bowman et al. 1996). Neurons in the ventral striatum show a higher incidence of reward responses and reward expectation activities, as compared to caudate and putamen neurons with their larger spectrum of task-related activity. Thus subpopulations of striatal neurons appear to process pure reward signals.
Reward Influences on Action-Related Activity
Dorsolateral prefrontal cortex
In addition to generating specific signals, rewards also influence ongoing action-related activity. The prediction of different food or liquid rewards modifies the typical, spatially discriminating delay activity of neurons in dorsolateral prefrontal cortex (Figure 13; Kobayashi et al. 2002) and influences movement-specific cue responses in medial prefrontal cortex (Matsumoto et al. 2003). These prefrontal neurons carry signals related to the preparation of movement and at the same time encode the expected reward. Only a small population of prefrontal neurons is activated by aversive stimuli (Kobayashi et al. 2006).
Other cortical areas
Predicted rewards also influence arm and eye movement-related activity in other cortical areas including parietal cortex (Platt & Glimcher 1999, Musallam et al. 2004) and anterior and posterior cingulate (Shidara & Richmond 2002, McCoy et al. 2003). Similar reward effects in premotor cortex may reflect the motivating functions of rewards on movements coded in this part of the motor system (Roesch & Olson 2003).
Similar to prefrontal neurons, the action-related activity of a population of neurons in the striatum (caudate and putamen) is influenced by predicted rewards. These neurons are activated during the preparation and execution of specific arm and eye movements towards different spatial targets and discriminate between movement and non-movement reactions. At the same time these specific action-related activities are differentially influenced by the predicted presence vs. absence of reward ( Figure 14; Kawagoe et al. 1998) and by different predicted types, magnitudes and probabilities of reward (Hassani et al. 2001, Cromwell & Schultz 2003). This activity can predict the animal’s choice toward a rewarding outcome (Samejima et al. 2005). Similar action-reward specific activities exist also in the subthalamic nucleus (Sato & Hikosaka 2002).
The activations in the striatum and cortex mentioned above do not simply represent outcome expectations, as they also differentiate between different behavioral reactions for the same outcome (Figure 14 movement vs. nonmovement), and they do not only reflect different behavioral reactions, as they also differentiate between the expected outcomes (Figure 14 top vs. bottom). The reward-differentiating nature of the activations develops and adapts during learning while differential reward expectations are being acquired (Figure 5).
The combined action and reward coding by striatal neurons complies with theoretical notions of associating specific behavioral actions with rewarding outcomes through operant learning (Sutton & Barto 1998). These activities may constitute neuronal correlates of goal-directed behavior, as they appear to reflect neuronal representations of the reward for the specific action while this action is being prepared and executed (Dickinson & Balleine 1994).
The combined coding of action and reward contrasts with the earlier described pure reward signals in dopamine neurons and in some neurons of orbitofrontal cortex and striatum, which reflect the predicted or received reward irrespective of other stimulus or behavioral components. In demonstrating the influence of predicted reward on action-related activity ( Figure 16), action-outcome coding may represent the next processing stage downstream from pure reward signals towards overt choices. Action-outcome coding may be a component of mechanisms by which reward signals are translated into behavioral choices for obtaining reward through action. Information about the value of each possible action in choice situations would constitute important inputs for decision-making mechanisms.
Reward-Related Activity in Amygdala
Reward-predicting stimuli and unpredicted liquid, food, cocaine and intracranial electrical stimulation elicit responses in central and basolateral amygdala, the responses being differentiated against aversive and neutral outcomes (Paton et al. 2006). Responses discriminate between reward magnitudes and change with outcome reversal. Responses correlate with orbitofrontal responses during early discrimination learning and decrease after orbitofrontal lesions (Pratt & Mizumori 1998, Schoenbaum et al. 1998, 2000, Toyomizu et al. 2002, Carelli et al. 2003, Saddoris et al. 2005). Amygdala neurons show satiety-sensitive gustatory responses and responses to liquid or food-predicting visual stimuli differentiated from air puffs which decrease with increasing behavioral requirements (Nishijo et al. 1988, Yan & Scott 1996, Wilson & Rolls 2005, Paton et al. 2006, Sugase-Miyamoto & Richmond 2005).
Overview of neuronal reward signals
|Selected pure reward signals in monkeys (irrespective of sensory and motor aspects)|
|Brain structure||Specific characteristics||References|
|Dopamine neurons||reward prediction||Romo & Schultz 1990; Kawagoe et al. 2004; Morris et al. 2006|
| ||prediction error||Schultz et al. 1997; Morris et al. 2004; Bayer & Glimcher 2005|
| ||temporal prediction error||Hollerman & Schultz 1998; Nakahara et al. 2004|
| ||adaptive value coding||Tobler et al. 2005|
| ||motivation||Satoh et al. 2003|
|Orbitofrontal cortex||satiation sensitivity||Critchley & Rolls 1996|
| ||reward expectation||Tremblay & Schultz 1999|
| ||adaptive coding||Tremblay & Schultz 1999|
| ||economic value||Padoa-Schioppa & Assad 2006|
| ||reversal learning||Rolls et al. 1996|
| ||novel learning||Tremblay & Schultz 2000|
|Anterior cingulate cortex||reward expectation||Shidara & Richmond 2002|
|Striatum||reward expectation||Hikosaka et al. 1989b; Hollerman et al. 1998|
| ||reward type||Shidara et al. 1996; Hassani et al. 2001|
|Amygdala||reward prediction||Sugase-Miyamoto & Richmond 2005; Paton et al. 2006|
|Selected action-related value signals in monkeys (conjoint reward and motor aspects)|
|Brain structure||Specific characteristics||References|
|Prefrontal cortex||spatial-reward||Watanabe 1996; Kobayashi et al. 2002|
| ||go-nogo-reward||Matsumoto et al. 2003|
|Premotor cortex||spatial-reward w/motivation||Roesch & Olson 2003|
|Posterior cingulate cortex||spatial-reward||McCoy et al. 2003|
|Parietal cortex||spatial-reward value||Platt & Glimcher 1999; Musallam et al. 2004|
|Striatum||go-nogo-reward||Hollerman et al. 1998|
| ||spatial-reward type||Hassani et al. 2001|
| ||spatial-reward magnitude||Kawagoe et al. 1998; Cromwell & Schultz 2003|
| ||spatial-reward probability||Samejima et al. 2005|
| ||spatial-reward adaptive coding||Cromwell et al. 2005|
| ||spatial reversal learning||Pasupathy & Miller 2005|
| ||go-nogo novel learning||Tremblay et al. 1998|
|Globus pallidus||spatial-reward||Arkadir et al. 2004|
|Substantia nigra reticulata||spatial-reward||Sato & Hikosaka 2002|
The author acknowledges support by the Wellcome Trust, Swiss National Science Foundation, Human Frontiers Science Program and several other grant and fellowship agencies.
- Apicella P, Scarnati E, Schultz W. Tonically discharging neurons of monkey striatum respond to preparatory and rewarding stimuli. Exp. Brain Res. 84: 672-675, 1991a
- Apicella P, Ljungberg T, Scarnati E, Schultz W. Responses to reward in monkey dorsal and ventral striatum. Exp. Brain Res. 85: 491-500, 1991b
- Arkadir D, Morris G, Vaadia E, Bergman H. Independent coding of movement direction and reward prediction by single pallidal neurons. J. Neurosci. 24: 10047-10056, 2004
- Bayer HM, Glimcher PW: Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47: 129-141, 2005
- Bowman EM, Aigner TG, Richmond BJ. Neural signals in the monkey ventral striatum related to motivation for juice and cocaine rewards. J. Neurophysiol. 75: 1061-1073, 1996
- Brischoux F, Chakraborty S, Brierley DI, Ungless MA. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proc Natl Acad Sci USA 106: 4894–4899, 2009
- Carelli RM, Williams JG, Hollander JA: Basolateral amygdala neurons encode cocaine self-administration and cocaine-associated cues. J Neurosci 23: 8204-8211, 2003
- Critchley HD, Rolls ET. Hunger and satiety modify the responses of olfactory and visual neurons in the primate orbitofrontal cortex. J. Neurophysiol. 75: 1673-1686, 1996
- Cromwell HC, Hassani OK, Schultz W. Relative reward processing in primate striatum. Exp Brain Res. 162: 520-525, 2005
- Cromwell HC, Schultz W. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J. Neurophysiol. 89: 2823-2838, 2003
- Datla KP, Ahier RG, Young AMJ, Gray JA, Joseph MH. Conditioned appetitive stimulus increases extracellular dopamine in the nucleus accumbens of the rat. Eur J Neurosci, 16, 1987-1993, 2002
- Dickinson A, Balleine B. Motivational control of goal-directed action. Anim. Learn. Behav. 22: 1-18, 1994
- Fiorillo CD, Song MR, Yun SR. Diversity and homogeneity in responses of midbrain dopamine neurons. J Neurosci 33: 4693–4723, 2013a
- Fiorillo CD, Song MR, Yun SR. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J Neurosci 33: 4710–4725, 2013b
- Fiorillo CD, Tobler PN, Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898-1902, 2003
- Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J. Neurophysiol. 85: 2477-2489, 2001
- Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. II. Visual and auditory responses. J. Neurophysiol. 61: 799-813, 1989a
- Hikosaka O, Sakamoto M, Usui S. Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol. 61: 814-832, 1989b
- Hikosaka K, Watanabe M. Delay activity of orbital and lateral prefrontal neurons of the monkey varying with different rewards. Cerebral Cortex 10: 263-271, 2000
- Hollerman JR, Schultz W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neurosci. 1: 304-309, 1998
- Hollerman JR, Tremblay L, Schultz W. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80: 947-963, 1998
- Horvitz JC, Stewart T, Jacobs BL. Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat. Brain Res. 759: 251-258, 1997
- Huang C-F and Litzenberger RH: Foundations for Financial Economics. Prentice-Hall, Upper Saddle River, NJ 1988
- Izhikevich EM. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex 17: 2443-2452, 2007
- Kahneman D, Tversky A. Choices, Values, and Frames. American Psychologist 4, 341-350, 1984
- Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nature Neurosci. 1: 411-416, 1998
- Kawagoe R, Takikawa Y, Hikosaka O. Reward-predicting activity of dopamine and caudate neurons - a possible mechanism of motivational control of saccadic eye movement. J. Neurophysiol. 91: 1013-1024, 2004
- Kobayashi S, Lauwereyns J, Koizumi M, Sakagami M, Hikosaka O: Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. J. Neurophysiol. 87: 1488-1498, 2002
- Kobayashi S, Nomoto K, Watanabe M, Hikosaka O, Schultz W, Sakagami M: Influences of rewarding and aversive outcomes on activity in macaque lateral prefrontal cortex. Neuron 51: 861-870, 2006
- Kobayashi S, Schultz W. Reward contexts extend dopamine signals to unrewarded stimuli. Curr Biol 24: 56-62, 2014
- Ljungberg T, Apicella P, Schultz W. Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol. 67: 145-163, 1992
- Matsumoto M, Hikosaka O. Two types of dopamine neuron distinctively convey positive and negative motivational signals. Nature 459: 837-841, 2009
- Matsumoto K, Suzuki W, Tanaka K. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301: 229-232, 2003
- McCoy AN, Crowley JC, Haghighian G, Dean HL, Platt ML. Saccade reward signals in posterior cingulate cortex. Neuron 40: 1031-1040, 2003
- Mirenowicz J, Schultz W. Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379: 449-451, 1996
- Montague PR, Dayan P, Sejnowski TJ. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16: 1936-1947, 1996
- Morris G, Arkadir D, Nevet A, Vaadia E, Bergman H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43: 133-143, 2004
- Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nature Neurosci. 9: 1057-1063, 2006
- Musallam S, Corneil BD, Greger B, Scherberger H, Andersen RA: Cognitive control signals for neural prosthetics. Science 305: 258-262, 2004
- Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O. Dopamine neurons can represent context-dependent prediction error. Neuron 41: 269-280, 2004
- Nishijo H, Ono T, Nishino H: Single neuron responses in amygdala of alert monkey during complex sensory stimulation with affective significance. J. Neurosci. 8: 3570-3583, 1988
- Nomoto K, Schultz W, Watanabe T, Sakagami M. Temporally extended dopamine response to perceptually demanding reward-predictive stimuli. J Neurosci 30: 10692-10702, 2010
- Padoa-Schioppa C, Assad JA. Neurons in the orbitofrontal cortex encode economic value. Nature 441: 223-226, 2006
- Pan WX, Schmidt R, Wickens J, Hyland BI. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25: 6235-6242, 2005
- Pasupathy A, Miller EK. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 433: 873-876, 2005
- Paton JJ, Belova MA, Morrison SE, Salzman CD: The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature 439: 865-870, 2006
- Platt ML, Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature 400: 233-238, 1999
- Pratt WE, Mizumori SJY: Characteristics of basolateral amygdala neuronal firing on a spatial memory task involving differential reward. Behav. Neurosci. 112: 554-570, 1998
- Preuschoff K, Bossaerts P, Quartz SR. Neural differentiation of expected reward and risk in human subcortical structures. Neuron 51: 381-390, 2006
- Ravel S, Legallet E, Apicella P. Responses of tonically active neurons in the monkey striatum discriminate between motivationally opposing stimuli. J. Neurosci. 23: 8489-8497, 2003
- Roesch MR, Olson CR: Impact of expected reward on neuronal activity in prefrontal cortex, frontal and supplementary eye fields and premotor cortex. J. Neurophysiol. 90: 1766-1789, 2003
- Roitman MF, Stuber GD, Phillips PEM, Wightman RM, Carelli RM. Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. 24: 1265-1271, 2004
- Rolls ET, Critchley HD, Mason R, Wakeman EA. Orbitofrontal cortex neurons: role in olfactory and visual association learning. J. Neurophysiol. 75: 1970-1981, 1996
- Romo R, Schultz W. Dopamine neurons of the monkey midbrain: Contingencies of responses to active touch during self-initiated arm movements. J. Neurophysiol. 63: 592-606, 1990
- Saddoris MP, Gallagher M, Schoenbaum G: Rapid associative encoding in basolateral amygdala depends on connections with orbitofrontal cortex. Neuron 46: 321-331, 2005
- Samejima K, Ueda Y, Doya K, Kimura M. Representation of action-specific reward values in the striatum. Science 310, 1337-1340, 2005
- Sato M, Hikosaka O. Role of primate substantia nigra pars reticulata in reward-oriented saccadic eye movement. J. Neurosci. 22: 2363-2373, 2002
- Satoh T, Nakai S, Sato T, Kimura M. Correlated coding of motivation and outcome of decision by dopamine neurons. J. Neurosci. 23: 9913-9923, 2003
- Schoenbaum G, Chiba AA, Gallagher M. Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nature Neurosci. 1: 155-159, 1998
- Schoenbaum G, Chiba AA, Gallagher M. Changes in functional connectivity in orbitofrontal cortex and basolateral amygdala during learning and reversal training. J. Neurosci. 20: 5179-5189, 2000
- Schultz W, Apicella P, Scarnati E, Ljungberg T. Neuronal activity in monkey ventral striatum related to the expectation of reward. J. Neurosci. 12: 4595-4610, 1992
- Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science 275: 1593-1599, 1997
- Schultz W, Romo R. Responses of nigrostriatal dopamine neurons to high intensity somatosensory stimulation in the anesthetized monkey. J. Neurophysiol. 57: 201-217, 1987
- Shidara M, Aigner TG, Richmond BJ. Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials. J. Neurosci. 18: 2613-2625, 1998
- Shidara M, Richmond BJ. Anterior cingulate: single neuron signals related to degree of reward expectancy. Science 296: 1709-1711, 2002
- Sugase-Miyamoto Y, Richmond BJ. Neuronal signals in the monkey basolateral amygdala during reward schedules. J. Neurosci. 25: 11071-11083, 2005
- Suri R, Schultz W. A neural network with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91: 871-890, 1999
- Sutton RS, Barto AG. Reinforcement Learning. MIT Press, Cambridge, MA, 1998
- Thorpe SJ, Rolls ET, Maddison S. The orbitofrontal cortex: neuronal activity in the behaving monkey. Exp. Brain Res. 49: 93-115, 1983
- Tobler PN, Dickinson A, Schultz W. Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci. 23: 10402-10410, 2003
- Tobler PN, Fiorillo CD, Schultz W. Adaptive coding of reward value by dopamine neurons. Science 307: 1642-1645, 2005
- Tobler PN, O’Doherty JP, Dolan R, Schultz W. Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol. 97: 1621-1632, 2007
- Toyomitsu Y, Nishijo H, Uwano T, Kuratsu J, Ono T. Neuronal responses of the rat amygdala during extinction and reassociation learning in elementary and configural associative tasks. Eur. J. Neurosci. 15: 753-768, 2002
- Tremblay L, Hollerman JR, Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate striatum. J. Neurophysiol. 80: 964-977, 1998
- Tremblay L, Schultz W. Relative reward preference in primate orbitofrontal cortex. Nature 398: 704-708, 1999
- Tremblay L, Schultz W. Modifications of reward expectation-related neuronal activity during learning in primate orbitofrontal cortex. J. Neurophysiol. 83: 1877-1885, 2000
- Ungless MA, Magill PJ, Bolam JP. Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303: 2040-2042, 2004
- von Neumann J, Morgenstern O. The Theory of Games and Economic Behavior. Princeton University Press, Princeton, 1944
- Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412: 43-48, 2001
- Watanabe M. Reward expectancy in primate prefrontal neurons. Nature 382: 629-632, 1996
- Wilson FAW, Rolls ET. The primate amygdala and reinforcement: a dissociation between rule-based and associatively-mediated memory revealed in neuronal activity. Neuroscience 133: 1061-1072, 2005
- Wise RA. Brain reward circuitry: insights from unsensed incentives. Neuron 36: 229-240, 2002
- Yan J, Scott TR. The effect of satiety on responses of gustatory neurons in the amygdala of alert cynomolgus macaques. Brain Res. 740: 193-200, 1996
- Young AMJ. Increased extracellular dopamine in nucleus accumbens in response to unconditioned and conditioned aversive stimuli: studies using 1 min microdialysis in rats. J. Neurosci. Meth. 138: 57-63, 2004
- Valentino Braitenberg (2007) Brain. Scholarpedia, 2(11):2918.
- Tomasz Downarowicz (2007) Entropy. Scholarpedia, 2(11):3901.
- Keith Rayner and Monica Castelhano (2007) Eye movements. Scholarpedia, 2(10):3649.
- Peter Jonas and Gyorgy Buzsaki (2007) Neural inhibition. Scholarpedia, 2(9):3286.
- John Dowling (2007) Retina. Scholarpedia, 2(12):3487.
- Wolfram Schultz (2007) Reward. Scholarpedia, 2(3):1652.
- Philip Holmes and Eric T. Shea-Brown (2006) Stability. Scholarpedia, 1(10):1838.
- Andrew G. Barto (2007) Temporal difference learning. Scholarpedia, 2(11):1604.
Actor-Critic Method, Attention, Basal Ganglia, Conditioning, Neuroeconomics, Q-Learning, Reinforcement Learning, Reward, Temporal Difference Learning