

Cerebral Cortex Advance Access originally published online on April 27, 2005
Cerebral Cortex 2005 15(12):1964-1981; doi:10.1093/cercor/bhi072

© The Author 2005. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oupjournals.org

An Integrate-and-fire Model of Prefrontal Cortex Neuronal Activity during Performance of Goal-directed Decision Making

Randal A. Koene and Michael E. Hasselmo

Center for Memory and Brain, Department of Psychology and Program in Neuroscience, Boston University, 64 Cummington Street, Boston, MA 02215, USA

Address correspondence to M.E. Hasselmo, Center for Memory and Brain, Department of Psychology and Program in Neuroscience, Boston University, 64 Cummington Street, Boston, MA 02215, USA. Email: hasselmo{at}bu.edu.


    Abstract
 
The orbital frontal cortex appears to be involved in learning the rules of goal-directed behavior necessary to perform the correct actions based on perception to accomplish different tasks. The activity of orbitofrontal neurons changes dependent upon the specific task or goal involved, but the functional role of this activity in performance of specific tasks has not been fully determined. Here we present a model of prefrontal cortex function using networks of integrate-and-fire neurons arranged in minicolumns. This network model forms associations between representations of sensory input and motor actions, and uses these associations to guide goal-directed behavior. The selection of goal-directed actions involves convergence of the spread of activity from the goal representation with the spread of activity from the current state. This spiking network model provides a biological implementation of the action selection process used in reinforcement learning theory. The spiking activity shows properties similar to recordings of orbitofrontal neurons during task performance.

Key Words: learning • minicolumns • orbitofrontal • reinforcement • selective activity


    Introduction
 
The orbitofrontal cortex plays an important role in goal-directed behavior (Wallis et al., 2001). Lesions of the orbitofrontal cortex impair the ability of animals to learn which stimuli are associated with reward (Bechara et al., 1994, 1997; Frey and Petrides, 1997; Miller and Cohen, 2001; Pears et al., 2003; Izquierdo and Murray, 2004). Recordings from orbitofrontal cortex neurons demonstrate that spiking activity in response to sensory stimuli changes depending on the association of a stimulus with a reward in humans (Rolls, 1999), non-human primates (Thorpe et al., 1983; Schultz et al., 2000; Wallis and Miller, 2003) and rats (Mulder et al., 2003; Schoenbaum and Eichenbaum, 1995a,b; Schoenbaum et al., 2003). The orbitofrontal cortex appears to be particularly important when the generation of specific actions depends upon the context of particular sensory stimuli (Miller and Cohen, 2001). Here we focus on behavior directed toward a specific goal; we do not yet deal with decisions about the relative value of different goals (Balleine and Dickinson, 1998; Tremblay and Schultz, 1999).

Here we present a computational model that is applicable to multiple regions of the prefrontal cortex (PFC), demonstrating how populations of spiking neurons could mediate goal-directed behavior. In particular, we demonstrate how representations of specific motor actions can be used for goal-directed behavior in multiple different circumstances, dependent upon the context of specific sensory stimuli. This modeling effectively simulates the behavior and pattern of activity of orbitofrontal cortex neurons described in an experiment by Schultz et al. (2000), in which neurons respond to sensory stimuli, to reward and to expectation of reward. This task involves the differential generation of Go versus NoGo responses to randomly presented visual cues. Recordings demonstrated that some neurons in the orbitofrontal cortex do indeed fire selectively for the transition from one specific state to another. Schultz et al. (2000) identified these neurons, labeling them as selective for the instruction that initiates a specific trial, as well as predictive for a specific action.

Previous models of frontal cortex function have used neurons with sigmoid input–output functions that represent firing of populations of neurons (Cohen and Servan-Schreiber, 1992; O'Reilly and Munakata, 2000). In order to model the patterns of spiking activity more directly during behavioral tasks, we use integrate-and-fire neurons (Stein, 1967; Gerstner, 2002; Gerstner and Kistler, 2002) with Hebbian spike-timing-dependent synaptic plasticity (STDP) (Levy and Steward, 1983). Integrate-and-fire neurons simulate the membrane potential response to the build-up of synaptic input over time and emit a spike when the potential crosses threshold. The model shows how integrate-and-fire neurons can perform the functions described in equations for a circuit model of the PFC (Hasselmo, 2005). The structure of the model was motivated by anatomical evidence suggesting the organization of neural circuits into minicolumns (Lund et al., 1993), cell assemblies of highly interconnected neurons found in the PFC. In our model, different minicolumns responded to both sensory input and motor actions, consistent with evidence (Fuster, 1973, 2000; Fuster et al., 1982; Funahashi et al., 1989; Quintana and Fuster, 1992) that activity in the PFC represents two types of perception: (i) the perception of past sensory stimuli available due to short-term buffers and current sensory stimuli; and (ii) the proprioceptive sensation and prediction of motor actions. The organization into minicolumns was motivated by evidence for strong excitatory and inhibitory connectivity within local circuits of cortical neurons (Mountcastle, 1997; Lübke and von der Malsburg, 2004). The rapid strengthening of associations between sensory states, motor actions and reward is motivated by studies showing rapid changes in functional interactions between populations of prefrontal neurons during learning (Thorpe et al., 1983; Schoenbaum et al., 2000; Mulder et al., 2003).

The structure of this model closely resembles features of reinforcement learning (Sutton and Barto, 1981; Schultz et al., 1997; Sutton and Barto, 1998), so we will commonly refer to sensory information from the environment as ‘state’. We will refer to motor output as ‘actions’ and to the desired goal as ‘reward’. However, this model does not focus on the temporal difference learning rule (Sutton, 1988), a rule that uses the difference between successive outputs as an error measure. Instead it focuses on mechanisms of action selection associated with specific sensory states and reward. This demonstrates how integrate-and-fire neurons can perform the circuit mechanism of action selection proposed in a more abstract model of the PFC (Hasselmo, 2005).

In the following sections we simulate the proposed mechanism of the prefrontal minicolumn circuitry and apply that to the delayed Go/NoGo task with its reward protocol for different stimuli. We focus on explaining selective neuronal activity, as recorded by Schultz et al., with our model.


    Materials and Methods
 
This model focused on replicating neuronal activity and behavior in the experiments by Schultz et al. (2000). In these experiments, an initial visual stimulus indicates one of three possible trials (Fig. 1A): (i) rewarded movement stimulus (Srm), whereby reward is given if the monkey presses a key; (ii) rewarded non-movement stimulus (Srnm), whereby reward is given if the monkey chooses not to press the key; (iii) unrewarded movement stimulus (Surm), whereby the reward is not given but the key press is still required. Unless the movement is performed in the Surm trial, another unrewarded Surm trial follows. The decision to move or not to move followed a delay of 2 s, when a trigger signal was given, which was identical in each trial. Schultz et al. found that orbitofrontal neurons that showed task-related activity fired selectively. Some responded with increased firing rates to a specific instruction cue, some responded with increased firing rates predictive of Go/NoGo choice according to the expectation of reward, and some responded with increased firing rates to reward received.



 
Figure 1. (A) Summary of the Schultz et al. task. Three visual stimuli indicated above (fractal images) and different types of behavioral trials as in the simulation. (B) Design of the simulation. The simulation includes the experimental environment of the operant task in terms of the task protocol, visual stimuli and motor actions. (a) The input from the environment goes into the perceptual segment of the simulation. Perceptual stimuli are represented by spike trains, which are processed to produce spike pairs that are used as an internal representation. (b) The resulting neuronal spikes cause activity in a simulation of minicolumns in the PFC that includes specifics of relevant neurophysiology and neuroanatomy. (c) The output of the simulated PFC directs motor action in the operant task. The functions of integrate-and-fire neurons and other essential components were implemented in Catacomb2.

 
We propose that goal-directed behavior is learned by associating states and actions that are separately represented by the population of neurons of individual minicolumns. A state is indicated by the perception of specific sensory stimuli or the perception of reward received, while an action is indicated by proprioceptive input about motor activity. According to our hypothesis, the initial states Srm, Srnm and Surm, as well as the Reward state, are represented by activity in individual minicolumns in the PFC, while activity in a further two minicolumns represents action selections Go (move to press a key) or NoGo. During learning of goal-directed behavior, STDP strengthens connections within and between minicolumns so that state and action representations are associated. Because activity that corresponds to consecutive states and actions may appear at arbitrary time intervals, a short-term buffer based on persistent spiking due to after-depolarization (ADP) of membrane potential (Andrade, 1991; Klink and Alonso, 1997b) is used to enable encoding with STDP (Lisman and Idiart, 1995; Jensen et al., 1996; Koene et al., 2003).

We propose that the retrieval of goal-directed behavior depends on the spread of activity through strengthened connections from a minicolumn that represents the reward state and from the specific state minicolumn activated by current input. Consistent with this hypothesis, experimental evidence indicates that retrieval in the PFC produces goal-directed activity that is initiated by the desire for a goal (Schultz, 1998; Schultz and Dickinson, 2000; Miller and Cohen, 2001). In our model, the spread of activity from the representation of current state is gated by the spread from a desired goal. When the gated spread produces output from the minicolumn that represents the current state, the correct next action is selected. Hence, the convergence of activity from a current state representation and from a goal representation governs goal-directed behavioral responses.

Given the representation of states and actions, the transition from one state to another state via a specific action can be encoded uniquely if there is specific neural activity that occurs only for that action and only when the action is initiated in a particular state. This requirement leads to the presupposition that a functional minicolumn contains populations of input neurons and populations of output neurons that form connections with other minicolumns, and that the neurons in those populations are connected in a structured manner to other minicolumns (in this simulation to exactly one). Since the combination of activity at a specific input neuron and a specific output neuron of an action minicolumn represents the transition from a preceding state to a following state, that information gives the model the Markov property (Sutton and Barto, 1998). With this property, one-step dynamics enable us to predict the next state and expected reward for a specific action.

We developed simulations of the Schultz et al. task with Catacomb2 (Cannon et al., 2003) that replicated the actions of an agent (monkey) within an environment, as well as integrate-and-fire neuron dynamics in PFC. With our approach (which we call ‘design-based’ modeling), data from a simulated operant task protocol were linked with simulated neuronal circuitry for sensory processing and functions of the PFC (see Fig. 1B). Further details of the neurophysiology were modeled explicitly where needed for specific functional requirements, such as the after-depolarization experienced by specific neuron populations that may enable persistent firing.

The integrate-and-fire neurons in our model of PFC minicolumns have a resting and reset potential of –60 mV and an exponential decay time constant of 10 ms. The firing threshold is –50 mV and action potentials have a duration of 1 ms, followed by a 2 ms refractory period and subsequent strong after-hyperpolarization with reversal potential –90 mV and exponential decay time constant 30 ms. We used dual-exponential functions for the responses of synaptic conductances. Unless the description of a specific synaptic connection indicates otherwise, the time constant for the rise of the dual-exponential response function was 2 ms and the time constant for the fall was 4 ms. Excitatory synaptic connections had a reversal potential of 0 mV and inhibitory synaptic connections had a reversal potential of –70 mV.
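
The parameters above can be collected into a short illustrative sketch of a conductance-based integrate-and-fire neuron. This is not the Catacomb2 implementation. The leak conductance (G_LEAK), the after-hyperpolarization increment per spike (G_AHP), the Euler time step (DT), the peak normalization of the dual-exponential function and the treatment of the spike plus refractory period as a 3 ms hold at reset are all assumptions made here for illustration; only the potentials and time constants are taken from the text.

```python
import math

# Values stated in the text
E_REST = -60.0     # mV, resting and reset potential
TAU_M = 10.0       # ms, membrane decay time constant
V_THRESH = -50.0   # mV, firing threshold
T_SPIKE = 1.0      # ms, action potential duration
T_REFRACT = 2.0    # ms, refractory period
E_AHP = -90.0      # mV, after-hyperpolarization reversal potential
TAU_AHP = 30.0     # ms, after-hyperpolarization decay
E_EXC = 0.0        # mV, excitatory reversal potential

# Assumptions (not given in the text)
G_LEAK = 10.0      # nS, leak conductance
G_AHP = 5.0        # nS, AHP conductance added per spike
DT = 0.1           # ms, Euler integration step


def dual_exp(t, g_max, tau_rise=2.0, tau_fall=4.0):
    """Dual-exponential conductance (nS) at t ms after a presynaptic spike,
    normalized so that its peak equals g_max (an assumed convention)."""
    if t <= 0.0:
        return 0.0
    t_peak = (tau_rise * tau_fall / (tau_fall - tau_rise)) * math.log(tau_fall / tau_rise)
    norm = math.exp(-t_peak / tau_fall) - math.exp(-t_peak / tau_rise)
    return g_max * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise)) / norm


def simulate(input_spikes, g_max, t_end=100.0):
    """Return output spike times (ms) for excitatory input spikes of peak conductance g_max."""
    v = E_REST
    g_ahp = 0.0
    blocked_until = -1.0
    ahp_decay = math.exp(-DT / TAU_AHP)
    out = []
    for k in range(int(round(t_end / DT))):
        t = k * DT
        g_ahp *= ahp_decay                      # AHP conductance decays continuously
        if t < blocked_until:                   # spike + refractory: hold at reset
            v = E_REST
            continue
        g_syn = sum(dual_exp(t - s, g_max) for s in input_spikes)
        dv = (-(v - E_REST)
              - (g_syn / G_LEAK) * (v - E_EXC)
              - (g_ahp / G_LEAK) * (v - E_AHP)) / TAU_M
        v += DT * dv
        if v >= V_THRESH:                       # threshold crossing emits a spike
            out.append(t)
            v = E_REST
            g_ahp += G_AHP
            blocked_until = t + T_SPIKE + T_REFRACT
    return out
```

Under these assumed conductances, calling simulate([10.0], 1.0) gives a subthreshold response with no output spike, whereas simulate([10.0], 20.0) drives the membrane across threshold shortly after the input arrives.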

In the simulation of the operant task environment, stimuli produced by visual cues and reward, as well as proprioceptive sensation of motor activity, are conveyed as spike trains (top of Fig. 2) that are produced by specific neurons [signal pathway (a) in Fig. 1B]. The simulation of perceptual processing circuitry receives those spike trains and transforms them into reliable sequences of state–action spike pairs (bottom of Fig. 2). Every time that a spike train corresponding to a new state or a new motor action is detected, a pair of spikes is generated that represents the most recent state and the most recent action. The individual spike times of a state–action spike pair are separated by several cycles of theta rhythm to ensure that persistent spiking of the most recent two spike inputs to the short-term buffer occurs over a sufficient duration to achieve strong associative connections through STDP. To keep the graphs readable, an identity matrix is used for input connections to the set of PFC minicolumns instead of a learned mapping [signal pathway (b) in Fig. 1B]. Motor action in the operant task is driven by the output of prefrontal minicolumns [signal pathway (c) in Fig. 1B]. In this manner, the seven trials shown in Figure 2 are simulated during encoding so that all relevant rules are learned in the network of prefrontal minicolumns.



 
Figure 2. Input spike trains of sensory input (top) and membrane potential showing spike pairs that are the internal representation of changes of state or action (bottom). Vertical lines separate trials (after which buffers are cleared). Rules are learned by exposure to both rewarded and non-rewarded conditions in seven different trials: (1) NoGo following Surm does not lead to reward; (2 & 5) Go following Surm leads to rewarded trial; (3) Srm and Go leads to reward; (4) Srm and NoGo does not lead to reward; (6) Srnm and NoGo leads to reward; (7) Srnm and Go does not lead to reward.

 
Specific Neuron Populations within Prefrontal Minicolumns Achieve the Gating of the Forward Spread of Activity by Spread from the Goal

Retrieval and encoding of associations between prefrontal minicolumns that represent states and actions are assumed to take place in opposite phase intervals of rhythmic modulation at 8 Hz (Hasselmo et al., 2002) that represents theta rhythm found in the PFC and hippocampus (Manns et al., 2000). This enables both to occur at any time during a task. The modulation supports different dynamics in the two modes. We will therefore discuss the distinct functions of encoding and retrieval separately, even though they alternate continuously during a simulated task. The modulating rhythm also serves to ensure that activity in different simulated brain regions is properly synchronized, as described in our previous work (Koene et al., 2003). The plot of membrane potential for the buffer neuron abuf (Rew) in Figure 6B provides an example of the modulation by theta rhythm and clearly demonstrates rhythmic changes at 125 ms intervals.
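
As a small worked illustration of the alternation described above: an 8 Hz rhythm has a period of 1000/8 = 125 ms, which matches the interval noted for Figure 6B. The sketch below assigns each time point to an encoding or retrieval phase. The even split of the cycle and the choice of which half counts as encoding are assumptions made for illustration; the text states only that the two modes occupy opposite phase intervals.

```python
THETA_HZ = 8.0                     # modulation frequency stated in the text
PERIOD_MS = 1000.0 / THETA_HZ      # 125 ms per theta cycle


def theta_mode(t_ms, encoding_fraction=0.5):
    """Return 'encoding' or 'retrieval' for time t_ms.

    encoding_fraction is a hypothetical parameter: the share of each
    cycle assigned to encoding (first part of the cycle by assumption).
    """
    phase = (t_ms % PERIOD_MS) / PERIOD_MS   # position within the cycle, 0 <= phase < 1
    return "encoding" if phase < encoding_fraction else "retrieval"
```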



 
Figure 6. (A) ADP supports persistent firing. Each spike causes initial AHP of the membrane, which is followed by a slow ADP. That depolarization can ultimately lead to another spike. (B) A buffer based on persistent firing receives afferent input during one phase of its rhythmic cycle and reactivates items (separated by competitive inhibition) in order in each cycle. (C) First-in-first-out (FIFO) item replacement. In a full buffer, afferent input plus retrieval activity elicit inhibition synchronized to suppress reactivation of the first item. The input is added at the end of the sequence.

 
As shown in Figure 3, we distinguish five populations of pyramidal neurons in each presupposed functional minicolumn of PFC: a, gi, go, ci and co. Of these, each a neuron connects exclusively to other neurons within the same minicolumn and plays an important role during encoding of associations between minicolumns. These a neurons represent neurons that receive thalamic input in layer IV of PFC. The neurons of a population labeled go experience suprathreshold depolarization during encoding in response to input from a (with a fixed conductance of 5.2 nS and time constants 1 ms for the rise and 2 ms for the fall of the synaptic response), but during retrieval go is inhibited by an interneuron network that is driven by a. A spike in a during encoding also provides subthreshold depolarization to all neurons of a population labeled gi (with a fixed conductance of 1.0 nS and time constants 12 ms for the rise and 20 ms for the fall of the synaptic response).



 
Figure 3. During training, associations are learned between state and action minicolumns. The network of minicolumns (A) is shown with the connections between them. Activity spreads along associations directed both from the minicolumn representing the goal (dashed arrows) and forward from the minicolumn representing the current state (dotted arrows). To simplify the schematic, populations of neurons, gi, go, ci and co as shown in the Surm minicolumn were reduced in the other minicolumns to display only those neurons that are involved in encoded associations. The numbers in brackets correspond to the marked training trials in Figure 2, in which an associative connection is established by STDP. Here, activity in the neuronal populations of the minicolumns is indicated by shaded neurons. This is shown for retrieval of the correct action that leads to reward from a current state, Srm, in which the rewarded move stimulus was perceived. Neurons that spike are circles shaded gray. A separate diagram (B) shows a linear representation of the associative connections that are strengthened during rule learning (numbers in brackets again correspond to training trials in Fig. 2). The Go and Reward minicolumns each fulfill two roles in the encoded rules.

 
The output of each neuron in the go population projects to one of the other minicolumns in the PFC network. In the gi population, each neuron receives one connection from a go neuron located in another minicolumn. Synaptic weights are modifiable on these connections between different minicolumns and are the elements of a matrix Wg. When strengthened, the Wg connection can fire a unit gi if the presynaptic unit go is active. Such a connection indicates that a rule was learned that expresses the knowledge that activity in the minicolumn containing the postsynaptic neuron gi preceded activity in the minicolumn of the connected go neuron.

Similarly, each neuron of a population co makes one connection to a neuron in a ci population of another minicolumn, so that activity in the co population can target any one of the other minicolumns specifically. Again, the synaptic strengths of such connections are modifiable and make up elements in a matrix Wc. Unlike the effect of synaptic weights in Wg, postsynaptic depolarization due to input through a connection with the maximum strength in Wc is subthreshold, so that spiking in ci remains dependent on additional input. The additional input to neurons in ci, which can elevate their membrane potential over threshold, is supplied by one-to-one connections (an identity matrix) from neurons in go (with a conductance of 2.5 nS and time constants 1 ms for the rise and 2 ms for the fall of the synaptic response). The activity of go therefore fulfills a gating role with regard to spike propagation to ci.

Within a minicolumn, every neuron in gi connects to every neuron in go through modifiable synapses with weights in Wig, while every neuron in ci connects to every neuron in co through modifiable synapses with weights in Wic. The maximum depolarization caused by a connection encoded in Wig is suprathreshold, while depolarization caused by strengthened connections in Wic is limited to subthreshold values. Additional depolarization is provided to co by one-to-one connections from neurons in gi (with a conductance of 2.5 nS and time constants 1 ms for the rise and 2 ms for the fall of the synaptic response). This provides a gating function for decisions about which action is selected based on convergence. The fan-out of connections within a minicolumn between gi and go and between ci and co enables the encoding of multiple routes between minicolumns. The following sections will first describe the retrieval process and then describe encoding.
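
The distinction between subthreshold and suprathreshold connections can be summarized with a deliberately coarse sketch. It treats postsynaptic depolarizations as values that add linearly and compares their sum with the 10 mV gap between the resting potential (–60 mV) and the threshold (–50 mV). The linear summation and the individual depolarization values (WG_MAX_MV, WC_MAX_MV, GATE_MV) are assumptions chosen for illustration and are not parameters reported for the model.

```python
THRESHOLD_GAP_MV = 10.0   # from the text: threshold (-50 mV) minus rest (-60 mV)

# Hypothetical peak depolarizations, chosen only to illustrate the scheme
WG_MAX_MV = 12.0   # fully strengthened Wg or Wig connection: suprathreshold alone
WC_MAX_MV = 6.0    # fully strengthened Wc or Wic connection: subthreshold alone
GATE_MV = 6.0      # one-to-one gating input (go -> ci, or gi -> co): subthreshold alone


def fires(*depolarizations_mv):
    """True if the summed depolarization reaches threshold (linear summation assumed)."""
    return sum(depolarizations_mv) >= THRESHOLD_GAP_MV


# Spread from the goal needs no gate; forward spread needs coincident gating input.
goal_spread_alone = fires(WG_MAX_MV)             # gi driven by go through Wg
forward_without_gate = fires(WC_MAX_MV)          # ci driven by co through Wc only
forward_with_gate = fires(WC_MAX_MV, GATE_MV)    # ci driven by co through Wc plus go
```

With these illustrative values the goal-directed spread propagates on its own, while the forward spread propagates only when the gating input coincides with it.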

Retrieving Behavioral Rules in the PFC

Miller and Cohen propose that top-down processing in which behavior is guided by internal states or intentions (cognitive control) stems from the active maintenance of patterns of activity in PFC that represent goals and the means to achieve them. They suggest that these patterns provide a bias that guides activity affecting behavior (a gating function), and they support their theory with a review of neurobiological, neuroimaging and computational studies (Miller and Cohen, 2001).

In our simulation, associations that form known rules are encoded in PFC. A desire for reward then elicits a spread of activity from the minicolumn representing that reward state (see dashed lines in Fig. 3a and left arrows in Fig. 3b). The neurons of the go population within that Reward minicolumn spike simultaneously in response to rhythmic input at an 8 Hz theta frequency. Those spikes propagate along connections with strengthened synaptic weights in Wg and produce a spike in the targeted gi neurons of minicolumns that immediately preceded the Reward minicolumn in a known rule. Within such a preceding minicolumn (a minicolumn that represents an action) a spike elicited at a neuron in the gi population fans out across strengthened connections to neurons in the go population of that minicolumn. Through those connections with strengthened synaptic weights in Wig, suprathreshold depolarization is elicited at the target go neuron. This same process is repeated in other consecutive minicolumns to spread activity through the gi and go populations of consecutive action and state minicolumns. As the spread branches out, it follows multiple reverse paths through connections that associate states and actions. Once the spread of activity reaches the minicolumn that represents the current state, the convergence of current state and goal spread allows selection of action. In addition, spikes in go neurons are inhibited (‘end-stopping’) by the synchronous activity of interneurons (with time constants 1 ms for the rise and 10 ms for the fall of the synaptic response of the input) elicited by input that identifies the current state.

The selection of action is indicated by an interaction of the goal spread with current state. The input that identifies the current state also targets the neurons in the co population of the same current state minicolumn. The excitatory input produces a subthreshold depolarization of co neurons. In addition to this input, the spiking of neurons in the co population is gated by population gi activity in the same minicolumn due to the spread of activity from the goal. Those co neurons that receive additional depolarization from spiking neurons in the gi population fire.

The present simulation uses only the first step of the forward spread to determine output that controls goal-directed behavior in the task, so the forward gating only has an effect on the co of the minicolumn representing current state. The output of neurons in the co populations of state minicolumns that target action minicolumns is connected to the motor circuitry of the simulation. A spike in co thereby drives motor output of the corresponding action (thick black arrow in Fig. 3a). A spike in co also causes spiking in interneurons that provide lateral inhibition to the remaining neurons in co, so that a clear winner-takes-all behavioral response is obtained.

For other applications, the minicolumn model also enables a forward spread of activity for known associations encoded in the PFC (see dotted lines in Fig. 3a and right arrows in Fig. 3b). The spikes that propagate through connections with strengthened synaptic weights in Wc cause subthreshold depolarization of a ci neuron in the associated action minicolumns. Again, forward spread of activity is gated by the spread from the goal, since a neuron in the ci population needs additional depolarization from a corresponding neuron in the go population to fire. The spike of a ci neuron fans out through connections with strengthened synaptic weights in Wic to co neurons that are gated by the dependence on activity in gi neurons in the same minicolumn.

Figure 3a includes an example of rule retrieval in a rewarded move trial. Neurons that spike as activity spreads are represented by gray circles. The example points out the importance of neuron populations gi, go, ci and co, in which individual neurons make connections with other minicolumns. As shown in Figure 3a, desire for reward causes all neurons in the go population of the Reward minicolumn to fire. The activity then spreads to associated minicolumns, including Go, NoGo and all sensory input minicolumns. In the same trial, when the Srm stimulus is perceived, the co population of the Srm minicolumn is depolarized. In the Srm minicolumn, the specific depolarized co neuron that corresponds with a spiking neuron of the gi population fires, so that activity spreads forward along a route from minicolumn Srm to minicolumn Go. The firing of the co neuron is used to generate the Go response. An analogous approach would be to use the spikes of a ci neuron in the Go minicolumn to generate the Go response. During this process, the go population of the Srm minicolumn is inhibited (end-stopping). Figure 3a shows that the spread of activity from the goal is stopped there.

In the example, spreading activity from the Reward minicolumn involves two different known paths that include the Go minicolumn. One path retrieves the associated items Reward–Go–Srm, the other retrieves the associated items Reward–Go–Surm and a separate path through NoGo retrieves Reward–NoGo–Srnm. [The retrieval of rules resembles the sequence of transitions in a finite state machine (Harel, 1987) and the recurrent connections that lead to two visits of the Go minicolumn in trials initiated by the Surm stimulus are reminiscent of connectionist Elman networks (Elman, 1990, 1991).] Since the spread of activity through different known paths elicits spikes at separate gi neurons, they do not interfere with each other. And since the neurons in ci and co populations also maintain separate connections with other minicolumns, the activity in gi correctly allows the gated forward spread to propagate only on a path from a state receiving current input. Thus, the structure of our model allows mapping through the same action from different states. While retrieval activity spreads forward along known paths to reward, those spikes elicited in the co population of the current state minicolumn that target action minicolumns also trigger the output of PFC. In Figure 3a, the spike propagation through the connection from minicolumn Srm to minicolumn Go is therefore marked as a thick black arrow. This output generates the correct ‘Go’ response, thereby guiding successful goal-directed behavior.
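
The retrieval process can be illustrated at the level of whole minicolumns, without spiking dynamics. In the sketch below, Wg is reduced to a set of learned reverse associations and Wig to a set of routes within a minicolumn, so that activity arriving at the gi neuron for one successor reaches only the go neurons for predecessors that were experienced on the same route. The set-based representation, the function names and the list SEQUENCES are assumptions made for illustration. In particular, the Surm route is written as ending directly in Reward, following the retrieved path Reward–Go–Surm listed above, which simplifies the two visits of the Go minicolumn.

```python
from collections import deque

# Illustrative training sequences (assumed simplification of the seven trials)
SEQUENCES = [
    ("Srm", "Go", "Reward"), ("Srm", "NoGo"),
    ("Srnm", "NoGo", "Reward"), ("Srnm", "Go"),
    ("Surm", "Go", "Reward"), ("Surm", "NoGo"),
]


def encode(sequences):
    """Build abstract stand-ins for the weight matrices Wg and Wig."""
    wg = set()    # (column, prev): go in `column` excites gi in `prev`
    wig = set()   # (column, nxt, prev): within `column`, gi[nxt] excites go[prev]
    for seq in sequences:
        for prev, col in zip(seq, seq[1:]):
            wg.add((col, prev))
        for prev, col, nxt in zip(seq, seq[1:], seq[2:]):
            wig.add((col, nxt, prev))
    return wg, wig


def select_action(current, goal, wg, wig):
    """Spread activity back from the goal; return the action gated in the current state."""
    # All go neurons of the goal minicolumn fire.
    frontier = deque((col, prev) for (col, prev) in wg if col == goal)
    visited = set()
    while frontier:
        col, prev = frontier.popleft()      # go neuron in `col` that targets `prev` spikes
        if (col, prev) in visited:
            continue
        visited.add((col, prev))
        if prev == current:
            # gi[col] in the current-state minicolumn gates co[col]; the spread
            # stops here (end-stopping) and co[col] drives the motor output.
            return col
        for (c, nxt, pre_prev) in wig:      # fan-out within `prev` through Wig
            if c == prev and nxt == col:
                frontier.append((prev, pre_prev))
    return None
```

Because the unrewarded sequences never establish a route from Reward through NoGo to Srm, select_action("Srm", "Reward", *encode(SEQUENCES)) follows only the path through Go, in line with the separation of paths described above.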

Encoding Behavioral Rules in the PFC

The preceding section described retrieval; this section describes encoding. During encoding, the neuron labeled a in the model of a minicolumn fires when input that matches the item represented by the minicolumn is received. For example, when an input spike indicates that a rewarded-move stimulus, Srm, is detected, that input causes neuron a(Srm) to spike. Here, it is assumed that stimuli activate minicolumn n after minicolumn n – 1. Encoding is achieved by STDP (Levy and Steward, 1983; Markram et al., 1997; Bi and Poo, 1998) that corresponds to the long-term potentiation (LTP) of synaptic responses (Bliss and Lømo, 1973; Bliss and Collingridge, 1993). The four steps described below take place sequentially in each encoding cycle.

Reverse Associations between Minicolumns are Encoded in Weight Matrix Wg at Synapses from go(n) onto gi(n – 1)

A short-term memory (STM) buffer maintains spiking that corresponds with the two most recent inputs to the network of minicolumns. During this reactivation in encoding phases of PFC minicolumns, a(n) spikes less than 20 ms after a(n – 1). As shown in Figure 4a, the neuron a(n – 1) provides subthreshold depolarization to all the neurons of the gi population in minicolumn n – 1. And all neurons in the go population in minicolumn n receive suprathreshold depolarization through synapses from a(n). As the neurons in go(n) spike, that neuron in the gi population of minicolumn n – 1 which is connected to a neuron in go(n) receives subthreshold depolarization, due to the initial value of synaptic strengths in weight matrix Wg. The neuron in gi(n – 1) that receives input from both a(n – 1) and go(n) spikes a few milliseconds later than the presynaptic neuron in go(n), so that STDP is elicited. Thus, the amplitude of the corresponding synaptic response is increased in Wg. After several repetitions in the STM buffer, encoding establishes a suprathreshold connection between go(n) and gi(n – 1) (Fig. 4a).
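
The way repeated pre-before-post pairings in the buffer can raise a weight in Wg from a subthreshold to a suprathreshold value may be sketched with a generic pair-based STDP rule. The exponential window, its amplitudes and time constant, the normalized weight scale and the 3 ms spike lag are all assumptions for illustration; the text specifies only that the postsynaptic spike follows the presynaptic spike by a few milliseconds and that several repetitions are needed.

```python
import math


def stdp_dw(dt_ms, a_plus=0.1, a_minus=0.105, tau_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    Generic exponential STDP window; all three parameters are hypothetical.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_ms)     # pre before post: potentiation
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_ms)    # post before pre: depression
    return 0.0


def pairings_until_suprathreshold(w0=0.2, w_supra=1.0, w_max=1.2,
                                  dt_ms=3.0, max_cycles=100):
    """Count buffer repetitions until a weight starting at w0 reaches w_supra.

    Weights are in normalized units where w_supra is the smallest value that
    fires the postsynaptic neuron on its own (an assumed scale).
    """
    w = w0
    for cycle in range(1, max_cycles + 1):
        w = min(w + stdp_dw(dt_ms), w_max)   # one pairing per buffer repetition
        if w >= w_supra:
            return cycle, w
    return None, w
```

Each call to stdp_dw with a positive lag adds a fixed increment, so the number of buffer repetitions needed scales with the gap between the initial weight and the suprathreshold value.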



Figure 4. The four steps, (a–d), of rule encoding in the PFC. Rectangles indicate the nth minicolumn that activates and the one that precedes it at n – 1. Thin arrows indicate connections between neuron populations (lowercase letters within the rectangles) that may result in subthreshold postsynaptic depolarization (marked sub), while thick arrows indicate connections that may result in suprathreshold depolarization (marked SUP). The matrix of synaptic weights that is updated in an encoding step is indicated by Wg, Wc, Wic and Wig below an arrow that represents connections with synapses that are being modified.

 
Forward Associations between Minicolumns are Encoded in Weight Matrix Wc at Synapses from co(n – 1) onto ci(n)

Rhythmic input modulates the membrane potential of neurons in co. During the encoding phase, the rhythmic depolarization of neurons in co(n – 1) is such that excitatory input through one-to-one connections from gi(n – 1) in the same minicolumn causes postsynaptic spiking. The spiking in gi(n – 1) that is described in the encoding step above therefore drives spiking in co(n – 1), as shown in Figure 4b. The neurons in ci(n) receive subthreshold (gating) depolarization through one-to-one input from neurons in go(n). In the presence of rhythmic depolarization as above and given small initial values in Wc, the neuron in ci(n) that is connected to a neuron in the co population of minicolumn n – 1 spikes due to the combined subthreshold inputs from both go(n) and co(n – 1). Again, STDP is elicited, since the postsynaptic neuron in ci(n) spikes a few milliseconds after it receives input from the presynaptic neuron in co(n – 1). After repetition, a subthreshold connection is established between co(n – 1) and ci(n), which propagates spikes if input is received from the corresponding neuron in the gating go(n) population, even when rhythmic depolarization is absent in retrieval phases.

Rules that Associate Preceding with Possible Ensuing Activity are Encoded within a Minicolumn by the Weight Matrix Wic at Synapses from ci(n – 1) onto co(n – 1)

During encoding, the activity of the ci population is driven by an STM buffer that maintains the activity of ci populations of the two most recently active minicolumns. [The buffer holds two items so that the buffered activity ci(n) can replace ci(n – 1) as the memory of preceding activity in ci when the next association with minicolumn n + 1 is encoded.] As Figure 4c shows, neurons in ci(n – 1) spike several milliseconds before spiking of neurons in co(n – 1) is driven by corresponding spikes in population gi(n – 1) (with a synaptic conductance of 6.0 nS), as described above. STDP is elicited and repetition increases synaptic strengths in Wic from initial values near zero to subthreshold amplitudes.

Associations that Enable the Spread of Activity from the Representation of a Goal are Encoded by the Weight Matrix Wig at Synapses from gi(n – 1) onto go(n – 1) within a Minicolumn

During encoding, spiking in a subpopulation of go that is identified as in minicolumn n – 1 is driven by input from ci(n – 1), as shown in Figure 4d. A delay in the synaptic transmission from ci(n – 1) ensures that the spikes at occur several milliseconds after spiking in gi(n – 1). At connections that repeatedly experience STDP due to this sequence of spiking, the synaptic strength in Wig is increased from near zero to suprathreshold values.

The population and a population of neurons known as provide separate encoding functions, but as shown in Figure 5, they act together as go during retrieval. In the retrieval mode, transmission from ci(n – 1) to neurons in is suppressed, while input from gi is received through connections with synaptic strengths Wig. The pattern of spikes in gi and suprathreshold synaptic strengths established in Wig therefore determines retrieval spiking in . That spiking is duplicated in during retrieval, since transmission is then enabled through strong one-to-one input connections from . By contrast, all neurons in the population of a minicolumn are driven by a during encoding modes, so that they provide the diffuse output of go (n) that is used to encode Wg and Wc, as described above. In this manner, the two sub-populations of go can spike in separate patterns that satisfy the different needs of encoding protocols for synapses within a minicolumn (Wig) and between minicolumns (Wg and Wc). This function could alternatively be obtained by very tightly regulating the activity of go at different phases.



Figure 5. Subdivision of the go population into functional and neuron populations. Neurons in all spike in response to activity in a, while the spiking of neurons in reflects the specific patterns of spikes received through one-to-one connections from ci(n – 1). Spiking in the filter population relies on rhythmic depolarization, so that only ci(n – 1) activity in the short-term memory buffer of ci drives during encoding. This way, the strength of unique connections in Wg to other minicolumns is encoded separately from the encoding of Wig in accordance with the mapping of a pattern of spikes in gi to a pattern of spikes in . During retrieval, strong one-to-one connections from to drive the entire go population as one.

 
Short-term Memory Based on Persistent Spiking Enabled Spike Timing Dependent Potentiation to Encode Associations

As described, encoding in our model of the PFC depends on STDP in Wg, Wc, Wig and Wic, and on the buffered activity of populations a and ci. A Hebbian model of STDP that is based on the long-term potentiation observed at many synapses requires multiple instances in which presynaptic spiking precedes postsynaptic spiking by <40 ms (Levy and Steward, 1983; Markram et al., 1997; Bi and Poo, 1998), while input to the PFC may arrive with arbitrarily large time intervals. As mentioned previously, we therefore presuppose that firing patterns may be reactivated in a persistent manner by intrinsic neuronal mechanisms, such as after-depolarization (ADP) of membrane potential (Fig. 6A), caused by calcium sensitive cation currents that are induced by muscarinic receptor activation (Andrade, 1991; Klink and Alonso, 1997a). We also presuppose that a common brain rhythm may produce oscillatory modulation in different regions that provides synchronization of activity. The reactivation of firing patterns by ADP in one population of neurons at specific phases of the brain rhythm can thereby reliably provide input to other populations in the PFC where STDP can occur in an encoding mode (Fig. 6B). Using rhythmic modulation and ADP, we provide short-term memory (STM) in a manner similar to the STM model first proposed by Lisman and Idiart (1995) and Jensen and Lisman (1996). Recurrent inhibition within such a buffer separates the reactivation of sequential items to maintain their order. The STM may reside in the PFC or may be provided by input from the entorhinal cortex.

The membrane potentials of three neurons of an STM buffer are plotted in Figure 6B. In the hippocampus, regular activity originating in the septum (Brazhnik and Fox, 1999) is believed to cause 8 Hz oscillations of the membrane potential by modulating the GABAergic inhibition of pyramidal cells via networks of interneurons (Alonso et al., 1987; Stewart and Fox, 1990). A similar mechanism appears to cause theta rhythm oscillations in limbic cortices due to rhythmic activity of basal forebrain neurons (Manns et al., 2000). Those oscillations define two functional phases of the buffer neurons. We call the phase interval of greatest rhythmic depolarization the reactivation phase of STM and the remaining interval the input phase of STM. The plots show that spiking produced by afferent activity during the input phase of the buffer is reactivated by the ADP during subsequent reactivation phases. The duration of the rise of the ADP matches the period of oscillation. This means that the ADP of the earliest neuron to spike in one cycle allows that neuron to reach threshold first in the following cycle. The order of spikes is maintained during reactivation in STM. As spikes caused by the buffer occur in pre- and postsynaptic neurons of modifiable connections in the PFC, an asymmetric function of spike-timing dependent potentiation takes into account the order of spikes. This ensures that STDP is elicited in specific connections so that a direction of causality is inferred during rule learning. Furthermore, the separation of consecutive spikes is maintained in STM by recurrent inhibition that is caused by the activation of an interneuronal network (Bragin et al., 1995) each time a buffer neuron spikes.

In the absence of input, the contents of an STM buffer decay gradually, due to noise and a slow afterhyperpolarization (AHP). But when a full buffer receives new input, such as when rule learning involves a long sequence of states and actions, the earliest item in the buffer needs to retire so that the new item is maintained. The item replacement must also avoid changing the order of items. To achieve this, we propose that the appearance of a new item leads to inhibition at a specific phase of the rhythmic oscillation (see dashed box in Fig. 6C). Inhibition at that specific phase suppresses the reactivation of the first item (Koene et al., 2003) until its ADP has subsided, as shown in Figure 6C. The new item, represented by action potentials in the plot of the membrane potential of the third cell, assumes the last position in the sequence of reactivation.
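At the level of items rather than membrane potentials, the replacement rule described above behaves like a fixed-capacity first-in, first-out queue. The sketch below illustrates only that bookkeeping, under stated assumptions: the class and method names are hypothetical, the capacity of two follows the two-item buffer described for the encoding steps, and ADP, rhythmic modulation, phase-specific inhibition and gradual decay are deliberately left out.

```python
from collections import deque


class StmBuffer:
    """Item-level abstraction of the short-term memory buffer (hypothetical)."""

    def __init__(self, capacity=2):
        # A deque with maxlen discards its oldest entry when a new one is
        # appended to a full buffer, standing in for the suppression of the
        # first item's reactivation.
        self.items = deque(maxlen=capacity)

    def present(self, item):
        """A new input takes the last position in the reactivation sequence."""
        self.items.append(item)

    def reactivation_order(self):
        """Items in the order in which they would be reactivated each cycle."""
        return list(self.items)


buffer = StmBuffer(capacity=2)
for item in ["Srm", "Go", "Reward"]:
    buffer.present(item)

order = buffer.reactivation_order()   # ['Go', 'Reward']
```

The earliest item is retired and the relative order of the remaining items is unchanged, which is the property the phase-specific inhibition is proposed to guarantee.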

Each neuron in an STM buffer projects output to a corresponding target neuron in a or ci. Current and preceding activity are therefore available for encoding, as shown in Figure 7 for the membrane potential of a neurons throughout the network. The activity in a corresponds to current and preceding input, as pairs of state and action spikes are received in PFC during the seven simulated encoding trials of rule learning (Fig. 2).



Figure 7. The membrane potential of neurons in the a population, responding to input from the short-term memory buffer during the training stage (encoding) of the visual discrimination task in a sequence of six trials.

 

    Results
 
The network described above effectively encoded the different rules of the task and showed effective behavioral performance when tested with different stimuli, generating a Go response to Srm, a NoGo response to Srnm and a Go response to Surm stimuli. This behavior was guided by spiking activity that matches the data obtained by Schultz et al. (2000).

In the seven training trials (Fig. 2), the necessary associations for stimulus gated selection of action were encoded with strengthening of connections using STDP at synapses in Wg, Wc, Wig and Wic. Six trials were used to test performance with all possible initial stimuli. For these trials, the spike trains that represent the sensation of the initial stimulus were provided as input and the model-generated motor commands that lead to behavioral responses and the sensation of reward received were observed. The network showed the correct behavior in the task. The correct action followed each initial state during tests of task performance. Inspection of individual neuronal responses reveals that the three main types of responses observed by Schultz et al. were also found in the present simulations: (i) neurons that respond selectively to a trial-specific initial stimulus; (ii) neurons that respond prior to reward in a specific trial and may indicate a chosen course of action; and (iii) neurons that respond selectively to predicted and obtained reward. In addition to these, several more specialized responses were observed, providing predictions of the model.

During performance of the operant task, a desire for reward begins at the onset of every trial in the form of regular suprathreshold input to all neurons of the go population of the minicolumn that represents the goal. When trial input stimuli appear in different trials they are maintained as persistent spikes of buffer neurons that cause the spiking of a(Srm), a(Srnm) and a(Surm) in Figure 8. These input stimuli also provide subthreshold input to the co population of the minicolumn that represents the current state. Converging with the spread of activity from the goal minicolumn, spiking co neurons drive goal-directed behavior, resulting in the generation of output which in turn causes proprioceptive feedback of the correct action in each sequence in Figure 8, as well as the perception of reward received.



Figure 8. The membrane potential of neurons in the a population, responding to input from the STM buffer during correct behavioral performance (retrieval) of the visual discrimination task in a sequence of six trials. A change of context between rewarded movement (RM), rewarded non-movement (RNM) and unrewarded movement (URM) trials causes the STM buffer to clear during those intervals.

 
Activity Underlying Selective Responses in the Model

Membrane potentials of those neurons within a minicolumn that are involved in the choice of action demonstrate the decision process that is based on a forward spread of activity that is gated by the spread of activity from the goal. This is shown in Figure 9, in which membrane potentials of relevant a, gi and co neurons in the minicolumn that represent the Surm instruction state are plotted during an interval within an Surm trial (the convergence looks the same for the Srm example in Fig. 3). The plots show that neurons in the co population of that minicolumn experience subthreshold depolarization due to current state input from a. This contribution is joined by converging input from a specific neuron in the gi population that spikes due to the spread of activity from the minicolumn that represents the goal (dashed arrows in Fig. 3). When the inputs converge a neuron of the co population fires (bottom of Fig. 9). Activity in co was gated by activity in gi, and recurrent inhibition assured that only the first spike in co led to a behavioral response. The chosen behavior was determined by the minicolumn that was targeted by that spike, in this example a Go motor command for the simulated task environment.
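The gating by convergence described above can be sketched with a single leaky integrate-and-fire neuron that stands in for one co neuron. This is an illustration under assumptions, not the neuron model of the simulations: the membrane time constant, resting and threshold potentials, the 10 mV input amplitudes, the input times and the function name are all hypothetical, and synaptic input is reduced to instantaneous voltage steps.

```python
def lif_spike_times(events, t_end_ms=50.0, dt_ms=0.1, tau_m_ms=10.0,
                    v_rest_mv=-70.0, v_thresh_mv=-54.0):
    """Leaky integrate-and-fire neuron driven by instantaneous EPSP steps.

    events is a list of (time_ms, amplitude_mv) pairs. Returns the spike
    times in ms. All parameter values are hypothetical.
    """
    # Collect the input arriving at each integration step.
    kicks = {}
    for t_ms, amp_mv in events:
        step = int(round(t_ms / dt_ms))
        kicks[step] = kicks.get(step, 0.0) + amp_mv

    v = v_rest_mv
    spikes = []
    for step in range(int(round(t_end_ms / dt_ms))):
        v += dt_ms * (v_rest_mv - v) / tau_m_ms   # leak toward rest
        v += kicks.get(step, 0.0)                 # synaptic input, if any
        if v >= v_thresh_mv:
            spikes.append(round(step * dt_ms, 3))
            v = v_rest_mv                         # reset after the spike
    return spikes


STATE_INPUT = (10.0, 10.0)   # subthreshold input from a (current state)
GATE_INPUT = (12.0, 10.0)    # subthreshold input from gi (spread from goal)

state_only = lif_spike_times([STATE_INPUT])               # no spike
gate_only = lif_spike_times([GATE_INPUT])                 # no spike
converging = lif_spike_times([STATE_INPUT, GATE_INPUT])   # one spike
```

Neither input alone reaches threshold in this sketch; the neuron fires only when the two inputs arrive within a few milliseconds of each other, because the first depolarization has largely leaked away if the second arrives much later.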



Figure 9. Selected membrane potentials during converging forward spread and spread from the goal in an unrewarded move (Surm) trial. The forward spread is initiated in state Surm, as represented by the action potential of the a neuron (top). The spread of activity from the goal reaches the Surm minicolumn when an action potential appears at a specific gi neuron (middle). A resulting action potential that directs Go action appears at a specific co neuron of the same minicolumn (bottom).

 
For the six test trials, the spike trains that represent the sensation of the initial stimulus, motor commands that lead to behavioral responses and the sensation of reward received are shown in Figure 10. The spike trains show that Srm stimuli were followed by Go responses and reward, Srnm was followed by NoGo responses and reward, and Go action responses followed Surm stimuli and led to subsequent rewarded trials. The network can perform correctly regardless of the order of presented test stimuli.



Figure 10. The guided output (Go or NoGo) of the model in response to test stimuli (Srm, Srnm and Surm). Spike trains produced in the motor circuitry of the simulated operant task environment during trials that test task performance (testing retrieval). The spike trains represent sensory stimuli received during the trials (separated by vertical lines), as well as behavioral Go and NoGo motor responses generated by the network and the sensation of reward received.

 
Schultz et al. plotted the recorded spikes of three orbitofrontal neurons during many rewarded move (Srm) and unrewarded move (Surm) trials. We compare our simulation results with those of the experiment by Schultz et al. by displaying results for the three main categories of neuronal responses described by Schultz et al. side by side in Figure 11. These plots show spikes in individual trials (short vertical lines) aligned to specific parts of the task.



Figure 11. A side-by-side comparison of neuronal activity recorded by Schultz et al. (A–C; figure reproduced from Schultz et al., 2000) and that produced by our simulation of PFC minicolumns (D–F). Figures in (A–C) display spikes of three different orbitofrontal neurons. For each, the activity in rewarded and unrewarded movement trials is shown side by side, and every row within the borders of a graph represents the activity of that neuron during a separate trial. The time course of the data and of the model output are aligned to specific task events. Labels below a horizontal time axis indicate stages of the operant task: instruction stimulus, action trigger, reward. Above each graph, a histogram shows the sum of spikes in each bin of time, i.e. in a corresponding column over all trials. (D) The spike responses of an a population neuron in the RM (rewarded move) minicolumn aligned to instruction. (E) A co population neuron in the RM minicolumn with output connections to the GO minicolumn aligned to reward. (F) An a population neuron in the REW (Reward) minicolumn. Again, the spikes (short vertical lines) of each neuron are shown side by side in both rewarded and unrewarded movement trials. Rows within each figure show the results of separate simulation runs, while the cumulative spike rate is plotted above each figure by counting the number of spikes within an interval around t. The three neurons in (D–F) replicate the experimental results by Schultz et al. in the corresponding categories (A–C). Figure reprinted with permission from "Reward processing in primate orbitofrontal cortex and basal ganglia," by Schultz, Tremblay, and Hollerman, Cereb Cortex 10: 272–283.
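The histograms above each raster sum the spikes that fall in the same time bin across trials. A minimal sketch of that computation follows; the function name, the window, the bin width and the example spike times are hypothetical and are not taken from the recordings or the simulation.

```python
def peristimulus_histogram(trials, t_start_ms, t_end_ms, bin_ms):
    """Sum spikes over trials in fixed-width time bins.

    trials is a list of spike-time lists, one per trial, with times given in
    ms relative to the aligning task event (instruction, trigger or reward).
    Spikes outside [t_start_ms, t_start_ms + n_bins * bin_ms) are ignored.
    """
    n_bins = int((t_end_ms - t_start_ms) // bin_ms)
    counts = [0] * n_bins
    for spike_times in trials:
        for t in spike_times:
            if t_start_ms <= t < t_start_ms + n_bins * bin_ms:
                counts[int((t - t_start_ms) // bin_ms)] += 1
    return counts


# Two made-up trials, aligned to an event at t = 0 ms.
example_trials = [[-50, 10, 20], [15, 120]]
histogram = peristimulus_histogram(example_trials, -100, 200, 100)
```

Each entry of the returned list corresponds to one column of the raster, so aligning the same spike trains to a different task event only requires shifting the spike times before the call.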

 
As in the Schultz et al. results, our results showed that individual neurons activate specifically when one of the three cue stimuli is perceived. In our model, this is caused by the current state response of the a population (Fig. 11A,D). We also found individual neurons that activate for a chosen behavioral response. This activity results when neurons of the co population in the current state minicolumn receive gating activity from gi neurons due to the spread of activity from the goal minicolumn (Fig. 11B,E). We also found neurons that activate specifically when reward is received. This activity is caused by the current state activation of the a neuron in the goal minicolumn in our model (Fig. 11C,F).

As in the Schultz et al. data, there is spiking in Figure 11E during Srm and Surm trials, but the spike rate is higher during the Go action in Srm. Both the data and the output of our model show a quantitative difference in the amount of firing between Srm and Surm trials before reward is received. In our model, this is explained because co(Srm->Go) is activated in encoding phases in both trials when a(Go) is maintained by the STM buffer, since strengthened connections from go(Go->Srm) to gi(Srm<-Go) propagate the activity. Additionally, co(Srm->Go) is activated specifically in the Srm trial when the goal spread causes spiking in the gating gi(Srm<-Go) neuron, while current state input depolarizes the co(Srm) population. The appearance of similar activity at the trigger time during URM trials in Figure 11B suggests that the activity is not merely background noise and supports the possible explanation provided by our model.

A smaller temporal overlap of activity similar to that in the Schultz et al. results is achieved if the intervals between instruction stimulus, action trigger and reward delivery are increased in the model to match the data, for a trial length of 6–8 s instead of 1500 ms in the simulation. The shorter intervals in the model significantly reduced the time needed to compute each simulation run without affecting resulting behavior.

Some Neurons in the PFC are Active in Multiple Behaviors

In addition to the results above, we found that some neurons in the simulation activate selectively for a specific phase of two different trials. As shown in Figure 12A, the a(Go) neuron in the minicolumn that represents a movement response spikes in rewarded movement and unrewarded movement trials. Similarly, the a(Rew) neuron in the minicolumn that represents the perception of reward spikes in rewarded movement and rewarded non-movement trials.



Figure 12. (A) Spike activity of a neurons in the Go and Reward minicolumns during performance trials. Neurons a(Go) that predict a movement response are active in rewarded movement and unrewarded movement trials. Neurons a(Rew) that spike when reward is received are active in rewarded movement and rewarded non-movement trials. (B) Retrieval activity in the simulation shows that specific gi and go population neurons spike in all trials. Regular spike trains span trials for each of the neurons shown. In our example, 24 neurons in gi and go populations were found to spike regularly during retrieval in all trials of the performance stage of the specific task, ~7% of the total of 328 neurons involved in retrieval functions. (C) The neuron of the Srm minicolumn shows the end-stopping function. The neuron spikes throughout trials due to the spread of activity from the goal minicolumn, but in rewarded movement (RM) trials spiking stops as soon as reward is received (indicated by arrows with the label ‘R’). The overlap between spike trains of the a(Go) neuron and the neuron shows the period during which both Srm and Go minicolumn activity are maintained in an STM buffer for encoding.

 
In Figure 12B, we show that specific neurons in the gi and go populations of minicolumns that are involved in the retrieval of associations with a goal generated a spike in every trial of that specific task. The neurons that activate throughout each trial correspond to those involved in the learned associations for the spread of activity from the goal during retrieval, as shown in Figure 3. Thus, even neurons with very extensive response properties are important for performance of this task. Activity of the a population in the current state produces end-stopping of activity through the go population in the same minicolumn. Therefore, the onset of a rewarded move (RM) trial produces end-stopping at go(Srm) cells, but, due to the associations from Reward to Srnm via NoGo and from Srnm to Surm via Go, a neuron in gi(Surm) also spikes during that trial. Similarly, gi(Surm) spikes during rewarded non-movement trials due to the alternate path for the spread of retrieval activity from the goal via the Srm minicolumn. Thus, we predict a correlation of neuronal firing during Surm and Srm trials (strong Go involvement in both), and a lesser correlation of neuronal firing during Surm and Srnm trials, as shown in rows 1 and 3 of Figure 12B.

Activity in Figure 12C demonstrates the end-stopping function proposed in the minicolumn model. During rewarded movement trials, the neuron is active until reward is received. As soon as the perception of reward becomes the current state of the PFC network, the neuron is no longer active. This is not the case in rewarded non-movement and unrewarded movement trials. In rewarded movement (RM) trials, end-stopping prevents the spread of activity from the goal to the go population of the Srm minicolumn. During these trials, the neuron is active in encoding modes of each rhythmic cycle while maintained in the STM buffer. When reward is perceived, the Go–Reward pair replaces the Srm–Go pair in the buffer, as seen in the bottom two rows of Figure 12B. End-stopping appears in Srm (RM) trials and Srnm (RNM) trials, but not Surm (URM) trials, since two associative paths can be taken from the goal minicolumn to the Surm minicolumn.

Schultz et al. point out that some neurons activated less selectively, namely in a manner that was selective for the instruction cue regardless of trial type and expected reward. Similarly, our simulation shows that a neuron of the ci(Srm->Go) population in the Go minicolumn that receives input from the Srm minicolumn exhibits retrieval spikes in both Srm and Surm trials during instruction activity in the Srm or Surm minicolumns. Those retrieval spikes disappear once the Go minicolumn receives proprioceptive input about a key press movement in the environment and spikes begin to occur in the encoding phase of the theta-modulated network. This produces a 180° phase shift of firing at the time of the movement generation. The Go minicolumn ci(Surm->Go) neuron that receives input from the Surm minicolumn exhibits the same transition of spiking from the retrieval to the encoding phase, but its retrieval spiking is more selective and appears only during an Surm trial, since no sequence exists that involves the Surm minicolumn in other trials.

Schultz et al. provide a quantitative assessment of the trial and phase selective responses recorded. Of 505 neural responses identified at recording sites, 188 exhibited task related activity: 99 responses showed selective activity at the instruction phase of trials. Of those, 63 reflected the type of reinforcer or trial (38 active during RM, RNM or both trial types, 22 active only during URM trials and three active during RM and URM trials). Fifty-one responses showed selective activity at the trial phase preceding reward (41 during both RM and RNM trials, six during RM or RNM trials and four during URM trials). Sixty-seven responses showed selective activity at the reinforcer delivery phase of trials (62 during both RM and RNM trials, two during only RM trials and three during URM trials).

Before comparison of these numbers with the model, some caveats should be raised. The number of sites recorded by Schultz et al. and the number of neurons simulated in the model are too small to allow statistical comparison. Also, the number of selective model responses in a specific category depends on the arbitrary number of neurons chosen as a cell assembly within a population of neurons in each minicolumn. When the model was minimized so that individual functions of the minicolumn were performed by the smallest number of neurons, the following quantitative assessment of responses was obtained.

In the simulation, the neural circuitry of the model prefrontal minicolumns consisted of 328 neurons (excluding neurons that form short-term buffers and circuitry to process prefrontal input and output). From those neurons, 169 task related responses were recorded: 37 responses showed selective activity at the instruction phase of trials. Of those, 34 reflected the type of reinforcer or trial (21 active during RM or RNM trials, 10 active only during URM trials and three active during RM and URM trials). Seventy-five responses showed selective activity at the trial phase preceding reward (40 during RM or RNM or both trial types, 11 during only URM trials, 17 during RM and URM trials and seven unselective for trial type). Fifty-seven responses showed selective activity at the reinforcer delivery phase of trials (20 during both RM and RNM trials, 14 during only RM trials, 21 during RNM trials and two unselective for trial type).
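The counts reported for the data and for the model can be put on a common footing by expressing each trial phase as a share of the phase-selective responses. The sketch below does only that arithmetic and is descriptive, not a statistical comparison. The counts are copied from the text; the dictionary keys and the function name are chosen for illustration. Normalizing by the sum of the three phase counts is an assumption: in the data those counts sum to 217, more than the 188 task-related responses, which presumably reflects activity counted in more than one phase.

```python
# Responses selective at each trial phase, as reported in the text.
DATA_COUNTS = {"instruction": 99, "pre_reward": 51, "reward": 67}
MODEL_COUNTS = {"instruction": 37, "pre_reward": 75, "reward": 57}


def phase_shares(counts):
    """Return each phase's share of the summed phase-selective responses."""
    total = sum(counts.values())
    return {phase: n / total for phase, n in counts.items()}


data_shares = phase_shares(DATA_COUNTS)
model_shares = phase_shares(MODEL_COUNTS)

for phase in DATA_COUNTS:
    print(f"{phase:12s} data {data_shares[phase]:.2f}  "
          f"model {model_shares[phase]:.2f}")
```

On these numbers the instruction phase accounts for a larger share of responses in the data than in the model, while the phase preceding reward accounts for a larger share in the model.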

These results support a correlation during the instruction phase between RM and RNM trials seen in both data and model. The absence of a correlation between URM and RNM during the trial phase preceding reward is also consistent with the data. The number of responses for both RM and URM trials is higher in the model than in the data, as is the response activity for only RNM trials. Both differences may reflect a difference in the model or merely statistical variability.


    Discussion
 
Our model replicates goal-directed behavior in a visual discrimination task based on a hypothesis about the functional connectivity of PFC circuits (Hasselmo, 2005). Behavioral responses and reward associations to visual cues are encoded in synaptic strengths between neuronal networks representing cortical minicolumns. The goal-directed behavior is retrieved by means of a converging spread of activity from a representation of desired reward and the spread of activity from the current state. Our results specifically replicate the qualitative findings by Schultz et al. (2000) in terms of individual neuronal responses, while suggesting a possible neural mechanism for learning and retrieval. We use the model to propose explanations for the selective responses of individual neurons in orbitofrontal cortex during goal-directed behavior.
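Stripped of spikes and synapses, retrieval by converging spread can be sketched as a search over learned state-action transitions: activity spreads backward from the goal, and the action chosen in the current state is the one whose outcome the backward spread has reached. The sketch below is an abstraction under assumptions, not the network itself. The transition table is inferred from the trial types described in the Results (Go after Srm and NoGo after Srnm lead to reward; Go after Surm leads to a subsequent rewarded trial), the action minicolumns are collapsed into edge labels, and all names are hypothetical.

```python
from collections import deque

# (state, action) -> states that may follow; inferred for illustration.
TRANSITIONS = {
    ("Srm", "Go"): ["Reward"],
    ("Srnm", "NoGo"): ["Reward"],
    ("Surm", "Go"): ["Srm", "Srnm"],
}


def steps_to_goal(goal):
    """Spread backward from the goal; return the step count for each state."""
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        target = frontier.popleft()
        for (state, _action), next_states in TRANSITIONS.items():
            if target in next_states and state not in dist:
                dist[state] = dist[target] + 1
                frontier.append(state)
    return dist


def select_action(state, goal="Reward"):
    """Pick the learned action whose outcome lies closest to the goal."""
    dist = steps_to_goal(goal)
    best = None
    for (s, action), next_states in TRANSITIONS.items():
        if s != state:
            continue
        reached = [dist[n] for n in next_states if n in dist]
        if reached and (best is None or min(reached) < best[0]):
            best = (min(reached), action)
    return None if best is None else best[1]


choices = {s: select_action(s) for s in ["Srm", "Srnm", "Surm"]}
```

In the spiking network the same selection emerges from timing, since the first co neuron to receive converging input fires first and recurrent inhibition suppresses the rest, rather than from an explicit comparison of path lengths as in this sketch.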

The model provides a framework for the context/stimulus dependent change in action selection, as proposed by Miller and Cohen (2001). In particular, it provides a spiking neuron implementation of context effects similar to those of Cohen and Servan-Schreiber (1992). We show how populations of spiking neurons could interact to allow selection of specific actions based on the context of specific sensory input (states) and the desire for reward. Because activity in a specific minicolumn (Fuster, 2000) that represents such a state or action may play a role in different contexts that require its association with different state–action-state transitions, we presuppose separate populations of neurons within a minicolumn for input from and output to other minicolumns (Hasselmo, 2005). For example, the Go and Reward minicolumns in the experimental task fulfill such multiple roles, as shown in Figures 3 and 12A.

We show what functional role the individual neurons in these populations could play in the performance of the task by replicating essential features of the Schultz et al. experiment. We used similar learning and retrieval protocols and replicated individual neuronal responses that are selective for a specific state in a specific trial (see Fig. 11). These selective responses may be understood in the context of a neuron's function in the minicolumn model.

In addition to these explanations, the model generates predictions for this task about what other types of responses should appear in the PFC, including neuronal responses which would look rather complex and might therefore not normally be classified. One set of complex responses is shown in Figure 12B. The model predicts that some neurons will spike throughout all trials of a goal-directed task, not just for a specific state, due to the spreading activity from a goal representation. And if encoding and retrieval alternate continuously as modeled, then such responses that are indicative of spreading activity should be recorded during stages of novel learning as well as task performance.

Our results also propose that end-stopping implemented in the retrieval function of the model may be detected as shown in Figure 12C. Evidence that supports possible end-stopping of spreading activity is provided by the termination of recorded spikes in Schultz et al. (2000), where neuronal activity that is selective for Srm or Srnm instruction stimuli and for action preceding reward terminates as soon as reward is received.

Predictions of the model suggest experiments that test the validity of two of its central tenets: convergence of activity through representations that may be associated in multiple ways (Sutton and Barto, 1981) and the need for a short-term buffer.

The structure of the model uses a progressive backward spread of activity from the goal. This suggests an experiment that could test this feature, in which associations are formed sequentially between states and actions leading to a particular goal. Imagine an operant task, in which