Cerebral Cortex Advance Access originally published online on May 8, 2007
Cerebral Cortex 2008 18(1):67-77; doi:10.1093/cercor/bhm037
Sparseness Constrains the Prolongation of Memory Lifetime via Synaptic Metaplasticity
1 Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Invalidenstraße 43, 10115 Berlin, Germany, 2 Neuroscience Research Center, Charité, Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany, 3 Bernstein Center for Computational Neuroscience Berlin, Philippstraße 13, 10115 Berlin, Germany, 4 Current address: Division of Neurobiology, Ludwig-Maximilians-Universität München, Großhaderner Straße 2, 82152 Planegg-Martinsried, Germany
Address correspondence to email: leibold{at}zi.biologie.uni-muenchen.de.
| Abstract |
|---|
Synaptic changes impair previously acquired memory traces. The smaller this impairment, the greater the longevity of memories. Two strategies have been suggested to keep memories from being overwritten too rapidly while preserving receptiveness to new contents: either introducing synaptic meta levels that store the history of synaptic state changes or reducing the number of synchronously active neurons, which decreases interference. We find that synaptic metaplasticity can indeed prolong memory lifetimes, but only under the restriction that the neuronal population code is not too sparse. For sparse codes, metaplasticity may actually hinder memory longevity. This is important because population codes are sparse in memory-related brain regions such as the hippocampus. Comparing 2 different synaptic cascade models with binary weights, we find that a serial topology of synaptic state transitions gives rise to larger memory capacities than a model with cross transitions. For the serial model, memory capacity is virtually independent of network size and connectivity.
Key Words: associative memory, cascade model of synaptic plasticity, cell assembly, memory capacity, population code, recurrent network
| Introduction |
|---|
Since Hebb (1949)
We are presently only beginning to understand how learning with discrete synapses affects functional and computational properties of neural systems (e.g., Brunel et al. 2004; Fusi and Abbott 2007). Obviously, the larger the synaptic state space, the more information can be stored in the synaptic configuration. But how can this information be retrieved in a dynamical neural network? This is a nontrivial problem, in particular because metaplastic changes do not affect postsynaptic currents and, hence, information about synaptic meta states cannot be derived from the activity of neurons.
Metaplasticity is commonly thought to prolong the lifetime of memories while keeping the network receptive to the storage of new memories (Abraham and Bear 1996; Montgomery and Madison 2004). This idea has recently been corroborated by Fusi et al. (2005). Assuming a homogeneous population of synapses, they have derived that a certain number of synaptic meta states is optimal with respect to maximizing memory lifetime. Their model, however, does not consider the dynamics of a neural network, that is, how the evoked postsynaptic potentials sum up and give rise to suprathreshold network activity reflecting the stored memories. In this paper, we show that the necessity to retrieve synaptically stored information from the activity of neurons poses a crucial constraint. In particular, the advantage of metaplasticity may be outweighed by sparse representations of memories (Tsodyks and Feigel'man 1988; Amit and Fusi 1994). For highly distributed codes, on the other hand, we propose a new metaplasticity model with an optimal memory capacity that is independent of network size and connectivity.
| Results |
|---|
Memories are considered to be successively imprinted into the synapses of a recurrently connected network of N neurons through discrete changes of synaptic states. These memories degrade over time because of ongoing storage of new memories. Here, a memory trace is regarded as the association between 2 random assemblies of M neurons, a cue and a target assembly (e.g., Willshaw et al. 1969
Learning is thought to establish such an association between 2 formerly unrelated assemblies via synaptic alterations in a supervised fashion. Synapses from cue to target neurons receive a signal that is proposed to initiate long-term potentiation (LTP) with some probability. Synapses in the reverse direction (target to cue) may undergo long-term depression (LTD). The learning rule is thus assumed to be local: it triggers plasticity signals only at synapses that connect neurons of the cue and target assembly, whereas all other synapses remain unaffected. In addition, we assume that synapses related to neurons that are members of both the cue and the target assembly receive an indecisive signal and do not change states (Discussion). If f = M/N denotes the probability that a randomly chosen neuron belongs to a specific assembly, the indecisive signal is conveyed to the synapses of the Mf neurons in the overlap between both assemblies. The abbreviation f = M/N is also called the coding ratio.
Synaptic alterations are considered to occur via switch-like transitions between the discrete synaptic states that are induced by the plasticity events associated with the storage of new memories. Three models of synapses with discrete states are considered: 1) two-state synapses, which are either silent or activated; 2) binary synapses with multiple states and complex metaplasticity, which resemble the cascade model of Fusi et al. (2005); and 3) binary synapses with multiple serial state transitions, in which the probabilities of all transitions equal one.
In the present framework, the overall distribution of synaptic states in the whole network resides in equilibrium at any time. The specific shape of the equilibrium distribution is determined by the specific choices of synapse model and learning rules (see below and Appendix A.5.1, A.5.2, and A.5.3). Intuitively, if the probability for LTD exceeds the probability for LTP, more synapses are in a depotentiated state and vice versa. In contrast to this overall distribution, the state distribution attached to one specific memory (association) may be far from equilibrium: Right after learning a new association, a more than average fraction of synapses connecting cue to target neurons is in the potentiated state. This association-specific distribution then asymptotically converges to the equilibrium distribution while successively storing other associations. The convergence to equilibrium thus corresponds to the overwriting of this particular memory.
As a measure of memory lifetime, we define the number P of successive associations that are necessary to overwrite a specific initial association such that the latter can no longer be retrieved. Given a constant rate of new associations per unit time, P divided by this rate can be reinterpreted as the duration a memory remains stored in the synaptic configuration of the network (Amit and Fusi 1994). This learning rate, however, is difficult to estimate and will be substantially influenced by a variety of factors such as species, brain region, environment, behavioral state, and attention. In what follows, we thus simply neglect the rate and, instead, use P directly as a measure of memory lifetime. In other words, we measure time in units of newly acquired memories. Technically, this is equivalent to setting the rate to 1.
Two-State Synapses
To gain understanding of how memory lifetime (longevity) arises in a dynamical recurrent network, we start out by considering binary synapses without meta states, which means that each single synapse can only exist in one of 2 states, silent or activated. Immediately after having stored an association, all synapses that connect the cue neurons to the target neurons shall be activated, that is, the learning rule is such that these synapses switch from the silent to the activated state with probability one. Lower transition probabilities not only enhance memory lifetime but also reduce the network's receptiveness to new memories (Amit and Fusi 1994; Fusi et al. 2005) and, thus, are not considered here. Furthermore, let us assume that synapses from the target neurons back to the cue neurons are silenced with probability one. This leads to an equilibrium distribution of synaptic states, in which half of the synapses are activated and the other half is silent. For a given probability $c_m$ that a neuron is synaptically connected to another one, the number of activated synapses in a network of N neurons therefore fluctuates around a mean value of $c_m N^2/2$. The probability $c_m$ of a morphological connection is also called the morphological connectivity.
As a result of the equilibrium distribution of synaptic states, an arbitrary pair of pre- and postsynaptic neurons is connected via an activated synapse with probability $c_m/2$. Thus, at neurons that do not belong to a particular target assembly, M simultaneously spiking cue cells give rise to $c_m M/2$ inputs from nonsilent synapses, which is considered as a baseline or background depolarization in what follows. We therefore define the "signal" of a memory trace to be the excess depolarization with respect to this baseline. Because, for the 2-state model, we have modeled the storage of a memory through the activation of all synapses connecting the cue to the target assembly, the initial memory signal equals the number of synapses at a target cell that were switched on. Because we have assumed that the synapses at the Mf neurons in the overlap of cue and target assembly do not change state, the initial memory signal at one target neuron amounts to $M(1 - f)c_m/2$ synapses on average.
Sparse Representations
To understand the temporal evolution of the memory signal, let us first focus on the case of sparse coding, $f = M/N \ll 1$. This scenario allows for an easy analytic treatment and interpretation. A more general scenario is outlined in the Appendix.
The initial signal $c_m M/2$ of a memory trace at t = 0 decays because of the storage of further memories. The storage of a single new memory in a subsequent time step switches off synapses in the initial memory trace with probability $f^2$. The number of activated synapses in a specific association therefore decays to a resting level of $c_m M/2$ with time constant $1/(2f^2)$, and thus the signal after t further associations amounts to $(c_m M/2)\exp[-2f^2 t]$. This decay of the number of activated synapses is illustrated in Figure 1A.
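The exponential decay can be checked with a minimal Python sketch. It assumes, as an illustration rather than as the authors' derivation, that every newly stored association switches an activated synapse off with probability $f^2$ and a silent one on with probability $f^2$, and it tracks only the expected fraction of activated cue-to-target synapses. The function name `excess_fraction` and the value of `f` are hypothetical.

```python
import math

def excess_fraction(f, steps):
    """Expected excess of activated synapses over the equilibrium value 1/2.

    Assumption: each stored association switches an activated synapse off
    with probability f**2 and a silent synapse on with probability f**2.
    """
    x = 1.0  # right after storage, all cue-to-target synapses are activated
    excess = []
    for _ in range(steps + 1):
        excess.append(x - 0.5)
        # expected fraction of activated synapses after one more association
        x = x * (1.0 - f**2) + (1.0 - x) * f**2
    return excess

f = 0.05  # illustrative coding ratio, not a value from the paper
trace = excess_fraction(f, 400)
approx = [0.5 * math.exp(-2 * f**2 * t) for t in range(401)]
max_dev = max(abs(a - b) for a, b in zip(trace, approx))
print(f"largest deviation from exp(-2 f^2 t) / 2: {max_dev:.2e}")
```

Multiplying the excess fraction by $c_m M$ converts it into the signal measured in synapses per target neuron. Under these assumptions the excess shrinks by a factor $1 - 2f^2$ per stored association, which for small f is indistinguishable from the exponential with time constant $1/(2f^2)$.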
The ongoing storage of associations between pairs of random cell assemblies also induces fluctuations of the memory signal. These fluctuations define a noise level above which the signal must be detectable. The standard deviation of these fluctuations equals the square root of the average number of activated synapses for random selections of M presynaptic neurons, that is, $\sqrt{c_m M/2}$.
The lifetime P of a memory is the time t at which, on average, the signal arrives at the noise level. We again note that time is measured in units of stored associations, so that P also equals the total number of associations that can be stored in the network. More specifically, we define P as the time at which the signal-to-noise ratio of a memory imprinted at time t = 0 equals a threshold K > 0, which is determined by the desired amount of activation of the target assembly. It will turn out that the specific choice of the threshold K does not change the general dependence of P on system parameters such as the network size N, assembly size M, and connectivity cm. We emphasize that here the detectability of the memory signal is defined via an average over the ensemble of target neurons, and thus the memory signal is no single-cell quantity, though it may seem so (Appendix A.4).
Given a memory signal $(c_m M/2)\exp[-2f^2 t]$ and a noise level $\sqrt{c_m M/2}$, the signal-to-noise ratio attains a value of K at time

$$P = \frac{1}{2f^2}\,\ln\frac{\sqrt{c_m M/2}}{K} = \frac{N^2}{4M^2}\,\ln\frac{c_m M}{2K^2}.$$

This is an encouraging result because the lifetime P of a memory increases quadratically with network size N. Moreover, it is proportional to the total number $c_m N^2$ of synapses in the network, leaving $c_m$ and M fixed. In contrast, for a fixed coding ratio f = M/N, P scales logarithmically with the average number $c_m M$ of synapses per target neuron involved in a particular association. Maximum longevity $P_{max}$ is obtained from dP/dM = 0, which yields an optimal assembly size

$$M_{opt} = \frac{2\sqrt{e}\,K^2}{c_m}. \qquad (1)$$

At the optimal assembly size, the memory lifetime reaches its maximum

$$P_{max} = \frac{(c_m N)^2}{32\,e\,K^4}. \qquad (2)$$

For $c_m N \gg 1$, we find the optimal coding ratio $f_{opt} = M_{opt}/N$ to be small, which is consistent with the initial assumption $M \ll N$. We note that throughout the paper (cf., Appendix A.5), analytical results concerning $M_{opt}$ and $P_{max}$ are derived in the limit of sparse coding $M \ll N$ and checked for self-consistency.
General Solutions
Analytical results obtained in a regime of sparse representations are corroborated by numerical evaluations of the readout criterion (Appendix A.4), which allowed us to also consider nonsparse coding ratios.
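A much simpler cross-check than the full readout criterion of Appendix A.4 can be sketched in Python for the sparse limit. The sketch assumes the signal $(c_m M/2)\exp[-2f^2 t]$ from the text and a noise level of $\sqrt{c_m M/2}$, searches all integer assembly sizes by brute force, and compares the result with the closed forms $M_{opt} = 2\sqrt{e}K^2/c_m$ and $P_{max} = (c_m N)^2/(32 e K^4)$ that follow from setting dP/dM = 0. All function names and parameter values are hypothetical.

```python
import math

def lifetime(M, N, c_m, K):
    """Sparse-limit lifetime: time at which signal / noise has dropped to K."""
    f = M / N
    snr0 = math.sqrt(c_m * M / 2.0)  # initial signal-to-noise ratio (assumed noise model)
    if snr0 <= K:
        return 0.0  # the trace is undetectable from the start
    return math.log(snr0 / K) / (2.0 * f**2)

def best_assembly_size(N, c_m, K):
    """Brute-force search over all integer assembly sizes 1..N."""
    return max(range(1, N + 1), key=lambda M: lifetime(M, N, c_m, K))

# Illustrative values only; they are not taken from the paper.
N, c_m, K = 10_000, 0.1, 1.0

M_num = best_assembly_size(N, c_m, K)
M_ana = 2.0 * math.sqrt(math.e) * K**2 / c_m
P_ana = (c_m * N) ** 2 / (32.0 * math.e * K**4)

print("numerical optimum :", M_num, lifetime(M_num, N, c_m, K))
print("closed-form values:", M_ana, P_ana)
```

The brute-force search is deliberately naive: it makes no use of the analytical optimum, so agreement between the two is an independent check of the algebra, not of the network model itself.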
To compare memory performance of networks with different sizes N and connectivities $c_m$, one defines the memory capacity as $P/(c_m N)$. The dependence of the capacity on the coding ratio f is stereotypic (Fig. 1B). For very low f, the assembly size is too small to make the signal (proportional to M) overcome the noise.
The numerical results shown in Figure 1C confirm our analytical results that the storage capacity at the optimal coding ratio $f_{opt}$ scales linearly with N. Classical approaches to memory capacity (e.g., Golomb et al. 1990; Nadal 1991), however, do not optimize with respect to f but treat it as a free parameter. Yet, for fixed f, the capacity exhibits inferior scaling behavior and may even decrease with N (e.g., f = 0.3 in Fig. 1C).
Binary Synapses with Complex Metaplasticity
The lifetime of a memory can be prolonged by reducing the probability of synaptic changes (Amit and Fusi 1994). An obvious drawback of such a strategy is that memories become more rigid and, hence, it is more difficult to store new associations. This problem is the well-known dilemma of finding a compromise between providing a high amount of plasticity and, at the same time, ensuring a high longevity of memories.
Fusi et al. (2005) have recently shown for $M/N \approx 1$ that synaptic metaplasticity might offer an elegant solution to this dilemma. Metaplasticity means that synaptic changes do not necessarily alter a synapse's influence on the postsynaptic membrane potential but rather modulate the probability that the synapse undergoes a weight change the next time it is exposed to an LTP or LTD stimulus. Figure 2A illustrates the transition probabilities for a slightly modified (see Appendix A.5.2) version of the cascade model defined by Fusi et al. (2005) for n = 4 meta levels. The synaptic weight either attains the value w = 0 (silent) or w = 1 (activated), as in the case of 2-state synapses. The probabilities of state transitions for n > 1 are constructed as follows: if the synapse is silent (activated) and is exposed to an LTP (LTD) stimulus, it becomes potentiated (depotentiated) with probability $(1/2)^\mu$, where µ = 0, 1, ..., n – 1 counts the meta levels. Thus, the smallest transition probability amounts to $(1/2)^{n-1}$.
If a synapse is silent (activated) and receives an LTD (LTP) stimulus, it switches to level µ + 1 with probability $(1/2)^\mu$. In that case, the synaptic weight w remains unchanged. However, if a silent (activated) synapse is in the "lowest" meta level µ = n – 1 and receives an LTD (LTP) stimulus, no state change occurs.
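The transition rules above can be written down as a short Python sketch of a single plasticity event. Two points are assumptions and not statements from the text: the function name `cascade_step` and its signature are hypothetical, and the sketch places a synapse at meta level µ = 0 after a weight change, following the usual cascade picture, because the text does not say at which level the synapse lands.

```python
import random

def cascade_step(w, mu, stimulus, n, rng=random):
    """Apply one plasticity event to a cascade synapse and return (w, mu).

    w        : 0 (silent) or 1 (activated)
    mu       : meta level, 0 .. n-1
    stimulus : "LTP" or "LTD"
    rng      : any object with a random() method returning a float in [0, 1)
    """
    p = 0.5 ** mu  # transition probability at meta level mu
    opposes_weight = (w == 0 and stimulus == "LTP") or (w == 1 and stimulus == "LTD")
    if opposes_weight:
        if rng.random() < p:
            # ASSUMPTION: after a weight change the synapse sits at meta level 0
            return 1 - w, 0
        return w, mu
    # The stimulus confirms the current weight: try to move one level deeper.
    if mu == n - 1:
        return w, mu  # deepest level, no state change
    if rng.random() < p:
        return w, mu + 1
    return w, mu

# A silent synapse at level 0 is always potentiated by an LTP stimulus.
print(cascade_step(0, 0, "LTP", n=4))
```

Passing the random source as an argument keeps the function easy to test, because a stub with a fixed `random()` value makes every transition deterministic.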
Metaplasticity is thought to increase memory lifetime because the synaptic meta level reflects the "history of the synapse," and future plasticity is "dictated by previous plastic changes" (Montgomery and Madison 2004
Memory longevity, however, depends not only on the number n of meta levels but also on the sparseness of the code, that is, the assembly size M. The two quantities cannot be optimized independently of each other to maximize memory longevity: the larger n, the fewer synapses are driven into a potentiated state by an LTP signal because more synapses will reside at meta levels with a decreased probability of potentiation. As a result, the mean memory signal is smaller, the more meta levels the synapse provides. To nevertheless produce a sufficiently high memory signal, the number M of cells in a synchronously active assembly must increase with the number n of meta levels. Larger assemblies, however, are detrimental for memory lifetime because of the increased amount of interference between the stored associations. It is a nontrivial problem to understand whether the advantages of sparsification outweigh those of an increased synaptic state space or vice versa.
Here we discuss the trade-off between sparse coding and the number of synaptic meta levels in the light of our network model. The longevity P of memories as a function of the assembly size M is derived numerically (Appendix) and is illustrated in Figure 2B for several different numbers n of meta levels. We observe that increasing n reduces the maximal memory lifetime $P_{max}$ while the optimal assembly size $M_{opt}$ at the maximum becomes larger. Analytical results derived for $f = M/N \to 0$ show the optimal assembly size to scale like $(n + 1)^2$ and the maximal lifetime to decrease like (Appendix, eq. [30])
|
| (3) |
0.5.
Binary Synapses with Serial State Transitions
The above findings about the longevity of memories may either reflect a specific feature of the transition topology of the model of Fusi et al. (2005) or they may be a general property of metaplasticity rules. To further investigate this question, we propose a simpler topology of transitions between synaptic states, in which synaptic states are connected one after the other. This model is illustrated in Figure 3A for n = 3 meta levels. The transition probabilities between all states equal one, that is, after a plasticity signal every possible synaptic state change occurs. However, only state changes within meta level µ = 0 are also associated with a weight change.
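A minimal Python sketch of such a serial chain is given below. It assumes a particular numbering that is one plausible reading of the text: the 2n states are labeled 1, ..., 2n, states up to n are silent, states above n are activated, an LTP stimulus moves the synapse one state up, an LTD stimulus moves it one state down, and stimuli at the two ends of the chain have no effect. The names `serial_step` and `weight` are hypothetical.

```python
def serial_step(state, stimulus, n):
    """Move a serial-chain synapse by one state; states are numbered 1 .. 2n."""
    if stimulus == "LTP":
        return min(state + 1, 2 * n)  # ASSUMPTION: no change at the top of the chain
    if stimulus == "LTD":
        return max(state - 1, 1)      # ASSUMPTION: no change at the bottom of the chain
    raise ValueError("stimulus must be 'LTP' or 'LTD'")

def weight(state, n):
    """Binary weight: states n+1 .. 2n are activated, states 1 .. n are silent."""
    return 1 if state > n else 0

# With n = 3, a synapse in the deepest silent state needs three LTP events
# before its weight changes.
n, state = 3, 1
for event in range(1, 4):
    state = serial_step(state, "LTP", n)
    print(event, state, weight(state, n))
```

In this reading only the step between states n and n + 1 changes the weight, which corresponds to the statement that only state changes within meta level µ = 0 are associated with a weight change.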
Numerical evaluation of memory lifetime for the serial model (Fig. 3B) reveals behavior similar to that of the complex metaplasticity model. The smaller the number n of synaptic meta levels, the higher we find the maximal lifetime $P_{max}$. For large assembly sizes M, meta levels (n > 1) can again be advantageous. An estimate of P in the case $f = M/N \to 0$ reveals that the optimal assembly size $M_{opt}$ scales with $n^2$ and that the maximal lifetime decreases like (Appendix, eq. [34])
|
| (4) |
Unbalanced Plasticity
In all models presented so far, LTP and LTD are precisely balanced, which is defined via the depolarization of nontarget cells to be half the maximum depolarization, that is, $c_m M/2$ (Appendix A.1). To rule out that the assumption of balanced LTP and LTD accounts for the longest memory lifetimes of 2-state synapses, we also investigated variants of the serial model in which we reduced the transition probabilities for either LTP or LTD stimuli, which serves the purpose of unbalancing the learning rule.
The numerical results of Figure 4A reveal that an LTD-prone regime results in a dramatically altered equilibrium state distribution: depressed states ($\le n$) are more strongly occupied than potentiated states ($> n$). Interestingly, in such an LTD-prone regime, with more silent (w = 0) synapses than activated (w = 1) ones, the maximum memory lifetime $P_{max}$ can even be slightly enhanced compared with a balanced plasticity rule (Fig. 4B; e.g., Brunel et al. 2004; Leibold and Kempter 2006). This enhancement is essentially due to the increase of the initial memory signal, which is roughly the difference between the maximal depolarization $c_m M$ and the equilibrium depolarization. The latter is smaller in an LTD-prone regime, giving rise to a larger initial memory signal as compared with a balanced or LTP-prone regime. The equilibrium depolarization, however, must not be too small, because noise becomes more influential the fewer synapses are involved in an association.
The maximal memory lifetime Pmax is quite robust against unbalancing LTP and LTD for a small number n of synaptic meta levels (Fig. 4B). With increasing n, the memory performance of the network becomes more sensitive to unbalancing. However, even for n = 15, where the maximum memory lifetime is obtained at an LTP/LTD ratio of about 90%, a deviation of 10% from this optimal LTP/LTD ratio still accounts for about 90% of the largest possible longevity Pmax.
Highly Distributed Representations
We have shown that 2-state synapses are better suited for maximizing the lifetime of associations between random assemblies if a neuronal system is capable of optimizing the number M of neurons firing synchronously in an assembly. This optimization of M may, however, not always be feasible. For large assembly sizes, we suspect that more complex synapses may become superior in general (see insets in Figs 2B and 3B).
To compare the performances of both synapse models at high coding ratios, we fixed f at a value of 0.3 and numerically calculated the optimal number $n_{opt}$ of meta levels and the respective memory capacity $P/(c_m N)$; see Figure 5. We find that for both models, the optimal number $n_{opt}$ of synaptic meta levels is an increasing function of network size and morphological connectivity. For the model with complex transitions, Fusi et al. (2005) have shown $n_{opt}$ to scale logarithmically with the total number $c_m N^2$ of synapses in the network. In the model with serial state transitions, $n_{opt}$ grows considerably faster (not shown). However, with respect to memory capacity, the serial model is superior to the model with complex transitions. More specifically, the complex model's capacity decreases with increasing network size and connectivity, whereas the serial model exhibits a level of capacity that is virtually independent of both network parameters.
| Discussion |
|---|
Synaptic metaplasticity is thought to enhance memory lifetimes via storing the history of past synaptic changes (Abraham and Bear 1996
Model of Online Learning
The mathematical model we use for online learning is motivated by the central idea that old memories are gradually replaced by new memories. Online learning is described by the dynamics of the distribution of synaptic states related to a specific association that is stored in a recurrent network (Amit and Fusi 1994). This dynamics of the state distribution is a multidimensional linear iterative map. The eigenvalues of the linear map provide the inverse timescales for overwriting of memories. As a criterion to test whether these memories can still be read out, Amit and Fusi (1994) use a fixed specific value of the signal-to-noise ratio of the subthreshold membrane potential. In contrast, we optimize the neuronal firing threshold for a given assembly size M and network size N so as to fulfill a signal-detection criterion based on the suprathreshold activity of the target assembly (Leibold and Kempter 2006).
Scaling Laws of Memory Lifetime
For the 2-state synaptic model, we find the maximal memory lifetime to grow linearly with the total number of synapses in the network. This result seems to contradict the findings by Fusi et al. (2005), who showed memory lifetimes P to grow "logarithmically as a function of the number of synapses used to store the memory." Both results are, however, consistent, since Fusi et al. (2005) assumed that about as many synapses are involved in any single stored memory as there are synapses in the network, whereas we also allow sparse representations that require only a small number ($\propto M^2 \ll N^2$) of synapses to be involved in a memory trace.
Moreover, Amit and Fusi (1994) have already pointed out that the logarithmic dependence of P can be broken up by aptly relating the probability q of having a synaptic state change to parameters like network size, coding ratio, or the number of synaptic states: in the case of sparse coding, q is proportional to the probability that a particular synapse is used in an association. If the storage of a new memory requires potentiation of a random set of $c_m M^2$ synapses and there are a total of $c_m N^2$ synapses in the network, the probability that any one synapse receives a plasticity signal is $(M/N)^2 = f^2$.
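The probability $f^2$ can be checked with a small Monte Carlo sketch in Python. It assumes that cue and target assemblies are drawn independently and uniformly at random, follows one fixed synapse from a presynaptic neuron 0 onto a postsynaptic neuron 1, and, as a simplification, ignores the special treatment of neurons in the overlap of both assemblies. The function name and all parameter values are hypothetical.

```python
import random

def plasticity_signal_rate(N, M, trials, seed=0):
    """Fraction of stored associations in which one fixed synapse is addressed."""
    rng = random.Random(seed)
    neurons = range(N)
    hits = 0
    for _ in range(trials):
        cue = set(rng.sample(neurons, M))
        target = set(rng.sample(neurons, M))
        # the tracked synapse runs from neuron 0 (cue side) to neuron 1 (target side)
        if 0 in cue and 1 in target:
            hits += 1
    return hits / trials

N, M = 50, 10  # illustrative sizes, giving f = 0.2 and f**2 = 0.04
estimate = plasticity_signal_rate(N, M, trials=20_000)
print(estimate, (M / N) ** 2)
```

The small network keeps the run time short; the argument itself does not depend on N and M beyond their ratio.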
Models of Metaplasticity
We consider 2 different models of synaptic metaplasticity. As a starting point, we used the model by Fusi et al. (2005). As a second model of synaptic metaplasticity, we studied a serial topology of state transitions, which turned out to provide longer memory lifetimes than the original model with cross transitions. A possible disadvantage of the serial topology, however, might arise because the optimal number $n_{opt}$ of meta levels increases faster with network size N and connectivity $c_m$ than $n_{opt}$ in the model with cross transitions (Fig. 5). Additional costs for each meta level thus might favor the latter model.
Symmetric Learning Rules and Attractor-Type Memories
The learning rule discussed in this paper is asymmetric in the sense that synapses between cue and target neurons are strengthened, whereas synapses in the reverse direction are weakened. As a result, the memory traces are sequence-type associations (e.g., Willshaw et al. 1969; Nadal 1991; Leibold and Kempter 2006). The model does not consider pattern completion, as the latter requires symmetric synaptic connections. Though not explicitly shown here, the fundamental conflict between storing plasticity history and reducing memory interference also remains for pattern completion. However, because pattern completion is generally discussed in the light of dynamical attractors (e.g., Hopfield 1982; Golomb et al. 1990; Treves and Rolls 1992), stability of firing patterns requires larger assembly sizes and, hence, more distributed representations as compared with sequence-type memories. We thus expect that for attractor-type memories, synaptic meta levels are even more useful for enhancing memory lifetimes as compared with the presently investigated sequence-type associations.
On Disregarding the Overlap
Our results are based on the assumption that synapses that connect neurons belonging to both the cue and the target assembly do not change when storing a new memory. For small coding ratios f = M/N, this effect is negligible because the fraction of neurons in the overlap between cue and target assembly is small ($\propto f^2$). For higher coding ratios, this assumption is important, though, and its validity depends on how synapses are changed by triplets and quadruplets of pre- and postsynaptic spikes. Froemke and Dan (2002) suggest that these synapses are likely to be depressed, which would correspond to a decrease in the size of the target assembly. This reduction of the assembly size M effectively decreases the quality of readout, and thus our results overestimate the memory capacity for large f. We did not take this into account, however, mainly because the effects of spike triplets and quadruplets on synaptic changes are still not completely described and, moreover, the most important part of our results is observed for low coding ratios.
Application to Hippocampal CA3 Network
Sparse coding, though beneficial for high storage capacities, may not always be a feasible mode of operation. Constraints that limit the degree of sparseness may favor complex synaptic cascades if the network is large enough, for example, $N \gtrsim 10^4$ in Figure 5A. As an example, we calculate the memory lifetimes for parameters corresponding to the hippocampal CA3 region of rats ($N \approx 250\,000$, $c_m = 0.05$, not shown in figures), where assemblies have been estimated to contain a few thousand neurons (Csicsvari et al. 2000). In this system, a sparser code could be prohibited by requiring dynamical stability of the replay of a sequence of activity patterns (Lee and Wilson 2002; Leibold and Kempter 2006). The evaluation of both metaplasticity models in the CA3-like parameter regime yields a maximal lifetime of $P_{max} \approx 7\,000$ at a number $n_{opt} = 2$ of synaptic meta levels for the model with complex metaplasticity and a lifetime $P_{max} \approx 13\,500$ at $n_{opt} = 3$ for the serial topology. We thus conclude that a few meta levels could increase memory longevity in the hippocampus. For a 2-state synaptic model (n = 1) to be optimal for a CA3-type regime, the required representation would have to be sparser than reported, that is, assemblies should contain only a few hundred neurons.
Population Sparseness
The level f of sparseness as referred to in this paper is generally termed population sparseness (Olshausen and Field 2004). This quantity is hard to assess experimentally because it requires identifying and measuring a large number of nonactive neurons. Most experiments therefore address the temporal sparseness derived from the firing rate distributions of single neurons (Rolls and Tovee 1995).
Experimental estimates of population sparseness are few. In the hippocampus, one finds $f \approx 10^{-2}$ (Csicsvari et al. 2000), whereas the barrel cortex (Brecht and Sakmann 2002) and the visual cortex (Weliky et al. 2003) provide coding ratios with large values of $f \approx 0.5$. For networks with such highly distributed representations, our framework predicts that maximization of memory longevity requires a larger number of synaptic meta levels than expected in the hippocampus.
If Connectivity Depends on Network Size
Network size N and morphological connectivity $c_m$ were assumed to be independent variables. As a result, the optimal assembly size $M_{opt}$ is independent of N (eq. 1). However, if one assumes a constant number $c_m N$ of synapses per neuron, the connectivity $c_m$ decreases with network size like 1/N, and therefore $M_{opt}$ is proportional to N. In this case, the optimal coding ratio $f_{opt} = M_{opt}/N$ and the maximal memory lifetime $P_{max}$ are independent of N (eq. 2). That is to say, the memory performance of the network for constant $c_m N$ is determined by the number of synapses a neuron can support, that is, the "size" of single neurons rather than the network size (Leibold and Kempter 2006).
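This scaling argument can be made concrete with a few lines of Python. The sketch assumes the sparse-limit expressions $M_{opt} = 2\sqrt{e}K^2/c_m$ and $P_{max} = (c_m N)^2/(32 e K^4)$ for the 2-state model, which follow from maximizing $P = N^2/(4M^2)\,\ln[c_m M/(2K^2)]$ over M. The number of synapses per neuron and the threshold K are hypothetical values chosen only for illustration.

```python
import math

def m_opt(c_m, K):
    """Optimal assembly size in the sparse limit (assumed closed form)."""
    return 2.0 * math.sqrt(math.e) * K**2 / c_m

def p_max(N, c_m, K):
    """Maximal lifetime in the sparse limit (assumed closed form)."""
    return (c_m * N) ** 2 / (32.0 * math.e * K**4)

synapses_per_neuron = 5_000  # hypothetical constant value of c_m * N
K = 1.0                      # hypothetical detection threshold

rows = []
for N in (10_000, 100_000, 1_000_000):
    c_m = synapses_per_neuron / N   # connectivity shrinks like 1/N
    M = m_opt(c_m, K)
    rows.append((N, M, M / N, p_max(N, c_m, K)))

for N, M, f, P in rows:
    print(f"N={N:>9}  M_opt={M:10.1f}  f_opt={f:.2e}  P_max={P:.1f}")
```

Under these assumptions $M_{opt}$ grows in proportion to N, while $f_{opt}$ and $P_{max}$ stay the same in every row, so that only the number of synapses per neuron matters.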
Contributions of Noise
The analytical results presented here are based on a mean-field approach and an evaluation of signal-to-noise ratios, with an inherent source of noise owing to given random morphological connectivities; see Appendix. We neglect several additional sources of noise that may be present in biology. One might distinguish between external and internal noise contributions. Possible external sources of noise are fluctuations of the neuronal firing threshold, errors while activating a cue pattern, or variations of assembly sizes. Other noise sources can be considered as internal, such as variations of the synaptic state distributions between different cells of a target assembly. Independent of their nature, these additional noise sources will always increase the variance of the postsynaptic depolarization; see Figure 1A and equation (13) in the Appendix. As a consequence, memory lifetime, on average, will decrease, although some specific associations may even have an enhanced lifetime. For example, if we consider a variability of the assembly sizes M, an association between 2 assemblies that happen to be larger than average will remain stored longer than it would in a network with all assemblies having identical size. However, on average, the lifetime would be reduced because associations with fewer synapses are forgotten faster and, moreover, an association between larger-than-average assemblies overwrites more-than-average synapses required by earlier memory traces.
To conclude, our mean-field results provide an upper bound of memory lifetimes. This upper bound is a good approximation if there is little additional noise. Internal noise is small specifically for the case of large network size N and large assembly sizes M.
Alternative Roles for Metaplasticity
Though it may seem elementary that increasing the complexity of synaptic state transitions prolongs memory lifetimes, our results clearly demonstrate that this is not the case in general. For sparsely encoded memories, increasing the level of metaplasticity can even be detrimental. Metaplasticity prolongs memory longevity only if the neuronal encoding is highly distributed. One thus might speculate that besides the prolongation of memory lifetime, metaplasticity could also serve a different functional purpose. For example, metaplasticity may provide a substrate to evaluate memories. More important memory traces can be reflected through meta levels with lower transition probabilities as compared with less important memories. Evaluation may be reward-based, repetition-based (Sajikumar and Frey 2004), or context-based. A functional understanding of the design of synaptic plasticity will hence also be closely related to specific behavioral contexts.
| Appendix |
|---|
|
|
|---|
We consider a recurrent network of N randomly coupled McCulloch–Pitts neurons (McCulloch and Pitts 1943). The potential h is determined by the synaptic inputs arising from the network activity in the previous time step.
We assume that each synapse can exist in one of 2n discrete states. In the case n = 1, we have 2-state synapses: a synapse is either silent (state 1) and has zero weight, $w_1 = 0$, and therefore no effect on the postsynaptic neuron, or it is activated (state 2) and may increase the postsynaptic depolarization h by weight $w_2 = 1$. In general, the state-specific weights are described by $w = (w_1, \ldots, w_{2n})^T$, in which $0 \le w_\nu \le 1$ is the synaptic weight assigned to state $\nu \in \{1, \ldots, 2n\}$.
We define an assembly as a group of M randomly selected neurons. The collective activation of this specific set of neurons is thought to represent some particular external event. The probability that a randomly selected neuron in a network of N cells belongs to a specific assembly of size M is called the coding ratio f = M/N. A sparse representation of memories is then reflected by f << 1.
An association is a link between a random pair of externally predefined assemblies such that synchronous firing of the neurons forming a cue assembly activates a sufficiently large portion of the neurons in a target assembly.
Because we consider a recurrent network, fM neurons on average belong to both the cue and the target assembly. These fM neurons are also referred to as the overlap between the 2 assemblies. For small f, the overlap is negligible and the framework becomes essentially a feed-forward network.
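The expected overlap of fM neurons can be checked with a small sampling experiment. The following is a minimal sketch, not part of the original analysis: the function name `mean_overlap` and the values of N and M are illustrative assumptions, and assemblies are drawn uniformly without replacement.

```python
import numpy as np

def mean_overlap(N, M, trials, seed=0):
    """Average number of neurons shared by two randomly drawn assemblies."""
    rng = np.random.default_rng(seed)
    overlaps = []
    for _ in range(trials):
        # Each assembly is a set of M distinct neurons out of N.
        cue = rng.choice(N, size=M, replace=False)
        target = rng.choice(N, size=M, replace=False)
        overlaps.append(np.intersect1d(cue, target).size)
    return float(np.mean(overlaps))

N, M = 10_000, 500          # illustrative values, not taken from the paper
f = M / N                   # coding ratio
estimate = mean_overlap(N, M, trials=200)
print(f"f*M = {f * M:.1f}, sampled mean overlap = {estimate:.1f}")
```

For these values the sampled mean should lie close to fM = M²/N, and it shrinks relative to M as f decreases, which is the sense in which the network becomes effectively feed-forward.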
The set of synapses contributing to one particular association is described by the state distribution $z = (z_1, \ldots, z_{2n})^T \in [0,1]^{2n}$ (Amit and Fusi 1994), which determines the occupancies of the 2n states and is normalized to $(1, \ldots, 1)\,z = \sum_{\nu=1}^{2n} z_\nu = 1$.
A.1 Dynamics of the State Distribution
During learning, synapses change states. The storage of a new association requires that many of the synapses from cue to target neurons undergo LTP. Synapses in the reverse direction, from target to cue, preferentially experience LTD.
We assume that synapses stay unaltered if they connect neurons that belong to both the cue and the target assembly, that is, synapses that are related to the fM neurons in the overlap (see above). As a result, while learning a new association, the number of synapses that receive an LTP stimulus equals $c_m[M(1-f)]^2$, owing to the M(1 – f) cells in the cue and target assemblies that are not in the overlap. In the same way, the identical number $c_m[M(1-f)]^2$ of synapses from target to cue neurons are exposed to an LTD stimulus. We note that this exclusion of the overlap synapses is not equivalent to considering a feed-forward network with assembly size M(1 – f) because the overlap synapses also contribute to both the mean and the variance of postsynaptic depolarization.
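As a worked illustration of this count, the sketch below evaluates $c_m[M(1-f)]^2$ for one parameter set and relates it to the fraction $f^2(1-f)^2$ used in the subsequent derivation. The function names and parameter values are hypothetical, and the total of $c_m N^2$ synapses in the network is an assumption made only for this example.

```python
def ltp_synapse_count(N, M, c_m):
    """Expected number of cue-to-target synapses outside the overlap."""
    f = M / N
    return c_m * (M * (1.0 - f)) ** 2

def stimulated_fraction(N, M, c_m):
    """Share of all synapses that receive the LTP stimulus.

    Assumes (for illustration) that the network holds c_m * N**2 synapses.
    """
    return ltp_synapse_count(N, M, c_m) / (c_m * N ** 2)

N, M, c_m = 1_000_000, 1_000, 0.1   # illustrative values
f = M / N
print(ltp_synapse_count(N, M, c_m))
print(stimulated_fraction(N, M, c_m), f**2 * (1 - f)**2)
```

The two numbers on the last line agree, since $c_m[M(1-f)]^2 / (c_m N^2) = f^2(1-f)^2$.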
After being exposed to an LTP stimulus, a synapse may change its state from $\nu$ to $\nu' > \nu$. For an LTD stimulus, synapses may switch from $\nu$ to $\nu' < \nu$. The probabilities that these state changes actually occur are denoted as $q_{\nu'\nu}$. The coefficients $q_{\nu'\nu}$ constitute the nondiagonal elements of the plasticity matrix Q. Its diagonal elements $q_{\nu\nu} = -\sum_{\nu' \neq \nu} q_{\nu'\nu}$ are constructed such that Q has vanishing column sums, that is, $(1, \ldots, 1)\,Q = 0$, which preserves the normalization of z (see end of Appendix A.1).
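The construction of Q from its off-diagonal transition probabilities can be sketched as follows. This is an illustrative helper, not the authors' code: the name `plasticity_matrix` and the 4-state transition probabilities are hypothetical, and the convention assumed is that entry (i, j) holds the probability of a transition from state j+1 to state i+1.

```python
import numpy as np

def plasticity_matrix(rates):
    """Build Q from off-diagonal transition probabilities.

    rates[i, j] (i != j) is the probability of a transition from
    state j+1 to state i+1; any diagonal input is ignored.
    """
    Q = np.array(rates, dtype=float)
    np.fill_diagonal(Q, 0.0)
    # Diagonal entries are minus the column sums of the off-diagonal part,
    # so that every column of Q sums to zero.
    np.fill_diagonal(Q, -Q.sum(axis=0))
    return Q

# Hypothetical 4-state chain: LTP moves one state up, LTD one state down,
# each with probability 0.5 (values chosen only for illustration).
rates = np.zeros((4, 4))
for nu in range(3):
    rates[nu + 1, nu] = 0.5   # LTP: state nu+1 -> nu+2 (1-based)
    rates[nu, nu + 1] = 0.5   # LTD: state nu+2 -> nu+1 (1-based)

Q = plasticity_matrix(rates)
print(Q)
print(Q.sum(axis=0))          # column sums
```

With this convention, lower-triangular entries describe LTP transitions and upper-triangular entries describe LTD transitions.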
The fraction of synapses that connect the disjoint subsets of cells in a given pair of cue and target assembly equals $f^2(1-f)^2$. Storing another association, we thus find the probability $f^2(1-f)^2\,q_{\nu'\nu}$ that an "arbitrary" synapse in state $\nu$ changes its state to $\nu'$. Similarly, the probability of an arbitrary synapse to remain in state $\nu$ amounts to $1 + f^2(1-f)^2\,q_{\nu\nu}$. For a given plasticity matrix Q, online learning of random associations can then be described by a linear iterated map (Amit and Fusi 1994). Storing of one further random association maps the state distribution z(t) of an association at time t to the distribution:
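$$z(t+1) = \left[\mathbb{1} + f^2(1-f)^2\,Q\right] z(t). \qquad (5)$$
Iterating this map t times gives
$$z(t) = \left[\mathbb{1} + f^2(1-f)^2\,Q\right]^t z(0). \qquad (6)$$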
The fixed point $\bar{z}$ of the dynamics equation (5) is defined via $Q\,\bar{z} = 0$
. If a learning rule is such that
, we also say that this learning rule is "balanced." Otherwise, if
, we call the learning rule unbalanced.
To fully specify the dynamics in equation (6), we have to find an expression for z(0) immediately after imprinting a particular association. The state distribution z(0) can be expressed as a superposition of $(1-f)^2$ times a potentiated state $z_{LTP}$, and $1-(1-f)^2$ times the equilibrium state $\bar{z}$, that is,
$$z(0) = (1-f)^2\, z_{LTP} + \left[1-(1-f)^2\right] \bar{z}. \qquad (7)$$
|
| (8) |
In the limit $f \to 0$, the initial state z(0) equals $z_{LTP}$, and in the limit $f \to 1$, we have $z(0) = \bar{z}$.
| (9) |
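As already announced, the dynamics of z preserves the normalization (1, ..., 1) z = 1 because of the vanishing column sums of Q. This can be shown from equation (5) because
$$(1, \ldots, 1)\,z(t+1) = (1, \ldots, 1)\,z(t) + f^2(1-f)^2\,(1, \ldots, 1)\,Q\,z(t) = (1, \ldots, 1)\,z(t).$$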
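The iterated map and its conservation of normalization can be illustrated numerically. The sketch below assumes the update takes the form $z(t+1) = [\mathbb{1} + f^2(1-f)^2 Q]\,z(t)$, which is how the transition probabilities above combine, and uses a two-state matrix with unit transition probabilities as an assumed example; the function name `store_associations` and the value of f are hypothetical.

```python
import numpy as np

def store_associations(z0, Q, f, steps):
    """Iterate z <- [1 + f^2 (1-f)^2 Q] z and return the whole trajectory."""
    eps = f**2 * (1.0 - f)**2            # fraction of synapses affected per association
    step = np.eye(len(z0)) + eps * np.asarray(Q, dtype=float)
    z = np.asarray(z0, dtype=float)
    history = [z]
    for _ in range(steps):
        z = step @ z
        history.append(z)
    return np.array(history)

# Assumed two-state example: every stimulated synapse switches state.
Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
traj = store_associations([0.0, 1.0], Q, f=0.1, steps=2000)

print(traj.sum(axis=1).min(), traj.sum(axis=1).max())  # normalization over time
print(traj[-1])                                        # state after many associations
```

Because the columns of Q sum to zero, the components of z sum to one at every step, and the trajectory approaches a vector that Q maps to zero.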
A.2 Mean Membrane Depolarization and Variance
Given a state distribution z, we can calculate the first and second moment of the postsynaptic potential h, which are needed below to calculate the readout quality of the target assembly. If the whole group of cue neurons fires simultaneously, the mean membrane depolarization $\bar{h}$ of a target cell amounts to
$$\bar{h} = c_m M\, w \cdot z. \qquad (10)$$
|
| (11) |
|
| (12) |
In analogy to equation (10), we also find an expression for the variance of the membrane potential h,
|
| (13) |
For binary weights ($w_\nu \in \{0, 1\}$), this expression simplifies to

In what follows, we will assume the assembly size M to be large, that is, only groups of M >> 1 simultaneously active neurons are considered to carry meaningful information. As a result, the distribution of h can be approximated to be Gaussian and, hence, the first 2 moments are sufficient in the sense of this approximation.
A.3 Example: 2-State Synapses
The case of 2-state synapses (n = 1) allows an explicit illustration of the dynamics of the state distribution $z = (z_1, z_2)^T$. There, the fraction of silent synapses is denoted by $z_1$ and the fraction of activated synapses supporting an association is $z_2 = 1 - z_1$. For a weight vector $w = (0, 1)^T$, we find the mean membrane depolarization to be $\bar{h} = c_m M z_2$ and the variance to equal
.
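Assuming that a new association activates all possible cue-to-target synapses and depresses all possible target-to-cue connections, we have a plasticity matrix
$$Q = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}$$
and a potentiated state $z_{LTP} = (0, 1)^T$. Then, the equilibrium state equals $\bar{z} = (1/2, 1/2)^T$ and the state excess is given by $\Delta z = z_{LTP} - \bar{z} = (-1/2, 1/2)^T$. Together with equation (7), we find an initial state distribution:
$$z(0) = \frac{1}{2} \begin{pmatrix} 1-(1-f)^2 \\ 1+(1-f)^2 \end{pmatrix}.$$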
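Because $Q\,\Delta z = -2\,\Delta z$, we then derive the dynamics of the activated synapses,
$$z_2(t) = \frac{1}{2} + \frac{(1-f)^2}{2}\left[1-2f^2(1-f)^2\right]^t = \frac{1}{2} + \frac{(1-f)^2}{2}\, e^{-t/\tau}, \qquad (14)$$
with $\tau = 1/\left|\ln\left[1-2f^2(1-f)^2\right]\right|$. The fraction $z_2$ decays with this time constant, which is $\tau \approx 1/(2f^2)$ for $f \ll 1$, toward an equilibrium value of $\bar{z}_2 = 1/2$.
A.4 Readout Criterion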
The number P of associations that can be stored in a network is determined via signal-detection theory as described in Leibold and Kempter (2006). There, an activity pattern is said to be read out if the fraction $p_1$ of correctly activated neurons (hits) exceeds the fraction $p_0$ of incorrectly activated neurons (false alarms) by some constant detection threshold $\gamma > 0$. The maximum lifetime P of a memory is then derived from the condition $\gamma = p_1(P) - p_0$.
Assuming Gaussian statistics, the fraction of false alarms equals
|
|
|
| (15) |
and the equilibrium potential |
|
|
| (16) |
. Hence, the detection criterion amounts to
|
| (17) |
and the time t an association remains in memory. Thus (for a subset of thresholds), we can numerically determine t as a function F of the firing threshold $\theta$. One then defines the maximum number P of storable associations via
$$P = \max_{\theta} F(\theta). \qquad (18)$$
All results are derived with a readout quality of $\gamma = 0.7$. It has been shown in Leibold and Kempter (2006) that different values for $\gamma$ do not change the scaling laws of memory capacity.
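The readout criterion can be illustrated with a generic Gaussian threshold calculation. This is a hedged sketch rather than the paper's equations (15) to (17): the function names, the symbol `gamma` for the detection threshold, and the means and standard deviations passed in are all hypothetical inputs chosen for illustration.

```python
import math

def fraction_above(theta, mean, std):
    """Fraction of Gaussian-distributed potentials that exceed the threshold theta."""
    return 0.5 * math.erfc((theta - mean) / (math.sqrt(2.0) * std))

def is_read_out(theta, target, background, gamma=0.7):
    """Check whether hits exceed false alarms by at least gamma.

    target and background are (mean, std) pairs of the potential h for
    neurons inside and outside the target assembly (assumed inputs).
    """
    p1 = fraction_above(theta, *target)       # hits
    p0 = fraction_above(theta, *background)   # false alarms
    return p1 - p0 >= gamma, p1, p0

# Hypothetical numbers: the same spread, but different separations of the means.
ok_far, p1_far, p0_far = is_read_out(54.0, target=(60.0, 5.0), background=(48.0, 5.0))
ok_near, p1_near, p0_near = is_read_out(55.0, target=(60.0, 5.0), background=(50.0, 5.0))
print(ok_far, round(p1_far - p0_far, 3))
print(ok_near, round(p1_near - p0_near, 3))
```

As the mean depolarization of the target cells decays toward its equilibrium value, the difference between hits and false alarms shrinks until the criterion fails, which is what bounds the memory lifetime.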
A.5 Limit of Low Coding Ratios
For low coding ratios f << 1, we can find explicit formulas to approximate the memory lifetime P. We therefore expand equation (11) to the lowest order in $f^2$. Defining L to be the lowest positive exponent $t = L$ at which $w \cdot Q^t\,\Delta z \neq 0$, this expansion yields
|
| (19) |
{0, 1}), we can combine equations (15) and (16) through elimination of
and solve the resulting quadratic equation for
h
(P)–|
| (20) |
|
|
and the readout quality
. In what follows, we assume K to be optimized with respect to
.
We now use Stirling's formula to approximate the binomial coefficient in equation (19) for P >> L, $\binom{P}{L} \approx P^L/L!$. Then, together with equation (20) and $f \to 0$, we obtain the following expression for the maximum number P of stored associations
|
| (21) |
|
| (22) |
|
| (23) |
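The quality of the large-P approximation of the binomial coefficient can be checked directly. The sketch below assumes the leading-order form $\binom{P}{L} \approx P^L/L!$ (an assumption about which form of the approximation is meant) and uses hypothetical values of P and L.

```python
import math

def binom_exact(P, L):
    """Exact binomial coefficient P choose L."""
    return math.comb(P, L)

def binom_large_P(P, L):
    """Leading-order approximation for P >> L (assumed form)."""
    return P**L / math.factorial(L)

L = 3                                   # illustrative value
ratios = {P: binom_exact(P, L) / binom_large_P(P, L) for P in (10, 100, 10_000)}
for P, ratio in ratios.items():
    print(P, ratio)
```

The ratio approaches one as P grows relative to L, so the approximation is adequate only in the regime P >> L stated in the text.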
A.5.1 Two-State Synapses
In the case of 2-state synapses (n = 1), we have $w \cdot \Delta z = 1/2$ and $w \cdot Q\,\Delta z = -1$ with L = 1. From equation (22), we then find an optimal coding ratio
|
| (24) |
The optimal assembly size for 2-state synapses derived in the Results is equivalent to equation (24) in the sense of the linear approximation $e^{-Pf^2} \approx 1 - Pf^2$ of the exponential function used in equation (19). More specifically, the prefactor
approximates the factor $2e^{1/2}$ in equation (1).
From equation (24), we derive the maximal number of associations as
|
| (25) |
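The two-state quantities used here and in Appendix A.3 can be verified numerically. The following is a minimal sketch under the assumption that every stimulated synapse switches state, so that the off-diagonal elements of Q equal one; the variable names and the value of f are illustrative.

```python
import numpy as np

Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])        # assumed two-state plasticity matrix
w = np.array([0.0, 1.0])           # weights of the silent and the activated state
z_ltp = np.array([0.0, 1.0])       # fully potentiated state
z_eq = np.array([0.5, 0.5])        # equilibrium state, satisfies Q @ z_eq = 0
dz = z_ltp - z_eq                  # state excess

def time_constant(f):
    """Decay time constant 1/|ln(1 - 2 f^2 (1-f)^2)| in units of stored associations."""
    return 1.0 / abs(np.log(1.0 - 2.0 * f**2 * (1.0 - f)**2))

print(Q @ z_eq)                    # fixed-point condition
print(w @ dz, w @ Q @ dz)          # w . dz and w . Q dz
print(time_constant(0.01), 1.0 / (2.0 * 0.01**2))
```

For small f the exact time constant stays within a few percent of $1/(2f^2)$, which illustrates how strongly sparse coding slows the overwriting of two-state synapses.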
A.5.2 Synapses with Complex Metaplasticity
The states in the metaplasticity model by Fusi et al. (2005) are enumerated such that $\nu = 1$ is the most depressed state (bottom left in Fig. 2A). The other states are counted clockwise such that $\nu = 2n$ denotes the most potentiated state (bottom right in Fig. 2A). Then, the plasticity matrix for n = 4 reads
| (26) |
|
|
|
| (27) |
$\Delta z$ and $w \cdot Q\,\Delta z$. With the above formulas and the weight vector $w = (0, \ldots, 0, 1, \ldots, 1)$, we obtain
|
| (28) |
|
| (29) |
|
| (30) |
A.5.3 Synapses with Serial State Transitions
Another model in which the state transitions connect the synaptic states one after the other is depicted in Figure 3A. We again enumerate the states clockwise such that $\nu = 1$ corresponds to the most depressed state (bottom left) and $\nu = 2n$ corresponds to the most potentiated state, $w_{2n} = 1$ (bottom right). Then, the plasticity matrix (for n = 3) reads
| (31) |
|
|
|
| (32) |
| (33) |
|
| (34) |
| Acknowledgments |
|---|
We thank Stefano Fusi for discussions and comments on the manuscript and Walter Senn for discussions. We are also indebted to Tim Gollisch and Roland Schaette for valuable suggestions and careful reading. This research was supported by the Deutsche Forschungsgemeinschaft (Emmy Noether Programm: Ke 788/1-3, SFB 618) and the Bundesministerium für Bildung und Forschung (Bernstein Center for Computational Neuroscience Berlin, 01GQ0410). Conflict of Interest: None declared.
| References |
|---|
|
|
|---|
Abraham WC, Bear MF. Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci (1996) 19:126–130.
Amit DJ, Fusi S. Learning in neural networks with material synapses. Neural Comput (1994) 6:957–982.
Brecht M, Sakmann B. Dynamic representation of whisker deflection by synaptic potentials in spiny stellate and pyramidal cells in the barrels and septa of layer 4 rat somatosensory cortex. J Physiol (2002) 543:49–70.
Brunel N, Hakim V, Isope P, Nadal J-P, Barbour B. Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron (2004) 43:745–757.
Csicsvari J, Hirase H, Mamiya A, Buzsaki G. Ensemble patterns of hippocampal CA3-CA1 neurons during sharp-wave associated population events. Neuron (2000) 28:585–594.
Froemke RC, Dan Y. Spike-timing-dependent plasticity induced by natural spike trains. Nature (2002) 416:433–438.
Fusi S, Drew PJ, Abbott LF. Cascade models of synaptically stored memories. Neuron (2005) 45:599–611.
Fusi S, Abbott LF. Limits on the memory storage capacity of bounded synapses. Nat Neurosci (2007) 10:485–493.
Gerstner W, Kempter R, van Hemmen JL, Wagner H. A neuronal learning rule for sub-millisecond temporal coding. Nature (1996) 383:76–78.
Golomb D, Rubin N, Sompolinsky H. Willshaw model: associative memory with sparse coding and low firing rates. Phys Rev A (1990) 41:1843–1854.
Hebb DO. The organization of behavior (1949) New York (NY): Wiley.
Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA (1982) 79:2554–2558.
Isaac JTR, Nicoll RA, Malenka RC. Evidence for silent synapses: implications for the expression of LTP. Neuron (1995) 15:427–434.
Kempter R, Gerstner W, van Hemmen JL. Hebbian learning and spiking neurons. Phys Rev E (1999) 59:4498–4514.
Kullmann DM. Silent synapses: what are they telling us about long-term potentiation? Philos Trans R Soc Lond B Biol Sci (2003) 358:727–733.
Lee AK, Wilson MA. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron (2002) 36:1183–1194.
Leibold C, Kempter R. Memory capacity for sequences in a recurrent network with biological constraints. Neural Comput (2006) 18:904–941.
Liao D, Hessler NA, Malinow R. Activation of postsynaptically silent synapses during pairing-induced LTP in CA1 region of hippocampal slice. Nature (1995) 375:400–404.
Linsker R. From basic network principles to neural architecture: emergence of orientation columns. Proc Natl Acad Sci USA (1986) 83:8779–8783.
Lüscher C, Nicoll RA, Malenka RC, Muller D. Synaptic plasticity and dynamic modulation of the postsynaptic membrane. Nat Neurosci (2000) 3:545–550.
Lüscher C, Frerking M. Restless AMPA receptors: implications for synaptic transmission and plasticity. Trends Neurosci (2001) 24:665–670.
McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biol (1943) 5:115–133.
Montgomery JM, Pavlidis P, Madison DV. Pair recordings reveal all-silent synaptic connections and the postsynaptic expression of long-term potentiation. Neuron (2001) 29:691–701.
Montgomery JM, Madison DV. State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron (2002) 33:765–777.
Montgomery JM, Madison DV. Discrete synaptic states define a major mechanism of synapse plasticity. Trends Neurosci (2004) 27:744–750.
Nadal J-P. Associative memory: on the (puzzling) sparse coding limit. J Phys A Math Gen (1991) 24:1093–1101.
O'Connor DH, Wittenberg GM, Wang SS-H. Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proc Natl Acad Sci USA (2005) 102:9679–9684.
Olshausen BA, Field DJ. Sparse coding of sensory inputs. Curr Opin Neurobiol (2004) 14:481–487.
Petersen CCH, Malenka RC, Nicoll RA, Hopfield JJ. All-or-none potentiation at CA3-CA1 synapses. Proc Natl Acad Sci USA (1998) 95:4732–4737.
Rolls ET, Tovee MJ. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J Neurophysiol (1995) 73:713–726.
Sajikumar S, Frey JU. Late-associativity, synaptic tagging, and the role of dopamine during LTP and LTD. Neurobiol Learn Mem (2004) 82:12–25.
Song S, Miller KD, Abbott LF. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci (2000) 3:919–926.
Treves A, Rolls ET. Computational constraints suggest the need for two distinct input systems to the hippocampal CA3 network. Hippocampus (1992) 2:189–200.
Tsodyks MV, Feigel'man MV. The enhanced storage capacity in neural networks with low activity level. Europhys Lett (1988) 6:101–105.
Weliky M, Fiser J, Hunt RH, Wagner DN. Coding of natural scenes in primary visual cortex. Neuron (2003) 37:703–718.
Willshaw DJ, Buneman OP, Longuet-Higgins HC. Non-holographic associative memory. Nature (1969) 222:960–962.




5) synaptic meta levels. In general, $P_{max}$ is more robust against LTD bias as compared with LTP bias. In an LTD-prone regime, memory lifetimes can even be slightly prolonged. Further parameters were $c_m = 0.1$, $N = 10^6$.


