Cerebral Cortex Advance Access originally published online on November 24, 2004
Cerebral Cortex 2005 15(8):1103-1112; doi:10.1093/cercor/bhh209
© Oxford University Press 2004; all rights reserved
Population Dynamics of Face-responsive Neurons in the Inferior Temporal Cortex
1 PRESTO, Japan Science and Technology Agency, Saitama 351-0198, Japan, 2 RIKEN Brain Science Institute, Saitama 351-0198, Japan, 3 National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki 305-8568, Japan, 4 Kawato Dynamic Brain Project, ERATO, JST, Kyoto 619-0288, Japan, 5 Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8561, Japan and 6 Graduate School of Medicine, Kyoto University, Kyoto 606-8501, Japan
Address correspondence to Narihisa Matsumoto, Systems Neuroscience Group, Neuroscience Research Institute, AIST, Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan. Email: xmatumo{at}ni.aist.go.jp.
Abstract
Neurons in the inferior temporal (IT) cortex of monkeys respond selectively to complex visual stimuli such as faces, and single IT neurons encode different kinds of information about visual stimuli in their temporal firing patterns. To understand the temporal aspects of the information encoded at the population level in the IT cortex, we applied principal component analysis (PCA) to the responses of a population of neurons. Each neuron was recorded individually while visual stimuli consisting of geometric shapes and human and monkey faces were presented. We found that global categorization, i.e. human faces versus monkey faces versus shapes, occurred in the earlier part of the population response, and that fine categorization within each member of the global category occurred in the later part. A cluster analysis based on a mixture of Gaussians confirmed that the clusters in the earlier part of the responses represented the global category. Moreover, these clusters separated into sub-clusters corresponding to either human identity or monkey expression in the later part of the responses, and the global categorization was maintained even after the sub-clusters appeared. The results suggest that the population response of IT neurons represents a hierarchical relationship among the test stimuli along the time axis.
Key Words: hierarchical relationship; mixture of Gaussians analysis; principal component analysis; variational Bayes algorithm
Introduction
A substantial number of neurons in the inferior temporal (IT) cortex respond to faces or complex objects (Bruce et al., 1981
Here, to understand the temporal aspects of information coding at the population level in the IT cortex, we analyzed the population activity across a number of individually recorded neurons using principal component analysis (PCA) (Jollife, 1986), a method similar to multidimensional scaling (MDS). Earlier observations by Sugase et al. (1999) suggested that global categorization occurs before fine categorization. In this study, we investigated how complex visual stimuli are represented along the time axis in the responses of the neuronal population.
We addressed three points that remained unresolved in our previous study (Sugase et al., 1999). First, that study described one type of neuron that encoded information about both global and fine categorization in its responses. To examine information coding at the population level, the present study analyzed the responses of several types of neurons: neurons that encoded both global and fine information, neurons that encoded only global information, and neurons that encoded only fine information. Second, to evaluate whether our a priori classification of the stimuli into global and fine categories (Sugase et al., 1999) was appropriate, we used a cluster analysis based on a mixture of Gaussians. This analysis classified the population activity vectors for the individual stimuli without imposing an arbitrary categorization of the stimuli, so we could assess our a priori classification by comparing it with the resulting clusters. Third, we were especially interested in whether global categorization was retained after fine categorization had occurred within each member of the global category, i.e. whether the test stimuli were represented hierarchically along the time axis. Preliminary results have been presented in abstract form (Matsumoto et al., 2001).
Methods
Neuronal data were collected from two macaque monkeys (Macaca fuscata). All the details of the experimental procedures are described in Sugase et al. (1999).
For the information-theoretic analysis, we used the method described by Sugase et al. (1999). Briefly, information about the test stimuli was divided into one global category (human faces versus monkey faces versus shapes) and six fine categories (identity of the human faces, expression of the human faces, identity of the monkey faces, expression of the monkey faces, color of the shapes and form of the shapes). The information I(S; R) that a neuronal response R conveys about the stimulus S was quantified as the decrease in the entropy of the stimulus, H(S), that results from observing the response:
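$$I(S;R) = H(S) - H(S \mid R) = -\sum_{s} P(s)\log_2 P(s) + \sum_{r} P(r)\sum_{s} P(s \mid r)\log_2 P(s \mid r) \qquad (1)$$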
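Equation 1 can be estimated from trial-by-trial data roughly as follows. This is a minimal sketch, not the authors' code: it assumes each trial has already been reduced to a discrete response label (for example, a binned spike count in one analysis window), the helper names `entropy_bits` and `stimulus_information` are hypothetical, and no correction is applied for the upward bias that small-sample information estimates usually need.

```python
import numpy as np

def entropy_bits(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    _, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def stimulus_information(stimuli, responses):
    """Estimate I(S; R) = H(S) - H(S | R) from paired trial labels.

    stimuli:   stimulus (or category) label of each trial
    responses: discretized response of each trial, e.g. a binned spike count
    """
    stimuli = np.asarray(stimuli).ravel()
    responses = np.asarray(responses).ravel()
    h_s = entropy_bits(stimuli)
    # H(S | R) = sum over r of P(r) * H(S | R = r)
    h_s_given_r = 0.0
    for r in np.unique(responses):
        mask = responses == r
        h_s_given_r += mask.mean() * entropy_bits(stimuli[mask])
    return float(h_s - h_s_given_r)

# Four equiprobable stimuli, each evoking a distinct response.
stims = np.repeat([0, 1, 2, 3], 10)
print(stimulus_information(stims, stims * 5))   # prints 2.0
```

For the global category, `stimuli` would hold the category labels (human, monkey, shape) rather than the individual stimulus identities.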
Principal Component Analysis
For the population analysis, we calculated a population activity vector for each stimulus; the procedure is summarized in Figure 1. For each neuron and stimulus, a spike density function was obtained by averaging, over trials, the spike count between time t and t + 1 ms, and it was smoothed using a Gaussian filter with a variance of 10 ms. The population activity vector v_i for test stimulus i consisted of the mean firing rates of the 45 individually recorded neurons, obtained by averaging each spike density function within a 50 ms time window, so each population activity vector had 45 dimensions. Each 50 ms time window therefore yielded 38 population activity vectors, one for each of the 38 test stimuli. The start time of the window was incremented in 1 ms steps from 0 ms (the onset of the test stimulus) to 300 ms, which allowed us to follow the temporal evolution of the population response.
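The windowing procedure can be sketched in Python with NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: spikes are assumed to be stored as counts in 1 ms bins in an array of shape (neurons, stimuli, trials, ms) with t = 0 at stimulus onset, every neuron is assumed to have the same number of trials, and the 10 ms filter parameter is treated here as the standard deviation of the Gaussian kernel, which is an interpretation of the text. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma_ms, dt_ms=1.0):
    """Normalized Gaussian kernel sampled every dt_ms, truncated at +/- 4 sigma."""
    half = int(np.ceil(4 * sigma_ms / dt_ms))
    t = np.arange(-half, half + 1) * dt_ms
    k = np.exp(-0.5 * (t / sigma_ms) ** 2)
    return k / k.sum()

def _smooth(x, kernel):
    """Centered convolution that keeps len(x); edges are implicitly zero-padded."""
    half = (len(kernel) - 1) // 2
    return np.convolve(x, kernel)[half:half + len(x)]

def spike_density(spikes, sigma_ms=10.0):
    """spikes: (n_neurons, n_stimuli, n_trials, n_ms) counts in 1 ms bins.
    Returns trial-averaged, smoothed rates in spikes/s, shape (n_neurons, n_stimuli, n_ms)."""
    mean_counts = spikes.mean(axis=2)                     # average over trials
    smoothed = np.apply_along_axis(_smooth, -1, mean_counts, gaussian_kernel(sigma_ms))
    return smoothed * 1000.0                              # spikes per ms -> spikes per s

def population_vectors(rates, window_ms=50, last_start_ms=300):
    """Mean rate in windows [s, s + window_ms) for s = 0, 1, ..., last_start_ms.
    rates: (n_neurons, n_stimuli, n_ms). Returns (n_windows, n_stimuli, n_neurons)."""
    if rates.shape[-1] < last_start_ms + window_ms:
        raise ValueError("rates must cover last_start_ms + window_ms milliseconds")
    return np.stack([rates[:, :, s:s + window_ms].mean(axis=-1).T
                     for s in range(last_start_ms + 1)])

# Fake data for illustration only: 45 neurons, 38 stimuli, 4 trials, 400 ms at ~20 spikes/s.
rng = np.random.default_rng(0)
spikes = (rng.random((45, 38, 4, 400)) < 0.02).astype(float)
vectors = population_vectors(spike_density(spikes))
print(vectors.shape)   # (301, 38, 45)
```

Each slice `vectors[w]` is then the 38 x 45 set of population activity vectors for one window.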
Principal component analysis (PCA) is a dimension-reduction method that projects data from a high-dimensional space onto a lower-dimensional space while preserving as much of the variance of the original data as possible. PCA was applied to the 38 population activity vectors in each time window. The first principal component is the direction along which the population responses vary most, and the second principal component is the direction of greatest remaining variance orthogonal to the first.
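Per-window PCA can be sketched with NumPy as an eigendecomposition of the covariance matrix of one window's vectors. This is a minimal sketch assuming the vectors of one window are stacked as a (38, 45) array; the function name `pca_2d` is hypothetical, and the returned contribution ratio (share of total variance captured by the first two components) is a standard definition that we assume matches the ratio quoted in the Results.

```python
import numpy as np

def pca_2d(vectors):
    """PCA of one window's population activity vectors.

    vectors: (n_stimuli, n_neurons) array, e.g. (38, 45).
    Returns the 2-D projections, eigenvalues (descending), eigenvectors (columns)
    and the contribution ratio of the first two principal components.
    """
    centered = vectors - vectors.mean(axis=0)
    cov = np.cov(centered, rowvar=False)            # (n_neurons, n_neurons)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = centered @ eigvecs[:, :2]           # coordinates on PC1 and PC2
    contribution = eigvals[:2].sum() / eigvals.sum()
    return projected, eigvals, eigvecs, float(contribution)

# Synthetic example with one dominant direction of variance (neuron 0).
rng = np.random.default_rng(0)
x = rng.normal(size=(38, 45))
x[:, 0] += 10 * rng.normal(size=38)
proj, vals, vecs, ratio = pca_2d(x)
print(proj.shape, round(ratio, 2))
```

The variance of each projected coordinate equals the corresponding eigenvalue, which is why the eigenvalues in Figure 3c can be read as variance along each axis.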
Mixture of Gaussians Analysis
We used a mixture of Gaussians analysis to cluster the population activity vectors. We assumed that the 38 population activity vectors v = {v_1, v_2, ..., v_38} were generated from a mixture of 45-dimensional Gaussian distributions. The variational Bayes (VB) algorithm (Attias, 1999; Ghahramani and Beal, 2000) was used to estimate the parameters of the mixture, i.e. the means, variances and mixing ratios of the 45-dimensional Gaussian distributions, together with their number. We estimated the number of Gaussians, which corresponds to the number of clusters, from the free energy, a lower bound on the log marginal likelihood of the data that indicates the distance between the estimated mixture of Gaussians and the most appropriate mixture of Gaussians: the larger the free energy, the closer the estimated mixture is to the most appropriate one (Attias, 1999; Ghahramani and Beal, 2000). We varied the number of Gaussians from 1 to 10, calculated the free energy 20 times for each number, and adopted the number of Gaussians and the parameters that gave the maximum free energy. The maximum free energy also determines the members of each cluster. For example, suppose that the vectors are divided into two clusters, A and B. For a given vector, the free energy is calculated both with the vector assigned to cluster A and with it assigned to cluster B; if the free energy is larger when the vector belongs to cluster A, the vector is assigned to cluster A. The members of every cluster are determined in the same way.
Results
Results of PCA
We analyzed, at the population level, the responses of 45 individually recorded neurons (see Methods). First, we classified the 45 neurons using the information-theoretic analysis of Sugase et al. (1999). For each neuron, we calculated the information transmission rate for one global category (human faces versus monkey faces versus shapes) and six fine categories (human identity, human expression, monkey identity, monkey expression, shape color and shape form). We found that 36 of the 45 neurons encoded both global and fine information, 7 encoded only global information, and the remaining 2 encoded only fine information.
As we reported previously, information about the global category was transmitted before information about the fine categories, with an average difference in latency of 51 ms, although there was substantial variation across neurons (SD = 39 ms, n = 32; Sugase et al., 1999). The time of the peak information transmission rate also varied from cell to cell: 152 ± 57 ms (mean ± SD) for global information and 179 ± 49 ms for fine information. One reason for this variation was that each neuron had a different temporal firing pattern. Some neurons had both an initial transient and a later sustained response (Fig. 2a), whereas others showed only an initial transient response (Fig. 2b) or only a later sustained response (Fig. 2c). The peak times for global information were 117, 109 and 213 ms after stimulus onset for the neurons in Figure 2a, b and c, respectively (Fig. 2, arrows in the left panels), and the peak times for fine information were 205, 149 and 245 ms (Fig. 2, arrows in the right panels). In each case the peak for global information preceded the peak for fine information, but the interval between the two peaks varied: 88, 40 and 32 ms for the respective neurons.
Because the interval between the peak times of the two information measures varied from cell to cell, we performed a population analysis to examine how the IT neurons represented the test stimuli along the time axis. We calculated 38 population activity vectors, consisting of the mean firing rates of the 45 neurons for the 38 test stimuli, within a 50 ms time window moved in 1 ms steps, and applied PCA to the 38 vectors in each window. This projected the 38 population activity vectors from the 45-dimensional space onto a two-dimensional space. To determine the time windows in which global or fine categorization occurred, we calculated the distances between the centers of the stimulus categories, where the center of a category was the average of the coordinates of the vectors belonging to it. For the global category, three distances were measured: between the centers of the human face and monkey face vectors, between the centers of the monkey face and shape vectors, and between the centers of the human face and shape vectors. The sum of these three distances was maximal in the [90 ms, 140 ms] time window. For the fine categories, the distances between category centers were measured and summed within each member of the global category, i.e. for human identity, monkey expression and shape form; this sum was maximal in the [140 ms, 190 ms] window. We therefore regarded the [90 ms, 140 ms] window as the period of global categorization and the [140 ms, 190 ms] window as the period of fine categorization.
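The window-selection criterion can be sketched as follows, assuming the 2-D PCA coordinates of each window are available as a (38, 2) array and that each stimulus carries a global label and a fine label (identity for human faces, expression for monkey faces, form for shapes). The function names are hypothetical; with the 1 ms stepping described in the Methods, window index i corresponds to the window [i ms, i + 50 ms].

```python
import numpy as np
from itertools import combinations

def centroid_separation(coords, labels):
    """Sum of Euclidean distances between all pairs of category centers."""
    labels = np.asarray(labels)
    centers = [coords[labels == c].mean(axis=0) for c in np.unique(labels)]
    return float(sum(np.linalg.norm(a - b) for a, b in combinations(centers, 2)))

def fine_separation(coords, global_labels, fine_labels):
    """Centroid separation of the fine labels, summed within each global category."""
    g_lab, f_lab = np.asarray(global_labels), np.asarray(fine_labels)
    return sum(centroid_separation(coords[g_lab == g], f_lab[g_lab == g])
               for g in np.unique(g_lab))

def best_window(coords_per_window, score):
    """Index of the window whose coordinates maximize `score`."""
    return int(np.argmax([score(c) for c in coords_per_window]))

# Toy example: 6 stimuli in 3 global categories, 3 windows; window 1 separates them.
g = np.array([0, 0, 1, 1, 2, 2])
f = np.array([0, 1, 0, 1, 0, 1])
offsets = np.array([[0, 0], [0, 0], [10, 0], [10, 0], [0, 10], [0, 10]], float)
windows = [np.zeros((6, 2)), offsets, 0.1 * offsets]
print(best_window(windows, lambda c: centroid_separation(c, g)))   # prints 1
```

The global criterion is `best_window(windows, lambda c: centroid_separation(c, g))`, and the fine criterion swaps in `lambda c: fine_separation(c, g, f)`.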
To examine whether only a few neurons determined the distribution of the population activity vectors, we examined the eigenvectors that define the first and second principal components (Fig. 3a and 3b, respectively). The larger the magnitude of a neuron's element in an eigenvector, the more that neuron contributes to the corresponding principal component. Figure 3a,b shows that these values are not concentrated on a small number of neurons, indicating that many neurons contribute to both the first and second principal components. We also examined the eigenvalues to see how much each principal component contributed to the PCA space (Fig. 3c); an eigenvalue indicates how much of the variance in the data lies along the corresponding axis. The eigenvalue of the first principal component was the largest, indicating that the first principal component captured the largest share of the variance in the population response.
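The paper judges this by inspecting Figure 3a,b. One way to put a single number on "not concentrated on a few neurons", offered here only as an illustration that the paper does not use, is the participation ratio of an eigenvector's squared elements: it is 1 when a single neuron carries all the weight and equals the number of neurons when all contribute equally. The function name is hypothetical.

```python
import numpy as np

def participation_ratio(eigvec):
    """(sum of w_i^2)^2 / sum of w_i^4 for the elements w_i of an eigenvector."""
    w2 = np.asarray(eigvec, dtype=float) ** 2
    return float(w2.sum() ** 2 / (w2 ** 2).sum())

one_neuron = np.eye(45)[0]                 # all weight on a single neuron
all_neurons = np.ones(45) / np.sqrt(45)    # equal weight on all 45 neurons
print(participation_ratio(one_neuron), participation_ratio(all_neurons))   # close to 1 and 45
```

Because the elements are squared, the measure ignores the sign of each loading, matching the idea that contribution depends on magnitude.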
Figure 4 shows the distributions of the 38 population activity vectors in the two-dimensional space in the [90 ms, 140 ms] and [140 ms, 190 ms] time windows, together with the [0 ms, 50 ms] window, which represents the initial state of the population vectors. The contribution ratio, i.e. the proportion of the total variance captured by the first two principal components, was 34.2% in the [0 ms, 50 ms] window, 67.7% in the [90 ms, 140 ms] window and 67.1% in the [140 ms, 190 ms] window. Given the reduction from 45 dimensions to only two, the ratios in the [90 ms, 140 ms] and [140 ms, 190 ms] windows were high, suggesting that the two-dimensional projections in these windows preserved the information in the 45-dimensional space well. In the [0 ms, 50 ms] window, all the distributions overlapped. In the [90 ms, 140 ms] window, the distribution pattern suggested that global categorization, i.e. human faces versus monkey faces versus shapes, occurred during this period (Fig. 4a). In the [140 ms, 190 ms] window, the separation between the human face, monkey face and shape distributions was maintained (Fig. 4a). In addition, the distributions for the different human identities (Fig. 4b) and for the different monkey expressions (Fig. 4c) were separated, whereas the distributions for the human expressions (Fig. 4c) and the monkey identities (Fig. 4b) still overlapped. This pattern suggested that fine categorization, i.e. by human identity or monkey expression, occurred during the [140 ms, 190 ms] window while global categorization was maintained. These results suggest that the hierarchical relationship among the test stimuli is represented by the dynamics of neuronal responses at the population level.
Results from the Mixture of Gaussians Analysis
PCA separated the distributions of human faces, monkey faces and shapes in the [90 ms, 140 ms] time window, indicating that global categorization occurred during this period. In the [140 ms, 190 ms] window, the individual distributions for human identity and monkey expression were separated, indicating that fine categorization by human identity or monkey expression occurred during this period. Figure 5a re-plots the PCA space, with ellipses representing the distributions of human identity, monkey expression and shape form. To investigate whether our global and fine categories approximated what the neuronal responses actually represented, we applied a cluster analysis based on a mixture of Gaussians to the 45-dimensional population activity vectors in each time window (see Methods).
The clusters obtained with the mixture of Gaussians analysis in the [0 ms, 50 ms], [90 ms, 140 ms] and [140 ms, 190 ms] windows are shown as circles in Figure 5b. There were 3, 6 and 7 clusters in these windows, respectively. The members of each cluster in the [90 ms, 140 ms] and [140 ms, 190 ms] windows are shown in Figure 6a,b. The [90 ms, 140 ms] window contained clusters corresponding to human faces, monkey faces and shapes (Fig. 6a); in addition, six monkey faces with an open mouth were separated from the other monkey faces. In the [140 ms, 190 ms] window, some of the clusters found in the [90 ms, 140 ms] window were separated into sub-clusters (Fig. 6b): the human face cluster split into two sub-clusters, and the two monkey face clusters split into three sub-clusters.
We evaluated the clusters obtained in the mixture of Gaussians analysis to investigate how precisely the clusters categorized the test stimuli. The mutual information I(A; B) between the categories (A) of the test stimuli and the clusters (B) was calculated as
$$I(A;B) = \sum_{a}\sum_{b} P(a,b)\log_2 \frac{P(a,b)}{P(a)\,P(b)} \qquad (2)$$
For the global category (human faces versus monkey faces versus shapes), the mutual information between the clusters and the category approached its maximum value, the entropy of the category, in the [90 ms, 140 ms] time window and remained at that level in the [140 ms, 190 ms] window. This suggests that the global category was represented in the [90 ms, 140 ms] window and was maintained into the [140 ms, 190 ms] window. For the fine categories, the mutual information between the clusters and both human identity and monkey expression was maximal in the [140 ms, 190 ms] window, suggesting that this window represented the fine categories. There was little mutual information for either human expression or monkey identity. Therefore, when fine categorization occurred, human faces were classified mainly by identity and monkey faces mainly by expression. The mutual information between the clusters and the shape-form category was maximal in both the [90 ms, 140 ms] and [140 ms, 190 ms] windows, suggesting that shapes were categorized by form before the fine categorization of faces. Thus, categorization proceeded along the time axis from the global category to the fine categories, which corresponded to human identity and monkey expression. This implies that the population response of IT neurons represented a hierarchical relationship among the test stimuli temporally.
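Equation 2 can be computed from the stimulus categories and the cluster labels with a contingency table. This is a minimal sketch; the function name and the 12/12/14 split of the 38 stimuli in the example are hypothetical. Because I(A; B) cannot exceed the entropy H(A) of the categories, the function also returns H(A), which we take to be the maximum referred to above. The example illustrates why clusters that split each global category into sub-clusters keep the global information at that maximum.

```python
import numpy as np

def cluster_information(categories, clusters):
    """Return (I(A; B), H(A)) in bits for paired category and cluster labels."""
    _, a = np.unique(np.asarray(categories).ravel(), return_inverse=True)
    _, b = np.unique(np.asarray(clusters).ravel(), return_inverse=True)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)                  # contingency counts
    joint /= joint.sum()                            # P(a, b)
    p_a = joint.sum(axis=1, keepdims=True)          # P(a), column vector
    p_b = joint.sum(axis=0, keepdims=True)          # P(b), row vector
    nz = joint > 0
    mi = (joint[nz] * np.log2(joint[nz] / (p_a * p_b)[nz])).sum()
    h_a = -(p_a * np.log2(p_a)).sum()
    return float(mi), float(h_a)

# Hypothetical labels for 38 stimuli: 0 = human, 1 = monkey, 2 = shape.
global_cat = np.repeat([0, 1, 2], [12, 12, 14])
sub_clusters = global_cat * 10 + np.arange(38) % 2   # each category split in two
print(cluster_information(global_cat, global_cat))
print(cluster_information(global_cat, sub_clusters))
```

Both calls return the same mutual information, equal to H(A): refining a cluster does not lose information about the category it belongs to.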
Discussion
To understand the temporal aspects of information encoding at the population level in the IT cortex, we analyzed the population response across 45 individually recorded neurons using PCA and a mixture of Gaussians analysis. Analysis of individual neurons showed that information about global categorization increased about 51 ms before information about fine categorization (Sugase et al., 1999). In the population analysis, global categorization (in the [90 ms, 140 ms] window) likewise occurred 50 ms before fine categorization (in the [140 ms, 190 ms] window). Using the mixture of Gaussians analysis, we investigated whether our global and fine categories closely approximated what the neuronal responses represented. The [90 ms, 140 ms] window contained clusters corresponding to global categorization, i.e. human faces versus monkey faces versus shapes. In the [140 ms, 190 ms] window, the human face and monkey face clusters separated into sub-clusters corresponding to human identity and monkey expression, respectively. We also found that the global categorization was maintained even after the sub-clusters appeared. Thus, the population response represented a hierarchical relationship among the test stimuli.
For fine categorization, we found that human faces were classified mainly by identity rather than by expression, whereas monkey faces were classified mainly by expression. With our test stimuli, the monkey subjects might have had difficulty discriminating between the different human expressions or between the different monkey models. It would be interesting to obtain behavioral data from monkeys performing a discrimination task with the same test stimuli, such as the face identification task used by Eifuku et al. (2004), to see whether the monkeys do have difficulty with these two discriminations.
How many faces can be represented in the monkey temporal cortex using this type of coding? In this study, for example, a population of 45 neurons encoded 0.71 bits of information about human identity. Representing as many as 100 human identities (6.6 bits) might require a population about nine times larger, i.e. about 400 neurons, assuming that each population of 45 neurons encodes information about human identity independently. Because the IT cortex contains far more than 400 neurons, we believe that IT neurons have the capacity to represent a much larger number of faces using hierarchical coding.
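The estimate can be checked with a few lines of arithmetic, under the same assumption as the text that independent populations of 45 neurons add their information linearly. Without intermediate rounding, the scale factor is about 9.4, giving roughly 420 neurons, consistent with the text's order-of-magnitude figure of about 400.

```python
import math

bits_from_45 = 0.71               # human-identity information from 45 neurons (this study)
bits_needed = math.log2(100)      # 100 equiprobable identities
scale = bits_needed / bits_from_45
neurons_needed = 45 * scale       # assumes information adds linearly across populations
print(f"{bits_needed:.2f} bits, scale {scale:.1f}, about {neurons_needed:.0f} neurons")
# prints: 6.64 bits, scale 9.4, about 421 neurons
```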
We also found that in the [140 ms, 190 ms] window, fine categorization occurred within each member of the global category, while global categorization was maintained. This implies that the population of neurons extracts a hierarchical relationship from among the test stimuli and represents each stage of the hierarchy at a different time. This temporal hierarchical encoding might be useful for memory in the IT cortex. As the number of neurons in the IT cortex is limited, the neurons have to store information efficiently. Storing the information hierarchically along the time axis is one way of ensuring such efficient encoding. For example, when a human face is stored, it would be classified into a human group. The neurons that represent information regarding the human group would have to store only the differences between this face and other people's faces; they do not have to store all the possible relationships between this face and a wide variety of objects throughout the world. This reduces the effort needed to remember a face. Hierarchical encoding might have another benefit. As the amount of information stored in the IT cortex increases, more time is needed to search for a target. If the information is stored hierarchically, less time is needed to search for an object because global information can be used as a tag. For example, when we recognize a person by looking at his/her face, initially there is a search for the human faces category and then there is a search for the face among the human faces in memory. This would take less time than searching for the face directly among the large number of information items that humans habitually store. Therefore, hierarchical encoding would also be important because it enables a rapid search.
The next question is how the dynamics of information representation in the IT cortex are produced. Visual areas earlier than the IT cortex are thought to process the more detailed features of a visual stimulus, so global and fine categorization of the test stimuli might not take place in these areas, whereas the global and fine relationships might be detected in the IT cortex. Several neural network models can reproduce the responses of our IT neurons. As an example, Matsumoto and Okada (2004) examined whether a neural network within the IT cortex plays an important role in shaping the dynamics of the neurons. Using an attractor network (Amit, 1989), they found that the dynamics of the network were qualitatively similar to the responses of the IT neurons recorded by Sugase et al. (1999). Another attractor model has been proposed for the hierarchical classification of odors (Ambros-Ingerson et al., 1990). This model includes feedforward and feedback connections between two different areas (the olfactory bulb and the olfactory cortex) and might be applicable to the hierarchical processing of visual stimuli observed here. Some neurons in the IT cortex show prolonged sustained activity that continues after a visual stimulus has disappeared (Miyashita, 1988; Miyashita and Chang, 1988), and experimental evidence has shown that this type of mnemonic signal is triggered by a top-down signal from the prefrontal cortex (Tomita et al., 1999). Interactions between the IT cortex and the prefrontal cortex might therefore also be important for hierarchical representation. Yet another possibility is a feedforward model with both fast and slow pathways, in which global information is processed on the fast pathway and fine information on the slow pathway. Thus, intra-areal or inter-areal interactions might be important for the hierarchical representation of visual stimuli, or a hierarchical representation might already be present in the visual cortex that sends its major output to the IT cortex. Further studies of the neural mechanisms underlying the hierarchical representation could involve experimental manipulations, such as disrupting neuronal processing within the IT cortex or between the IT cortex and other areas, or recordings from cortical areas at earlier processing stages of the ventral visual stream.
Acknowledgments
We thank Dr M. Kawato, Dr K. Doya and Dr F.A. Miles for their many valuable comments. We also thank Dr M. Sato for discussions regarding the variational Bayes algorithm. This work was partially supported by Grant-in-Aid for Scientific Research on Priority Areas No. 14084212 and Grant-in-Aid for Scientific Research (C) No. 16500093, and it was performed as part of the Advanced and Innovational Research Program in Life Sciences of the Ministry of Education, Culture, Sports, Science and Technology, Japan.
References
Ambros-Ingerson J, Granger R, Lynch G (1990) Simulation of paleocortex performs hierarchical clustering. Science 247:1344-1348.
Amit DJ (1989) Modeling brain function. Cambridge: Cambridge University Press.
Attias H (1999) Inferring parameters and structure of latent variable models by variational Bayes. In: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. San Mateo, CA: Morgan Kaufmann.
Bruce C, Desimone R, Gross CG (1981) Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol 46:369-384.
Eifuku S, De Souza WC, Tamura R, Nishijo H, Ono T (2004) Neuronal correlates of face identification in the monkey anterior temporal cortical areas. J Neurophysiol 91:358-371.
Fujita I, Tanaka K, Ito M, Cheng K (1992) Columns for visual features of objects in monkey inferotemporal cortex. Nature 360:343-346.
Ghahramani Z, Beal MJ (2000) Variational inference for Bayesian mixtures of factor analyzers. In: Advances in neural information processing systems 12 (Solla SA, Leen TK, Müller K-R, eds). Cambridge, MA: MIT Press.
Hasselmo ME, Rolls ET, Baylis GC (1989) The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behav Brain Res 32:203-218.
Hebb DO (1949) The organization of behavior: a neuropsychological theory. New York: Wiley.
Jollife IT (1986) Principal component analysis. New York: Springer-Verlag.
Matsumoto N, Okada M, Doya K, Sugase Y, Yamane S (2001) Dynamics of the face responsive neurons in the temporal cortex. Soc Neurosci Abstr 27:1048.
Matsumoto N, Okada M (2004) Neuronal mechanisms for hierarchical encoding in inferior-temporal cortex. Neurocomputing 58-60:873-877.
Miyashita Y (1988) Neuronal correlate of visual associative long-term memory in the primate temporal cortex. Nature 335:817-820.
Miyashita Y, Chang HS (1988) Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature 331:68-70.
Sugase Y, Yamane S, Ueno S, Kawano K (1999) Global and fine information coded by single neurons in the temporal visual cortex. Nature 400:869-873.
Tamura H, Tanaka K (2001) Visual response properties of cells in the ventral and dorsal parts of the macaque inferotemporal cortex. Cereb Cortex 11:384-399.
Tomita H, Ohbayashi M, Nakahara K, Hasegawa I, Miyashita Y (1999) Top-down signal from prefrontal cortex in executive control of memory retrieval. Nature 401:699-703.
Young MP, Yamane S (1992) Sparse population coding of faces in the inferotemporal cortex. Science 256:1327-1331.