Cerebral Cortex, Vol. 9, No. 5, 445-458,
July 1999
© 1999 Oxford University Press
Electrophysiological Studies of Human Face Perception. III: Effects of Top-down Processing on Face-specific Potentials
Neuropsychology Laboratory, VA Medical Center, West Haven, CT 06516 and Departments of Neurosurgery and Neurology, Yale University School of Medicine, New Haven, CT 06510, USA
Abstract
This is the last in a series of papers dealing with intracranial event-related potential (ERP) correlates of face perception. Here we describe the results of manipulations that may exert top-down influences on face recognition and face-specific ERPs, and the effects of cortical stimulation at face-specific sites. Ventral face-specific N200 was not evoked by affective stimuli; showed little or no habituation; was not affected by the familiarity or unfamiliarity of faces; showed no semantic priming; and was not affected by face-name learning or identification. P290 and N700 were affected by semantic priming and by face-name learning and identification. The early fraction of N700 and face-specific P350 exhibited significant habituation. About half of the AP350 sites exhibited semantic priming, whereas the VP350 and LP350 sites did not. Cortical stimulation evoked a transient inability to name familiar faces or evoked face-related hallucinations at two-thirds of face-specific N200 sites. These results are discussed in relation to human behavioral studies and monkey single-cell recordings. Discussion of results of all three papers concludes that: face-specific N200 reflects the operation of a module specialized for the perception of human faces; ventral and lateral occipitotemporal cortex are composed of a complex mosaic of functionally discrete patches of cortex of variable number, size and location; in ventral cortex there is a posterior-to-anterior trend in the location of patches in the order letter-strings, form, hands, objects, faces and face parts; P290 and N700 at face-specific N200 sites, and face-specific P350, are subject to top-down influences.
Introduction
In the psychology of perception and in computer vision the concepts of 'bottom-up' and 'top-down' processing are commonly used. The distinction is that bottom-up processes are involved in the analysis of the incoming image, while top-down processes originate with stored models and information associated with them (Ullman, 1996).
In this paper we describe experiments designed to test the responsiveness of face-specific ERPs to manipulations that are regarded as top-down processes. Such processing engages the subject's prior knowledge, sets up expectancies or contexts, and imposes modifications and constraints on the manner in which neuronal computations are performed on incoming stimuli. We tested the responsiveness of face-specific ERPs to affective stimuli, habituation, familiar and unfamiliar faces, semantic priming, and face-name learning and identification. In some patients the effects of cortical stimulation on recognition of familiar faces were determined. As in the previous papers we will provide a rationale for each experiment in the Results section. A preliminary report of some of this work has appeared (Puce et al., 1997).
Materials and Methods
General methods were described previously (Allison et al., 1999).
Results
Are Face ERPs Due to Emotional Arousal?
In recordings of face-specific cells in monkeys the question arose whether faces provoke arousal or emotional reactions that might have evoked the increased spike discharge. Several studies found that this was not the case. Cells that responded well to faces responded minimally to various types of aversive auditory and tactile stimuli, and to aversive visual stimuli (e.g. a snake) that would be expected to evoke an emotional response (Perrett et al., 1982
; Desimone et al., 1984
; Leonard et al., 1985
; Brothers and Ring, 1993
). To evaluate this question in humans, patients viewed faces, and erotic (attractive semi-nude males), aversive (violent events) and neutral (landscapes) images. Because most highly aversive images involve human activity, 74% of the aversive stimuli contained faces or bodies (e.g. people jumping out of a burning building), and 90% of the erotic stimuli contained faces, but in both cases faces were secondary to the affect provoked by the images.
There were nine ventral face-specific N200 sites, five in the right and four in the left hemisphere. Representative recordings are shown in Figure 1A,B. Results for the right and left hemisphere were similar and are combined in Figure 1C-E. The overall ANOVA for N200 amplitude was significant [F(df 3,24) = 24.8, P < 0.0001]. N200 amplitude to faces was significantly larger than to the erotic or aversive images (P < 0.008 in each case), and was significantly larger to erotic than to aversive images (P < 0.001). The overall ANOVA for N700 area under the curve (AUC) was significant (P < 0.05). N700 was larger to faces than to the other categories of stimuli. Thus N200 amplitude and N700 AUC were a function of the probability of faces in the stimulus set, not a function of the emotional valence of the stimuli. With one exception all face-specific N200 sites were recorded in females, hence we could not assess possible sex differences in ERP responsiveness to the erotic and aversive stimuli.
Habituation
Habituation, defined as a progressive decrease in response to repeated presentation of the same stimulus, is perhaps the most elementary and ubiquitous form of neuronal and behavioral plasticity (Groves and Thompson, 1970
). Miller et al. (Miller et al., 1991
) showed monkeys repeated presentations of common objects at 2 s intervals and found considerable habituation of most STS/IT cortex cells. Habituation has not been studied in face-specific cells, but Rolls et al. (Rolls et al., 1989
) found that the response of face-specific cells to novel faces tended to decrease over the first few presentations as the face became familiar. To assess habituation of face-specific ERPs we used a design similar to that of Miller et al. (Miller et al., 1991
). Patients viewed the same novel face presented eight times at intervals of 2 s, followed by eight presentations of a new face, and so on for a total of 40 sets of faces. No target stimuli were interspersed among the face stimuli.
There were 28 ventral face-specific N200 sites, 17 in the right and 11 in the left hemisphere. Results for the right and left hemisphere were similar and are combined in Figure 2. The overall ANOVA for N200 amplitude was significant [F(df 7,189) = 2.68, P < 0.01]. This effect was partly due to the significant (P < 0.003) decrease in N200 amplitude from the first to the second presentation of a face. When trial 2 was removed from analysis the ANOVA was marginally significant (P < 0.04), but there was no progressive decrease in N200 amplitude from the second to the eighth presentation of a face. Thus the evidence for decrement of N200 amplitude was equivocal. The overall ANOVA was not significant for P150 amplitude. The overall ANOVA for P290 amplitude was significant [F(df 7,189) = 6.19, P < 0.0001]. This effect was due mainly to the significant (P < 0.005) increase in P290 amplitude from the first to the second presentation of a face; there was no progressive increase in amplitude from the second to the eighth presentation of a face. These offsetting changes can be seen in a plot of the peak-to-peak amplitude of N200 and P290, which shows a small increase in amplitude after the second presentation of a face (Fig. 2). The overall ANOVA for N200-P290 peak-to-peak amplitude was not significant. The overall ANOVA for N700 AUC was not significant across all face-specific N200 sites. However, analysis of the subset of sites that had a face-specific N700 revealed that the overall ANOVA for the early fraction of N700 AUC was significant [F(df 7,56) = 2.48, P < 0.03], and this activity showed considerable decrement (Fig. 2). The overall ANOVA for the late fraction was not significant. There were only three lateral face-specific N200 sites; no changes in P150, N200, P290 or N700 were seen.
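The degrees of freedom reported for these ANOVAs can be checked against the number of sites and conditions. The sketch below assumes a one-way repeated-measures design in which each recording site is treated as one unit measured under every condition; that design is inferred from the reported values and is not stated explicitly in the text. The function names are illustrative only.

```python
def rm_anova_df(n_levels, n_units):
    """Degrees of freedom for a one-way repeated-measures ANOVA.

    n_levels: number of within-unit conditions (e.g. 8 presentations)
    n_units:  number of units measured under every condition
              (assumed here to be recording sites)
    """
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_units - 1)
    return df_effect, df_error


def implied_units(df_effect, df_error):
    """Number of units implied by a reported (df_effect, df_error) pair."""
    return df_error // df_effect + 1


# Habituation: 8 presentations at 28 ventral face-specific N200 sites
print(rm_anova_df(8, 28))    # matches the reported F(df 7,189)

# Emotional arousal: 4 stimulus categories at 9 sites
print(rm_anova_df(4, 9))     # matches the reported F(df 3,24)

# Early N700 fraction: F(df 7,56) implies this many sites in the subset
print(implied_units(7, 56))
```

Under this assumption the reported F(df 7,56) for the early fraction of N700 would correspond to a subset of nine sites; that count is an inference from the degrees of freedom, not a figure given in the text.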
There were eight face-specific VP350 and LP350 sites, four in the right and four in the left hemisphere. Results for the right and left hemisphere were similar and are combined in Figure 2.
In summary, ventral and lateral N200s exhibited a slight non-progressive response decrement, P290 exhibited a non-progressive response increment, and P350 and the early fraction of N700 exhibited substantial progressive response decrements in response to repeated presentation of the same face.
Familiar and Unfamiliar Faces
In the steps leading to recognition of a familiar face it is likely that an earlier stage of face perception ('this is a face') is followed by a stage of recognition ('this is Ronald Reagan's face'). Most prosopagnosics recognize that they are viewing a face, and can identify the eyes and other face parts, but the parts do not 'add up' to a recognizable face. Thus in the model of Bruce and Young (Bruce and Young, 1986) an initial 'structural encoding' stage is followed by 'face recognition unit' and 'person identity node' stages. Because face-specific N200 is the first reliable sign of face-specific processing, we surmised that it reflects activity related to the structural encoding stage (Allison et al., 1994a). If this assumption is correct we would expect that familiarity or unfamiliarity of the face would not affect face-specific N200 amplitude or latency, but might affect later processing at N200 sites (e.g. P290) or at P350 sites. We tested this hypothesis by using a randomized set of 60 faces of famous persons (primarily politicians, movie and television stars) and 60 unfamiliar faces obtained from modeling agency books. The two sets of faces were matched for sex and luminance, and approximately matched for age and attractiveness. In the first run of this experiment human faces were task irrelevant and dog faces were targets.
There were 15 ventral face-specific N200 sites, 7 in the right and 8 in the left hemisphere. Representative recordings are shown in Figure 3A,B
. Results for the right and left hemisphere were similar and are combined in Figure 3C,D
. P150, N200, P290 and N700 showed no significant differences in amplitude, latency or AUC between familiar and unfamiliar faces. Familiar and unfamiliar faces did not evoke significantly different lateral face-specific N200s, or lateral or ventral P350s. There were nine face-specific AP350 sites, all in the right hemisphere. AP350 AUC was not significantly different for familiar and unfamiliar faces.
To verify that the familiar faces were recognized, this experiment was run a second time, immediately after the first run. The patient pressed one button to indicate a familiar (famous) face, and another to indicate an unfamiliar face; dog faces were not task relevant. None of the ERPs reviewed above were significantly different to familiar and unfamiliar faces, thus they were unaffected by face familiarity whether the human faces were task irrelevant or task relevant.
Semantic Priming
Semantic priming refers to the fact that recognition of a word or object of a particular category (e.g. animals) is better and faster when preceded by a stimulus of the same category (e.g. cat preceded by dog) than when preceded by a stimulus of a different category (e.g. cat preceded by pencil). Behavioral studies of face priming have been carried out to infer the processes involved in face recognition, or more generally to study implicit memory processes [reviewed by Young and Bruce (Young and Bruce, 1991
), Bruce and Humphreys (Bruce and Humphreys, 1994
) and Schacter and Buckner (Schacter and Buckner, 1998
)]. Semantic priming is defined electrophysiologically as a decrement in ERP amplitude produced by a preceding associatively related stimulus (Nobre et al., 1994
).
To assess the effects of semantic priming on face-specific ERPs, patients viewed pairs of stimuli consisting of the name of a famous person followed by a picture of a famous person. On a random half of the trials, the name matched the picture (e.g. the name Albert Einstein preceded a picture of Albert Einstein's face; Fig. 4
) and thus the face was primed. On the remaining trials, the name did not match the face (e.g. the name Princess Diana preceded a picture of Albert Einstein's face) and thus the face was unprimed. Later testing verified that the famous faces and names were familiar to the patient.
There were 25 ventral face-specific N200 sites, 13 in the right and 12 in the left hemisphere. Representative recordings are shown in Figure 4A,B
No significant priming of N200, P290 and N700 was seen at lateral face-specific N200 sites, nor was there significant priming of VP350 and LP350. There were eight face-specific AP350 sites, all in the right hemisphere. Representative recordings are shown in Figure 5A,B
To summarize, N200, the early fraction of N700, VP350 and LP350 showed no evidence of semantic priming, but there was priming of P290, the late fraction of N700 and AP350 at some sites.
Face Identification
How do previously unfamiliar faces become familiar? Some template or mental representation of a new face must be stored and compared with later instances of the same face. As noted above, the model of Bruce and Young (Bruce and Young, 1986) posits a structural encoding stage, followed by activation of face recognition units, which in turn activate the person identity node. The model of Damasio et al. (Damasio et al., 1982) also proposes a three-stage mechanism beginning with a stage of template formation, followed by a stage of template matching, which provides the link to an activational stage 'unlocked' by the matched template. These models have in common an early perceptual stage, indifferent to face familiarity, and later stages that determine familiarity. The simplest prediction would be that N200, which may reflect a structural encoding or template formation stage of face processing, would be insensitive to later identification of a face, whereas later face-specific ERPs might be. A paired-associate experiment consisting of three stages was run to test this prediction: (i) a learning stage in which 10 unfamiliar faces were paired with common names, (ii) a distractor stage in which the patient categorized 10 new faces as male or female by pressing one of two buttons, and (iii) an identification stage in which the subject reviewed the 10 faces learned in the first stage and indicated by button press whether the paired name was correct. A name-face pair had a 50% chance of being the original pairing.
There were 15 face-specific N200 sites, 8 in the right and 7 in the left hemisphere. Representative recordings are shown in Figure 6A,B. Results for the right and left hemisphere were similar and are combined in Figure 6C-E. There were no significant effects on N200 amplitude and latency or P290 latency. P290 amplitude was significantly larger in the learning condition than in the other two conditions (P < 0.01 in each case). N700 AUC was significantly larger (P < 0.03) in the identification compared to the learning condition. There were too few P350s in this experiment for quantitative analysis. These results suggest that face-specific N200 is not involved in processes related to face learning and identification, whereas later processing at face-specific N200 sites is affected by (or involved in) face learning and identification.
Face-specific ERPs were not recorded from the hippocampus in the experiments summarized previously (Allison et al., 1999
The results of this and the other ERP experiments are summarized in Table 1
Cortical Stimulation
Allison et al. (Allison et al., 1994a) found that electrical stimulation at ventral face-specific N200 sites left patients temporarily unable to name famous or family faces that they had previously identified correctly. We have since carried out cortical stimulation in additional patients. This summary applies to all 12 cases. Two types of tests were carried out during stimulation with 5 s trains of 50 Hz, 0.2 ms duration, 2-10 mA constant-current bipolar pulses. (i) While viewing a white screen with a central fixation point and spatial markers the patient was asked to point to and describe any visual alterations. (ii) Cognitive tests included naming faces of famous individuals or family members that the patient rapidly identified during prior testing; naming common (flash-card) objects rapidly named during prior testing; and reading and completing simple sentences.
The most common perceptual alterations evoked by stimulation were white or colored phosphenes, always seen in the visual field contralateral to stimulation (e.g. Table 2, A6-7). Phosphenes often moved from the central toward the peripheral visual field during the 5 s of stimulation, and were typically evoked at sites posterior or medial to face-specific N200 sites. Of the 20 face-specific N200 sites stimulated, stimulation at four sites produced facial hallucinations. One patient had adjacent face-specific N200 sites on the lateral fusiform gyrus (Fig. 8). Cortical stimulation at these sites produced detailed imagery involving single or multiple faces (Table 2, A9-10, A10-11, A11-13). In another patient, stimulation of a face-specific N200 site evoked imagery of an eye which changed into a right profile view of a face during stimulation. In a third patient, stimulation of a face-specific N200 site evoked the image of a 'blinking eye' in the visual field contralateral to stimulation. Stimulation of a face-specific AP350 site evoked the image of a small face with a large eye; during stimulation the eye moved laterally in the contralateral visual field.
Table 2
Overall, face recognition was tested at 16 ventral face-specific N200 sites; clear deficits were produced at seven sites and possible deficits at three sites. Thus some disruption of face recognition occurred at 63% of face-specific N200 sites tested, generally without an accompanying impairment of object naming or sentence reading. A deficit in face recognition was also seen at the one lateral face-specific N200 site tested. In contrast, definite or possible deficits of face recognition were produced at 30 of 139 (22%) of sites that were not face-specific; of these 30 sites, 11 were adjacent to face-specific sites. Thus stimulation of 19/139 (14%) of sites that were not face specific and not adjacent to face-specific sites produced deficits in face recognition. The difference in percentage of effects on face recognition at face-specific compared to other sites was significant (χ² = 9.65, P < 0.002).
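The percentages above follow directly from the site counts, and the form of the comparison can be illustrated with a 2x2 chi-square. The sketch below is only a plausibility check under stated assumptions: the text does not say which counts entered the reported test or whether a continuity correction was applied, so the table used here (10 of 16 face-specific sites versus 30 of 139 other sites) is an assumption, and the helper names are illustrative.

```python
def pct(k, n):
    """Percentage k/n rounded half-up to the nearest integer."""
    return int(100 * k / n + 0.5)


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    With yates=True the Yates continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Percentages reported in the text
print(pct(7 + 3, 16))     # clear + possible deficits at face-specific sites
print(pct(30, 139))       # deficits at sites that were not face-specific
print(pct(30 - 11, 139))  # excluding sites adjacent to face-specific sites

# Assumed table: rows = site type, columns = deficit / no deficit
print(chi2_2x2(10, 6, 30, 109))
print(chi2_2x2(10, 6, 30, 109, yates=True))
```

The three percentages reproduce the 63%, 22% and 14% given in the text. For the assumed table the statistic comes out at about 12.5 without correction and about 10.5 with it, both somewhat larger than the reported 9.65, so the authors presumably tabulated the sites differently; either way the difference between site types is significant at the level stated.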
Discussion
Is Face-specific N200 Evoked by Emotional Arousal?
The answer to this question is no. Erotic or aversive images evoked appreciable N200s at face-specific sites only to the extent that they also contained faces. However, we did not obtain enough recordings in this task to determine possible effects on P350. In a functional magnetic resonance imaging (fMRI) study Lange et al. (Lange et al., 1998
) assessed activity in visual cortex to pleasant, neutral and unpleasant pictures. They found that pleasant and unpleasant pictures produced more activation of visual cortex, including the fusiform gyrus, than did neutral pictures. Our results suggest that this increased activation is tonic and non-specific, and has little or no effect on transient face-specific activity.
Habituation
N200 amplitude decreased to repeated presentation of faces (Fig. 2), which could be interpreted as evidence of habituation. P290 increased in amplitude, which could be interpreted as evidence of sensitization (Groves and Thompson, 1970). Three considerations argue against these interpretations. First, the changes in N200 and P290 amplitude were not progressive, compared with the progressive decrement seen in behavioral (Groves and Thompson, 1970) and electrophysiological (Miller et al., 1991) studies of habituation. Second, the N200 amplitude decrease was coupled with a corresponding P290 amplitude increase. Thus it is necessary to postulate a linked habituation of N200 and sensitization of P290, which is unlikely given the evidence that habituation and sensitization are independent processes (Groves and Thompson, 1970). Third, N200-P290 peak-to-peak amplitude did not change systematically with repeated face presentation (Fig. 2). The most parsimonious explanation of these results is that N200 and P290 amplitude remained unchanged and that a slow positive baseline shift was evoked by the second and later presentations of a face. In any case, the amplitude changes were small and indicate that habituation of N200 and sensitization of P290, if present, were minimal. By contrast, there were large and progressive decreases in the early fraction of N700 and in P350 (Fig. 2), demonstrating considerable habituation of this activity.
Familiar and Unfamiliar Faces
N200, P290, N700, VP350, LP350 and AP350 recorded at face-specific sites were not significantly different in amplitude, latency or AUC for familiar and unfamiliar faces (Fig. 3
). This was the case whether the faces were task relevant or not. Thus there is no evidence that these neuronal processes are involved in the recognition of familiar faces. A mnemonic or associative role for AP350 would be consistent with the partial overlap of the anterior face area with entorhinal cortex, but this experiment did not provide evidence that AP350 is related to the identification of familiar faces.
Semantic Priming
There was no evidence of semantic priming of face-specific N200, VP350 or the early fraction of N700 (Fig. 4
). However, P290 and the late fraction of N700 exhibited significant priming, suggesting top-down influences of semantic priming at later stages of processing at face-specific N200 sites. There was evidence of semantic priming at some face-specific AP350 sites (e.g. Fig. 5A
). This result suggests that these face-specific AP350s reflect neuronal activity that is functionally analogous to word-specific P400s, which can be primed by prior exposure to a related word or by sentence context (Nobre et al., 1994
). In both cases the neuronal activity is decreased by prior exposure to a semantically related stimulus, and may be involved in creating image-based representations of faces and words.
Face Identification
The results of this experiment (Fig. 6) suggest that N200 amplitude is unaffected during the learning, gender discrimination and identification stages of a face memory task. By contrast, P290 was significantly larger during the learning stage, while N700 was significantly larger during the identification stage, suggesting later stages of face processing similar to the 'face recognition unit' stage in the model of Bruce and Young (Bruce and Young, 1986) and the 'template matching' stage in the model of Damasio et al. (Damasio et al., 1982). If this inference is correct, later stages of face processing, sensitive to task demands, occur at the same cortical sites as the initial task-insensitive processing reflected by face-specific N200. There was no evidence in this experiment that the hippocampus is preferentially involved in the learning and identification of faces and names.
Cortical Stimulation
Stimulation of face-specific N200 sites produced two types of face-related changes. The more common was a transient inability to name familiar faces. Such changes were seen at most face-specific N200 sites. At these sites patients did not report distortions of faces, rather they were unable to name familiar faces. These results suggest that stimulation of face-specific N200 sites does not disrupt face perception but instead disconnects the face representation from later face recognition and mnemonic processes. In the patient described by Allison et al. (Allison et al., 1994a) the face-name association was not completely abolished during stimulation; he identified the face at the ordinate level ('politician') but not at the unique level (he identified the face as 'President Bush' rather than the state governor, whom he knew well). This type of disruption by cortical stimulation is perhaps a transient form of 'associative' prosopagnosia. By contrast, stimulation of non-face-specific sites B15-16 (Fig. 8, Table 2) produced a deficit in face recognition associated with a general perceptual deficit (distortion of any viewed image). This type of disruption by cortical stimulation is perhaps a transient form of 'apperceptive' prosopagnosia.
Less commonly, stimulation evoked transient hallucinations of faces, face parts or bodies. These changes were seen mainly at face-specific N200 or immediately adjacent sites, whereas stimulation of other sites mainly evoked phosphenes in the contralateral visual field, object-naming deficits or no changes. Stimulation of a middle fusiform AP350 site evoked the image of a small face with a large eye. Penfield and Perot (Penfield and Perot, 1963) stimulated a similarly located site on the fusiform gyrus and evoked the image of 'a face in a picture'. Kanwisher and O'Craven (Kanwisher and O'Craven, 1998) found that imagining faces activated the fusiform face area. These results, together with the finding that stimulation of face-specific N200 sites can evoke face-related hallucinations, suggest that face-related portions of the fusiform gyrus are involved in face imagery as well as face perception.
General Discussion
Relative Responsiveness of Ventral Face-specific N200s
In the previous papers we described the responsiveness of ventral face-specific N200s to a number of stimulus categories. Here it will be useful to recapitulate these results to highlight trends in response amplitude and latency. In Figure 9A
, N200 amplitudes are shown relative to the standard amplitude to grayscale, unfamiliar faces with eyes directed to the viewer. The other stimulus categories are plotted in decreasing order of magnitude. Relative latencies are plotted in the same order. Results for the right and left hemisphere were combined. Several trends are apparent. (i) All face and face-part stimuli evoke N200s that are 40% or more of the response to standard faces. (ii) There is a break in amplitude between all types of face stimuli and all types of non-face, non-body stimuli, with the latter having an amplitude of 16% or less of the response to standard faces. N200 amplitude to hands (26%) is transitional, suggesting either that hands activate face-specific cells slightly more than do other non-face stimuli, or that a subgroup of cells at face-specific N200 sites is hand sensitive. (iii) The only categories of stimuli that evoke N200s larger than those to standard faces are blurred faces, large faces and faces with eyes averted. (iv) Noses evoked the smallest N200 of any face stimulus and evoked the latest N200 of any stimulus. These results may be related to the fact that the nose is the least examined face part during free viewing of faces (Yarbus, 1967
; Shepherd et al., 1981
). (v) There is a dissociation between N200 amplitude and latency. The longest latencies are to lips, noses, and line-drawing faces, whereas most non-face stimuli evoke N200s whose latency is similar to that of standard faces. The simplest explanation of these results is that faces or face parts that are difficult to recognize engage the face processing reflected by N200 but require additional processing time.
Specificity of ERP Responsiveness to Faces
When Gross and colleagues first recorded from monkey STS/IT cells that responded best to hands or faces (Gross et al., 1969
, 1972
) there was scepticism whether the increased firing rate was hand or face specific, or whether it was due to more elementary stimulus features, e.g. stimuli with a similar frequency spectrum (Desimone, 1991
; Gross, 1994
). Many studies have since demonstrated that some STS/IT cells indeed respond much better to faces than to many categories of non-face stimuli [reviewed by Desimone (Desimone, 1991
), Gross (Gross, 1992
), Perrett et al. (Perrett et al., 1992
), Logothetis and Scheinberg (Logothetis and Scheinberg, 1996) and Tanaka (Tanaka, 1996
)].
Similarly, the conclusion that N200 and other ERPs are face-specific requires evidence that simpler explanations are not plausible. We made the following observations (many of them summarized in Fig. 9) at ventral face-specific N200 sites. (i) Scrambled faces that control for luminance (but not spatial frequency) evoked N200s that were 6% as large as the N200 evoked by faces. (ii) Phase-scrambled faces that control for luminance and spatial frequency evoked N200s that were 7% as large as the N200 evoked by faces. (iii) N200 amplitude was approximately size-invariant; it changed by a factor of two over a 32-fold change in face size. (iv) N200 amplitude was not significantly affected by removing the high-frequency or low-frequency portion of the face frequency spectrum. (v) Complex non-objects such as hyperbolic gratings evoked N200s that were 9-18% as large as the N200 evoked by faces. (vi) Complex non-living objects such as cars evoked N200s that were 12% as large as the N200 evoked by faces. (vii) Living objects evoked N200s that were 2% (flowers), 5% (butterflies) and 26% (hands) as large as the N200 evoked by faces. (viii) Language-related stimuli evoked N200s that were 3% (nouns) and 14% (Arabic numbers) as large as the N200 evoked by faces. (ix) N200 amplitude was not a function of the emotional valence of the stimuli (Fig. 1). (x) Other face-specific ERPs were also much larger to faces than to non-face stimuli, but this activity was less frequently encountered and the results are correspondingly less conclusive. These results demonstrate that N200 and other face-specific ERPs reflect the activation of cells driven by the configuration of a human face and not by the incidental features or emotional valence of a face.
Imaging studies also provide evidence of specialized face processing. Kanwisher et al. (Kanwisher et al., 1997
) reported that a region of the fusiform gyrus, primarily in the right hemisphere, was activated more by faces than by non-face stimuli in tasks that eliminated or minimized the contribution of visual attention, subordinate-level classification or processing of non-face body parts. McCarthy et al. (McCarthy et al., 1997
) reasoned that faces may engender both face-specific and general object processing, and that face-specific processing might be revealed only if the general object system was occupied by concurrent object processing. They found that a large portion of the fusiform gyrus bilaterally was activated by faces viewed among non-objects, but that when viewed among objects faces primarily activated a focal right fusiform region. Thus both ERP and fMRI studies demonstrate face-specific processing regions located primarily in portions of the fusiform gyrus.
The Functional Architecture of the Ventral Object Recognition System
Since the seminal work of Mountcastle (Mountcastle, 1957) in somatosensory cortex it has been known that columns of cells extending vertically through the cortical layers, extending ~0.5 mm horizontally, and having similar response properties, exist in primary sensory cortex [reviewed by Mountcastle (Mountcastle, 1997)]. Less is known about columnar organization in association cortex, but in monkey STS/IT cortex there is a columnar organization 0.4-1 mm in diameter as inferred from single-unit recordings (Perrett et al., 1984; Fujita et al., 1992), and ~0.5 mm in diameter as determined by optical imaging (Wang et al., 1996, 1998). In the human ventral face area, face-specific columns could be intermixed randomly with letter-string-specific, object-specific or other columns, as illustrated schematically in Figure 10A. Superimposed is an outline of the electrode (2.2 mm diameter) used in our recordings, which would record from about 11 columns assuming a diameter of 0.5 mm and the packing density illustrated. Such an arrangement accounts well for the non-specific responsiveness seen at some sites. However, such a model cannot account for category-specific ERPs, which imply segregation of category-specific columns as illustrated in Figure 10B. An electrode (left) centered over a patch of face-specific columns would record a face-specific N200, an electrode (right) centered over a patch of letter-string-specific columns would record a letter-string-specific N200, and an electrode (lower left) located over a patch of object-specific columns would record an object-specific N200. An electrode that straddled such patches would record N200s evoked by two or more stimulus categories, as we often observed. This model implies that human extrastriate cortex is more differentiated into category-specific regions than is monkey STS/IT cortex, in which columns of cells responsive to faces may be intermixed with columns responsive to non-face stimuli (Tanaka, 1996), although patches of cortex with a higher concentration of face-specific cells are also found (Harries and Perrett, 1991). The data of Allison et al. (Allison et al., 1999) indicate that human face-responsive patches of cortex are on average 12-16 mm wide and 15-35 mm long, depending on whether the region is unitary or broken into two patches. Harries and Perrett (Harries and Perrett, 1991) found that monkey face-sensitive patches were 3-4 mm wide and 3-6 mm long, separated by regions less responsive to faces. The human and monkey values may be comparable if differences in brain size are taken into account.
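As a rough check on the electrode estimate given above, the sketch below compares the area of a 2.2 mm electrode with that of a 0.5 mm column. It assumes circular columns and a circular electrode contact, which is our simplification and not a statement from the figure; the function name `column_coverage` is hypothetical. Under these assumptions the electrode area equals about 19 column areas, so a count of about 11 columns corresponds to columns occupying somewhat more than half of the area under the electrode.

```python
import math


def column_coverage(electrode_diameter_mm, column_diameter_mm, n_columns):
    """Return (area_ratio, covered_fraction) for circular columns under a
    circular electrode. Both shapes are simplifying assumptions."""
    electrode_area = math.pi * (electrode_diameter_mm / 2) ** 2
    column_area = math.pi * (column_diameter_mm / 2) ** 2
    # How many column areas fit into the electrode area (upper bound on count).
    area_ratio = electrode_area / column_area
    # Fraction of the electrode area occupied by n_columns columns.
    covered_fraction = n_columns * column_area / electrode_area
    return area_ratio, covered_fraction


# Values taken from the text: 2.2 mm electrode, 0.5 mm columns, about 11 columns.
area_ratio, covered_fraction = column_coverage(2.2, 0.5, 11)
print(f"electrode area / column area: {area_ratio:.1f}")        # 19.4
print(f"fraction covered by 11 columns: {covered_fraction:.2f}")  # 0.57
```

The gap between the area ratio and the stated count reflects the packing density shown in the schematic, which the text does not quantify.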
The centroids of activation of grating and category-specific sites are summarized in Figure 10C,D.
Models of Object Recognition
The model of visual processing implied by Figure 10C differs from that of Farah (1990, 1994), who concluded that only two systems are needed for object recognition. One system uses holistic processing and is required for faces and used to a lesser extent for objects; the other uses feature-based processing and is required for words and used to a lesser extent for objects. Our results suggest (at least) four systems: one dedicated to face processing and detected electrophysiologically by face-specific N200s; one dedicated to word processing and detected by letter-string-specific N200s; a general object system detected by object-specific N200s; and a hand-specific system detected by hand-specific N230s. Some lesion studies (Newcombe et al., 1994; Rumiati et al., 1994; Moscovitch et al., 1997; De Renzi and di Pellegrino, 1998; Buxbaum et al., 1999) also suggest a general object recognition system distinct from the systems subserving face and word recognition.
Some sites generated N200s specific to one or more internal face parts (McCarthy et al., 1999, Fig. 8), which could be regarded as evidence for face-part-specific processing independent of face processing. For the time being it is parsimonious to view this activity as a variant of face-specific processing. We previously inferred that there may be a separate system dedicated to the perception of Arabic numbers (Allison et al., 1994b), but we encountered only three number-specific N200 sites, not enough to provide convincing evidence of such a system. An fMRI and behavioral study also suggests that letter and digit recognition depend on different neural substrates that become differentiated by experience (Polk and Farah, 1998).
It is likely that all humans have face-specific, and that all literate humans have letter-string-specific, patches of ventral extrastriate cortex. Furthermore, it is possible that the general object recognition system is itself not monolithic, and may become differentiated by experience. Carey and colleagues [reviewed by Carey and Diamond (Carey and Diamond, 1994)] concluded that years of experience are required to develop the perceptual expertise needed for face and dog encoding, a conclusion that might hold for other types of category-specific recognition as well. Newcombe et al. (Newcombe et al., 1994) described category-specific deficits in visual recognition, and Caramazza and Shelton (Caramazza and Shelton, 1998) argued for the presence of category-specific knowledge systems that develop evolutionarily or developmentally. Depending on experience and expertise, it is possible that an individual's mosaic of category-specific patches of cortex may be even more differentiated than proposed in Figure 10C,D. We have not had the opportunity to test a patient with category-specific expertise (e.g. a dog judge or experienced birdwatcher), but recordings in typical patients using a variety of object categories might be useful in testing this possibility. It is unlikely that the development of face-specific ERPs can be assessed by intracranial recordings. The youngest patient in this study was 10 years old, the age at which children perform in the normal adult range on face-encoding tasks (Carey and Diamond, 1994); indeed in this patient we recorded a face-specific N200 that was well within normal adult limits of amplitude and latency. However, scalp-recorded ERPs may prove useful in tracking the development of face-sensitive ERPs (Taylor et al., 1997).
Face-specific N200 Reflects the Operation of a Face Module
We have used the term 'face module' as shorthand to describe face-specific N200 sites (Allison et al., 1994a). However, 'module' carries additional connotations; in particular it implies a neuronal population that responds to a preferred input in an automatic, mandatory fashion and carries out specific computations that are encapsulated and relatively immune to outside influence (Marr, 1976; Fodor, 1983).
The evidence reviewed in this and previous papers supports the conclusion that face-specific N200s are generated by a population of neurons that initially respond in a mandatory and largely invariant manner over a wide range of manipulations (Table 1). Thus N200 (i) is evoked in passive viewing tasks that do not require explicit learning or identification of faces; (ii) is unaffected by face familiarity; (iii) shows little or no habituation; (iv) is not semantically primed; (v) responds much more to faces than to other stimulus categories; and (vi) is recorded from specific regions of cortex. This activity is therefore modular as defined by Fodor (Fodor, 1983), except that we do not know to what extent N200 reflects the operation of an innate process, as Fodor suggests it should. Additional study of N200 may alter this conclusion, but the available evidence allows the conclusion that it reflects the operation of a module specialized for face perception. Moscovitch et al. (Moscovitch et al., 1997) studied a patient with a severe object agnosia but with normal face recognition, and also concluded that face recognition is modular. What is the function of the 'face-specific N200 module'? N200 is sensitive to the configuration of a face but is insensitive to its familiarity. We propose that the operations reflected by N200 are the instantiation of a 'structural encoding module' in the model of Bruce and Young (Bruce and Young, 1986) or a 'template formation module' in the model of Damasio et al. (Damasio et al., 1982).
The results of the line-drawing (McCarthy et al., 1999, Fig. 1), priming (Fig. 4), and face identification (Fig. 6) experiments suggest that later activity at face-specific N200 sites (P290 and N700) may be subject to influences related to recognition processes (Table 1). These results lead to the conclusion that modularity can be limited in time as well as space. Modularity is conceptualized as a set of neurons that perform automatic, encapsulated operations on a preferred input. Our results suggest that initial operations performed by such neurons can be modular, whereas later operations of the same neurons can be influenced by top-down processes and are thus non-modular.
Limitations of These Studies
Electrogenesis of ERPs
A major gap in our knowledge of the ERPs described here is that their cellular basis is unknown. However, a plausible model is available from single-cell and ERP recordings in animals. In primary sensory cortex natural stimulation of receptors, or electrical stimulation of afferent pathways, evokes initial surface positive-negative potentials referred to in the older literature as the 'primary evoked response' (Towe, 1966). The primary positivity is due to initial depolarization of layer 3-4 pyramidal cells at the level of the cell body, and a corresponding positive source potential in the apical dendrites. The primary negativity is due to later depolarization of the apical dendrites, either by direct synaptic excitation or by back-propagation from the axosomatic region (Schlag, 1973; Creutzfeldt and Houchin, 1974; Wood and Allison, 1981). Monkey area TEO sends feedforward projections to layer 3-4 of area TE (Distler et al., 1993; Saleem et al., 1993), similar to the thalamocortical afferents to primary sensory cortex. The human ventral face area is probably homologous to area TE and thus probably receives feedforward projections to layer 3-4 from the homolog of monkey area TEO. These considerations suggest that P150 is analogous to the primary positivity, while N200 is analogous to the primary negativity. If this inference is correct, these potentials reflect initial phasic excitation of layer 3-4 pyramidal cells at face-specific N200 sites. N700 may reflect tonic excitation of the same cells, comparable to the tonic discharge of face-specific STS/IT cells during face perception (Oram and Perrett, 1992). The best test of this model would come from simultaneous ERP and single-unit recordings from face-sensitive patches in monkey STS/IT cortex.
Face-specific Processing in the Frontal Lobe
In monkey inferior prefrontal cortex some cells are face selective, with response properties similar to those in STS/IT cortex (Wilson et al., 1993; Ó Scalaidhe et al., 1997). We recorded small face-specific ERPs from a few frontal lobe sites (Allison et al., 1999). Our failure to find sites that generated large, obviously face-specific ERPs like those recorded from occipitotemporal cortex may be due to two factors. (i) Monkey face-selective cells are found mainly in a small area just lateral to the principal sulcus (Ó Scalaidhe et al., 1997). The human homolog of this area is unknown, but most of our frontal electrodes were posterior and superior to the region of inferior prefrontal cortex that would be expected to contain a homologous area. (ii) Only 5% of cells were face-selective within the face-selective area (Ó Scalaidhe et al., 1997), less than the 10-34% of face-selective cells found in selected areas of monkey STS/IT cortex (Perrett et al., 1982; Desimone et al., 1984). If the same ratio holds in humans, face-specific ERPs would be correspondingly smaller. Our preliminary results suggest that human prefrontal cortex contains face-specific patches of cortex, and encourage a more systematic search for such sites.
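To make the proportional argument in point (ii) concrete, the sketch below computes the ratio of the prefrontal face-selective fraction (5%) to the 10-34% range cited for STS/IT cortex. It assumes, purely for illustration, that ERP amplitude scales linearly with the fraction of face-selective cells; that scaling and the function name `relative_fraction` are our assumptions and are not claims made in the text.

```python
def relative_fraction(frontal_fraction, temporal_fraction):
    """Ratio of face-selective cell fractions (frontal / temporal).

    Interpreting this ratio as a relative ERP amplitude assumes linear
    scaling with cell fraction, which is an illustrative assumption only.
    """
    return frontal_fraction / temporal_fraction


# Fractions taken from the text: 5% prefrontal vs 10-34% in STS/IT cortex.
ratio_vs_upper = relative_fraction(0.05, 0.34)
ratio_vs_lower = relative_fraction(0.05, 0.10)
print(f"relative to 34%: {ratio_vs_upper:.2f}")  # 0.15
print(f"relative to 10%: {ratio_vs_lower:.2f}")  # 0.50
```

Under this assumption, prefrontal face-specific ERPs would be expected at roughly 15-50% of the amplitude seen at comparable STS/IT sites, all else being equal.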
Relationship Between Scalp and Intracranial ERPs
Halgren et al. (Halgren et al., 1994) recorded an N130-P180-N240 sequence of potentials in white matter superior to the fusiform gyrus. At some sites these ERPs were face specific, and probably reflect polarity-inverted counterparts of the P150-N200-P290 sequence we recorded from the surface of the fusiform gyrus. In scalp recordings faces evoke a positivity that is largest at the vertex and has a latency of 150-200 ms (Grüsser et al., 1990; Bentin et al., 1996; George et al., 1996; Jeffreys, 1996; Schendan et al., 1998). The vertex positivity may be a polarity-inverted counterpart of N200, although it may also partly reflect the polarity-inverted counterpart of N170 (George et al., 1996). Simultaneous scalp and intracranial recordings will be necessary to determine the relationship among these face-sensitive ERPs.
Attention
Imaging studies indicate that attention to faces increases face-related activation of the fusiform gyrus (Haxby et al., 1994; Clark et al., 1997; Wojciulik et al., 1998) [reviewed by McCarthy (McCarthy, 1999)]. Due to time constraints and the limited capacity of many patients to perform cognitive tasks, we have not carried out tests of the effects of attention on face-specific ERPs. In suitable patients such experiments will be required to address the important question of the effects of attention on face-specific and related ERPs.
Perception of Static and Dynamic Faces
Studies in monkeys and humans demonstrate that portions of the superior temporal sulcus and adjacent cortex are involved in analysis of direction of gaze, and eye, mouth, hand and body movement (Campbell et al., 1990; Perrett et al., 1992; Bonda et al., 1996; Oram and Perrett, 1996; Calvert et al., 1997; Puce et al., 1998). The stimuli used in the present study were static. We have begun to study the ERP responsiveness of occipitotemporal cortex to moving eyes and mouths; these results are being reported separately (Puce and Allison, 1999; Puce et al., 1999).
| Notes |
|---|
We thank F. Favorini, J. Jasiorkowski, M. Jensen, M. Luby and K. McCarthy for assistance, and Dr A.C. Nobre for collaboration in the face identification experiment. This work was supported by the Veterans Administration and by NIMH grant MH-05286.
Address correspondence to Gregory McCarthy, Brain Imaging and Analysis Center, Box 3808, Duke University Medical Center, Durham, NC 27710, USA. Email: gregory.mccarthy{at}duke.edu.









