Cerebral Cortex Advance Access originally published online on July 7, 2007
Cerebral Cortex 2008 18(3):598-609; doi:10.1093/cercor/bhm091
The Effect of Prior Visual Information on Recognition of Speech and Sounds
1 Max-Planck-Institute for Biological Cybernetics, Tuebingen, Germany, 2 Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK
Address correspondence to Uta Noppeney, Max-Planck-Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tubingen, Germany. Email: uta.noppeney{at}tuebingen.mpg.de.
| Abstract |
|---|
To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).
Key Words: cross-modal priming; dynamic causal modeling; effective connectivity; multisensory integration; predictive coding; semantics
| Introduction |
|---|
To form a coherent and unified percept, the human brain combines information from multiple senses (Stein and Meredith 1993
A variety of experimental paradigms and analyses have been used to characterize audiovisual interactions. Classically, multisensory integration areas have been identified by superimposition of auditory and visual activations (e.g., using implicit masking or conjunction analyses, Friston et al. 2005), audiovisual interaction effects, and congruency manipulations (Calvert 2001; Calvert et al. 2001). Complementary insights into the variety of audiovisual interactions have been obtained from visuoauditory matching (Taylor et al. 2006), recognition (Nyberg et al. 2000; Gottfried et al. 2004; Lehmann and Murray 2005; Murray et al. 2005), association learning (Gibson and Maunsell 1997; Fuster et al. 2000; Gonzalo and Büchel 2003; Tanabe et al. 2005), and priming (Badgaiyan et al. 1999) paradigms. Despite a degree of convergence in the results, these diverse paradigms are likely to highlight distinct aspects of multisensory processing: visuoauditory matching tasks require explicit access to unimodal percepts, multisensory interactions involve the integration of sensory features into a unified percept, and recognition paradigms invoke additional memory components.
In the current study, we employed immediate visuoauditory priming (for review, see Henson 2003; Henson and Rugg 2003; Grill-Spector et al. 2006) to investigate the effect of prior visual information on categorization of complex stimuli such as environmental sounds and spoken words in terms of behavioral interference/facilitation and associated activation changes.
Categorizing spoken words and categorizing environmental sounds (i.e., source sounds, e.g., a cat's meowing) both engage phonological, semantic, and higher conceptual processes. However, they do so to different degrees: recognition and categorization of spoken words or speech (i.e., verbal stimuli) rely primarily on the interaction between perceptual and phonological processes, that is, processing of speech sounds (Potter and Faulconer 1975; Plaut et al. 1996; Binder et al. 2000, 2004). By contrast, recognition and categorization of environmental sounds (i.e., nonverbal stimuli) are accomplished primarily through the interaction of perceptual and semantic processes (Humphreys and Forde 2001; Lewis et al. 2004; Rogers et al. 2004; Ikeda et al. 2006). Importantly, this is a graded rather than a categorical distinction. For instance, auditory word recognition may also activate semantic representations related to the meaning of the word. Conversely, sound object recognition may involve implicit name retrieval. Furthermore, categorization of sounds or words will involve higher level conceptual or decisional processes that do not depend on the particular stimulus format but are elicited irrespective of stimulus material (verbal, nonverbal) or modality (auditory, visual, etc.).
Incongruent prior visual information will interfere with, and thus place more demands on, the processes involved in sound and speech recognition. Hence, categorization of auditory stimuli preceded by incongruent visual stimuli may be associated primarily with phonological incongruency for spoken words (e.g., the spoken word cat) and semantic incongruency for sounds (e.g., the meowing sound of a cat). Both incongruent spoken words and incongruent sounds may additionally elicit higher level conceptual (or decisional) incongruency effects.
Combining visuoauditory priming for environmental sounds and spoken words may thus enable us to dissociate visuoauditory incongruency effects that emerge at the phonological, semantic, and higher conceptual levels. At the neuronal level, these incongruency effects are thought to manifest as activation increases for incongruent trials, possibly reflecting a prediction error signal (Rao and Ballard 1999; Friston and Price 2001), in regions sustaining phonological, semantic, or higher conceptual processes.
These differential contributions of phonology, semantics, and conceptual (or decisional) processes to the categorization of sounds and spoken words provide the rationale for our visuoauditory priming paradigm: subjects were presented with a brief (100 ms) visual prime (i.e., a picture or a written word) that was followed after an additional 100 ms by a congruent or incongruent auditory target (i.e., a sound or a spoken word). Both visual primes and auditory targets could be either verbal (i.e., written words and spoken words) or nonverbal (i.e., pictures and sounds). Subjects passively attended to the visual prime and categorized the auditory targets (spoken words and sounds) according to their weight (heavier than 4 kg?).
Using this fully balanced multifactorial design (see Fig. 1), we first identified regions that were influenced by visuoauditory (in)congruency. Within these regions, we investigated whether the (in)congruency effects depended on the target material and were different for spoken words and sounds. This allowed us to segregate incongruency effects into 3 classes: visuoauditory incongruency effects that were 1) selective for spoken words, 2) selective for sounds, or 3) common to spoken words and sounds. Following our initial rationale, we related these 3 types of visuoauditory incongruency effects to multisensory interactions at the 1) phonological, 2) semantic, and 3) conceptual/decisional level.
Using dynamic causal modeling (DCM; Friston et al. 2003
| Materials and Methods |
|---|
Subjects
Seventeen healthy right-handed English native speakers (5 females, median age 25) gave informed consent to participate in the study. The study was approved by the joint ethics committee of the Institute of Neurology and University College London Hospital, London, UK.
Experimental Design
The paradigm was a 2-alternative forced-choice semantic categorization of auditory stimuli that were preceded by visual stimuli. The activation conditions conformed to a 3 x 2 x 2 factorial design manipulating
- Congruency (3 levels): 1) congruent identity and response (=congruent), 2) incongruent identity and congruent response (=incongruentI), 3) incongruent identity and incongruent response (=incongruentI+R),
- Prime material (2 levels): written words, pictures (i.e., verbal vs. nonverbal), and
- Target material (2 levels): spoken words, sounds (i.e., verbal vs. nonverbal).
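As a quick check on the design, the 12 cells of this factorial can be enumerated programmatically. The sketch below is illustrative only: the level labels follow the list above, and the order in which the factors are crossed is an arbitrary choice.

```python
from itertools import product

# Factor levels as listed in the Experimental Design section.
CONGRUENCY = ("congruent", "incongruentI", "incongruentI+R")
PRIME = ("written word", "picture")   # verbal vs. nonverbal visual prime
TARGET = ("spoken word", "sound")     # verbal vs. nonverbal auditory target


def design_cells():
    """Return every cell of the 3 x 2 x 2 design as (congruency, prime, target)."""
    return list(product(CONGRUENCY, PRIME, TARGET))


if __name__ == "__main__":
    for i, cell in enumerate(design_cells(), start=1):
        print(f"{i:2d}: " + " | ".join(cell))
```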
At the beginning of each trial, a visual prime (i.e., a written word or a color picture) was presented for 100 ms, followed after an additional 100 ms by the auditory target (i.e., a spoken word or a sound). A very short prime–target asynchrony (200 ms) was selected because we were interested in automatic priming and aimed to minimize strategic components (see Neely 1977). Subjects perceived this rapid sequential presentation as "nearly synchronous." The trial onset asynchrony was 3.25 s. Subjects passively attended to the visual primes and performed a semantic decision on the auditory targets (Is the target stimulus heavier than 4 kg?). Fifty percent of the stimuli weighed more than 4 kg and 50% weighed less. Altogether, there were 64 stimuli: 32 animals and 32 tools (duration, mean ± standard deviation: sounds 0.8 ± 0.2 s; spoken words 0.76 ± 0.2 s). These 2 distinct categories were selected to enable incongruent pairings between categories and thus induce strong and reliable incongruency effects. As a consequence, category-selective activations, which have been characterized by numerous previous studies (Chao et al. 1999; Lewis et al. 2004, 2005; Noppeney et al. 2006), are difficult to evaluate in this design (i.e., half of the compound trials are mixtures of both categories) and are not the focus of this communication.
Fifty percent of the trials were identity congruent, that is, prime and target referred to the same object (e.g., a picture of a dog followed by the barking sound of a dog). The remaining 50% of trials were identity incongruent (i.e., visual prime and auditory target referred to different objects). In half of the identity incongruent trials (i.e., 25% of all trials), prime and target both weighed less than 4 kg or both weighed more than 4 kg (e.g., a picture of an elephant followed by the sound of a car). In the other half of the identity incongruent trials (i.e., 25% of all trials), only one of the 2 objects weighed more than 4 kg (e.g., a picture of a fly followed by the sound of a car). In summary, 50% of the trials were identity congruent, 25% identity incongruent and response congruent, and 25% identity incongruent and response incongruent. This allowed us to dissociate the effect of identity incongruency from that of response incongruency.
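The mapping from a prime–target pair to the 3 congruency levels depends only on object identity and on which side of the 4-kg boundary each object falls. The minimal sketch below makes that rule explicit using the paragraph's own examples; the weights are rough, hypothetical values chosen for illustration, not stimulus norms from the study.

```python
# Rough, hypothetical weights in kg for illustration only; the task itself
# only required knowing whether an object is heavier than 4 kg.
WEIGHT_KG = {"dog": 30.0, "elephant": 5000.0, "car": 1200.0, "fly": 0.00002}
THRESHOLD_KG = 4.0


def is_heavy(obj: str) -> bool:
    """True if the object falls on the 'heavier than 4 kg' side of the task."""
    return WEIGHT_KG[obj] > THRESHOLD_KG


def congruency_level(prime: str, target: str) -> str:
    """Assign a prime-target pair to one of the 3 congruency levels.

    Only object identity and weight class matter; the format of the prime
    (picture or written word) and of the target (sound or spoken word)
    does not enter the classification.
    """
    if prime == target:
        return "congruent"           # same object, hence same response
    if is_heavy(prime) == is_heavy(target):
        return "incongruentI"        # different object, same response
    return "incongruentI+R"          # different object, different response


if __name__ == "__main__":
    for pair in [("dog", "dog"), ("elephant", "car"), ("fly", "car")]:
        print(pair, "->", congruency_level(*pair))
```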
Each stimulus (e.g., bear, see Appendix) was presented 16 times, 8 times as prime (i.e., 4 times as picture and 4 times as written word) and 8 times as target (i.e., 4 times as sound and 4 times as spoken word), amounting to 512 cross-modal trials (i.e., 64 x 8 = 512 trials). In the congruent trials, each stimulus was presented once in each of the following pairings: 1) written word–spoken word, 2) written word–sound, 3) picture–spoken word, and 4) picture–sound. Similarly, in the incongruent trials, each stimulus was equally often presented in each modality pairing. However, here a target stimulus (e.g., bear) was presented with 4 different primes (see Appendix). Presenting the stimuli only once in each pairing and thus changing the surface features ensured that subjects did not engage in prime–target association learning. Furthermore, it ensured that the stimuli were rotated and fully counterbalanced across conditions within and between subjects.
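The counterbalancing constraints in this paragraph can be verified with a small script. The actual prime–target pairings are listed in the Appendix; the cyclic rotation below is a hypothetical scheme, not the authors' list, used only to show that all stated constraints (8 prime and 8 target presentations per object, 4 per format, 4 distinct incongruent primes per target, and a 50/25/25% congruency split) can be satisfied simultaneously. It uses generic stimulus IDs and ignores the animal/tool distinction.

```python
from collections import Counter

N_PER_CLASS = 32  # 32 objects heavier and 32 lighter than 4 kg (64 in total)
PAIRINGS = [("written word", "spoken word"), ("written word", "sound"),
            ("picture", "spoken word"), ("picture", "sound")]
# Hypothetical stimulus IDs: (weight class, index within class).
STIMULI = [(cls, k) for cls in ("heavy", "light") for k in range(N_PER_CLASS)]


def other(cls):
    return "light" if cls == "heavy" else "heavy"


def build_trials():
    """Return (prime, prime_format, target, target_format) for all 512 trials."""
    n = N_PER_CLASS
    trials = []
    for cls, k in STIMULI:
        target = (cls, k)
        # Congruent: the same object once in each of the 4 modality pairings.
        for p_fmt, t_fmt in PAIRINGS:
            trials.append((target, p_fmt, target, t_fmt))
        # Incongruent: 4 different primes from a hypothetical cyclic rotation,
        # 2 from the same weight class (response congruent) and 2 from the
        # other class (response incongruent), one per modality pairing.
        primes = [(cls, (k + 1) % n), (other(cls), (k + 1) % n),
                  (other(cls), k), (cls, (k + 2) % n)]
        for prime, (p_fmt, t_fmt) in zip(primes, PAIRINGS):
            trials.append((prime, p_fmt, target, t_fmt))
    return trials


def summarize(trials):
    """Count presentations per stimulus/format and trials per congruency level."""
    as_prime = Counter((p, p_fmt) for p, p_fmt, _, _ in trials)
    as_target = Counter((t, t_fmt) for _, _, t, t_fmt in trials)
    levels = Counter(
        "congruent" if p == t
        else "incongruentI" if p[0] == t[0]
        else "incongruentI+R"
        for p, _, t, _ in trials)
    return as_prime, as_target, levels


if __name__ == "__main__":
    _, _, levels = summarize(build_trials())
    print(dict(levels))
```

The rotation offsets are paired with modality pairings so that each object also serves as an incongruent prime twice as a picture and twice as a written word.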
Additionally, 48 intramodal visual trials (i.e., picture–picture, picture–written word, written word–picture, written word–written word) were included to maintain subjects' attention to the visual primes, which were otherwise response irrelevant. Fifty percent of the trials required a yes response. Subjects indicated their yes/no responses in all conditions as quickly and as accurately as possible by a 2-choice key press. The activation conditions were interleaved with 6 s of fixation. The stimuli and the order of conditions were randomized.
Functional Magnetic Resonance Imaging
A 3-T Siemens Allegra system was used to acquire both T1-weighted anatomical volume images and T2*-weighted axial echo-planar images with blood oxygenation level–dependent contrast (GE-EPI, Cartesian k-space sampling, echo time [TE] = 30 ms, repetition time [TR] = 2.47 s, 38 axial slices acquired sequentially in descending direction, matrix 64 x 64, spatial resolution 3 x 3 x 3.4 mm3 voxels, slice thickness 2.0 mm, interslice gap 1.4 mm). To minimize Nyquist ghost artifacts, a generalized reconstruction algorithm was used for data processing (Josephs et al. 2000). There were 2 sessions of 473 volume images each. The first 6 volumes were discarded to allow for T1 equilibration effects (Table 1).
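A few derived acquisition quantities follow directly from the reported parameters. The sketch below computes them, assuming the 2.0-mm slices are separated by 1.4-mm gaps and that the 38 slices are spread evenly over each TR; neither the variable names nor these derived values appear in the original text.

```python
# Acquisition parameters as reported in the text.
TR_S = 2.47
N_SLICES = 38
SLICE_THICKNESS_MM = 2.0
GAP_MM = 1.4
N_VOLUMES = 473        # per session
N_DISCARDED = 6        # dummy volumes for T1 equilibration

slice_pitch_mm = SLICE_THICKNESS_MM + GAP_MM                       # center-to-center spacing
slab_mm = N_SLICES * SLICE_THICKNESS_MM + (N_SLICES - 1) * GAP_MM   # first to last slice edge
slice_interval_ms = 1000 * TR_S / N_SLICES                         # assumes slices evenly spread over the TR
session_min = N_VOLUMES * TR_S / 60
volumes_kept = N_VOLUMES - N_DISCARDED

if __name__ == "__main__":
    print(f"slice pitch {slice_pitch_mm:.1f} mm, slab {slab_mm:.1f} mm")
    print(f"{slice_interval_ms:.1f} ms per slice, {session_min:.1f} min per session")
    print(f"{volumes_kept} volumes per session after discarding dummies")
```

The 3.4-mm slice pitch matches the reported voxel size in z, and each session lasts roughly 19.5 min.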
Conventional SPM Analysis
The data were analyzed with statistical parametric mapping (SPM2 software from the Wellcome Department of Imaging Neuroscience, London; www.fil.ion.ucl.ac.uk/spm; Friston et al. 1995). Scans from each subject were realigned using the first as a reference, spatially normalized into Montreal Neurological Institute standard space (Talairach and Tournoux 1988; Evans et al. 1992), resampled to 3 x 3 x 3 mm3 voxels, and spatially smoothed with a Gaussian kernel of 8 mm full-width at half-maximum. The time series in each voxel was high-pass filtered with a cutoff of 1/128 Hz and globally normalized with proportional scaling. The fMRI experiment was modeled in an event-related fashion using regressors obtained by convolving each event-related unit impulse with a canonical hemodynamic response function and its first temporal derivative. In addition to the 12 conditions of our 2 x 2 x 3 factorial design (correct trials only), the statistical model included intramodal trials, errors, and non-responses. Nuisance covariates included the realignment parameters (to account for residual motion artifacts). Condition-specific effects for each subject were estimated according to the general linear model and passed to a second-level analysis as contrasts. This involved creating, for each subject, a contrast image of all cross-modal conditions > fixation (averaged over the 2 sessions) and entering these images into a second-level one-sample t-test. In addition, the response to each of the 12 conditions (summed over the 2 sessions) was estimated for each subject and entered into a second-level analysis of variance (ANOVA) that modeled the 12 effects of our 2 x 2 x 3 factorial design.
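The event-related regressors described above can be sketched as follows: a unit impulse at each event onset is convolved with a double-gamma hemodynamic response function and with its temporal derivative, then sampled at the scan times. This is a simplified illustration, not SPM2 code. The gamma shapes (6 and 16 s) and undershoot ratio (1/6) are assumed to mirror commonly cited SPM defaults, the derivative is taken numerically rather than by SPM's own finite-difference scheme, and high-pass filtering and global scaling are omitted.

```python
import math
import numpy as np


def canonical_hrf(dt=0.1, length_s=32.0, peak_shape=6.0, under_shape=16.0, ratio=6.0):
    """Double-gamma HRF: a gamma density minus a scaled, later gamma density."""
    t = np.arange(0.0, length_s, dt)

    def gamma_pdf(x, shape):  # gamma density with unit scale (seconds)
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, peak_shape) - gamma_pdf(t, under_shape) / ratio
    return t, h / h.sum()


def event_regressors(onsets_s, n_scans, tr, dt=0.1):
    """HRF-convolved stick function and its temporal derivative, sampled per scan."""
    _, h = canonical_hrf(dt)
    dh = np.gradient(h, dt)                    # numerical temporal derivative
    n_fine = int(round(n_scans * tr / dt))     # high-resolution time grid
    sticks = np.zeros(n_fine)
    for onset in onsets_s:                     # unit impulse at each event onset
        idx = int(round(onset / dt))
        if idx < n_fine:
            sticks[idx] += 1.0
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    main = np.convolve(sticks, h)[:n_fine][scan_idx]
    deriv = np.convolve(sticks, dh)[:n_fine][scan_idx]
    return np.column_stack([main, deriv])


if __name__ == "__main__":
    X = event_regressors(onsets_s=[10.0, 13.25, 40.0], n_scans=40, tr=2.47)
    print(X.shape)  # one row per scan; columns: HRF regressor, derivative
```

Including the derivative lets the model absorb small shifts in response latency across regions and subjects.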
Inferences were made at the second level to allow a random effects analysis and inferences at the population level (Friston et al. 1999).
The random effects ANOVA tested for incongruency effects. Pooling over picture and written word primes, we tested for incongruency effects that were 1) selective for spoken words or 2) selective for sounds (both tested as the interaction between congruent vs. incongruent and sounds vs. spoken words), or 3) common to sounds and spoken words.
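In a 12-cell ANOVA, these tests correspond to contrast vectors that can be built as Kronecker products of per-factor weights. The sketch below assumes a particular cell order and assumes that the 2 incongruent levels are pooled with equal weights against the congruent level; the text does not specify either choice, and a different cell order merely permutes the weights. Effects common to both target materials are tested as a conjunction of the 2 simple effects rather than as a single contrast.

```python
import numpy as np

# Assumed cell order: prime (written word, picture) x target (spoken word, sound)
# x congruency (congruent, incongruentI, incongruentI+R), with congruency
# varying fastest. Any other order just permutes the weights.
INCONGRUENT_VS_CONGRUENT = np.array([-1.0, 0.5, 0.5])  # pools both incongruent levels
POOL_OVER_PRIME = np.array([0.5, 0.5])


def contrast(target_weights):
    """12-element contrast vector: incongruency effect weighted by target material."""
    return np.kron(POOL_OVER_PRIME, np.kron(np.asarray(target_weights, float),
                                            INCONGRUENT_VS_CONGRUENT))


c_words = contrast([1, 0])              # incongruency effect for spoken words
c_sounds = contrast([0, 1])             # incongruency effect for sounds
c_words_gt_sounds = contrast([1, -1])   # interaction: larger effect for words
c_sounds_gt_words = contrast([-1, 1])   # interaction: larger effect for sounds
c_main = contrast([0.5, 0.5])           # main effect of incongruency
```

Each vector sums to zero, so it tests a difference between conditions rather than overall activation.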
Search Volume Constraints
The search space (i.e., volume of interest) was constrained using orthogonal contrasts. The search space for the main and simple main effects of (in)congruency was limited to voxels that were activated for cross-modal stimuli > fixation at P < 0.01 uncorrected (extent threshold > 15 voxels). The search space for the interaction effects was limited to voxels that were activated for cross-modal stimuli > fixation at P < 0.01 uncorrected (extent threshold > 15 voxels) and that exhibited a main effect of congruency (i.e., incongruent > congruent stimuli at P < 0.001 uncorrected; extent threshold > 15 voxels). To identify conceptual (or decisional) congruency effects common to sound and spoken word targets, each effect was tested within a search volume constrained by the other contrast (see Friston et al. 2005). This mutual masking is equivalent to a conjunction analysis testing the conjunction null (i.e., a logical AND). Unless otherwise stated, we only report activations that are significant at P < 0.05 corrected for the search volume.
Effective Connectivity Analysis: DCM
DCM treats the brain as a dynamic input-state-output system. The inputs correspond to conventional stimulus functions encoding experimental manipulations. The state variables are neuronal activities, and the outputs are the regional hemodynamic responses measured with fMRI. The idea is to model changes in the states, which cannot be observed directly, using the known inputs and outputs. Critically, changes in the states of one region depend on the states (i.e., activity) of others. This dependency is parameterized by effective connectivity. There are 3 types of parameters in a DCM: 1) input parameters which describe how much brain regions respond to experimental stimuli, 2) intrinsic parameters that characterize effective connectivity among regions, and 3) modulatory parameters that characterize changes in effective connectivity caused by experimental manipulation. This third set of parameters, the modulatory effects, allows us to explain fMRI incongruency effects by changes in coupling among brain areas. Importantly, this coupling (effective connectivity) is expressed at the level of neuronal states. DCM employs a forward model, relating neuronal activity to fMRI data that can be inverted during the model fitting process. Put simply, the forward model is used to predict outputs using the inputs. The parameters are adjusted (using gradient descent) so that the predicted and observed outputs match. This adjustment corresponds to the model fitting.
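In DCM for fMRI (Friston et al. 2003), the neuronal states z evolve according to the bilinear equation dz/dt = (A + Σ_j u_j B^(j)) z + C u, where A holds the intrinsic parameters, each B^(j) the modulatory parameters for input j, and C the input parameters. The sketch below simulates only this neuronal level using simple Euler integration. It omits the hemodynamic forward model and all parameter estimation, and the 2-region network and its parameter values are hypothetical. It shows how a modulatory effect on a forward connection, as in a bottom-up account, amplifies the downstream response on incongruent trials.

```python
import numpy as np


def simulate_bilinear(A, B, C, u, dt=0.01):
    """Euler integration of the bilinear DCM state equation.

    dz/dt = (A + sum_j u_j * B[j]) @ z + C @ u
    A: (n, n) intrinsic coupling, A[i, k] = influence of region k on region i
    B: (m, n, n) changes in coupling induced by each input
    C: (n, m) direct (driving) influence of each input
    u: (T, m) inputs sampled every dt seconds
    """
    z = np.zeros(A.shape[0])
    states = np.empty((u.shape[0], A.shape[0]))
    for t, u_t in enumerate(u):
        coupling = A + np.tensordot(u_t, B, axes=1)   # input-dependent coupling
        z = z + dt * (coupling @ z + C @ u_t)
        states[t] = z
    return states


# Toy 2-region example (hypothetical values): region 0 = early auditory,
# region 1 = multisensory area; input 0 = stimulus, input 1 = incongruency.
T = 1000                                   # 10 s at dt = 0.01 s
stim = np.zeros(T)
stim[100:300] = 1.0                        # 2-s stimulus
A = np.array([[-1.0, 0.0], [0.4, -1.0]])   # forward connection 0 -> 1
C = np.array([[1.0, 0.0], [0.0, 0.0]])     # stimulus drives region 0 only
B = np.zeros((2, 2, 2))
B[1, 1, 0] = 0.4                           # incongruency boosts 0 -> 1

congruent = simulate_bilinear(A, B, C, np.column_stack([stim, np.zeros(T)]))
incongruent = simulate_bilinear(A, B, C, np.column_stack([stim, stim]))
```

Region 0 responds identically in both conditions, while region 1 responds more strongly on incongruent trials because only the coupling from region 0 to region 1 is modulated.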
For each subject, 2 DCMs (Friston et al. 2003) were constructed that embodied our 2 alternative hypotheses. In the first, "bottom-up" model, the incongruency effects emerge in a material-dependent fashion (i.e., selective for sounds or spoken words) via changes in forward connections from early auditory to intermediate multisensory areas. In the second, "top-down" model, they emerge irrespective of stimulus material through interactions among higher cognitive control regions and propagate down the cortical hierarchy to lower areas. Here, higher cognitive control regions such as the anterior cingulate/medial prefrontal cortex (AC/mPFC) and the lateral prefrontal cortex (IFS) may act as a general conflict monitoring and cognitive control device (Duncan and Owen 2000; Botvinick et al. 2001; Paus 2001; Kerns et al. 2004; Brown and Braver 2005) that modulates activation in intermediate multisensory convergence areas.
Each DCM (Fig. 5) included 6 regions that formed a 3-level cortical hierarchy: 1) a left superior temporal area activated by cross-modal stimuli relative to fixation (superior temporal gyrus [STG]; x = –63, y = –24, z = 9), 2) a left fusiform region activated by cross-modal stimuli relative to fixation (fusiform gyrus [FG]; x = –45, y = –60, z = –21), 3) a region in the left superior temporal sulcus (STS) exhibiting an incongruency effect selective for spoken words (STS/middle temporal gyrus [MTG]; x = –66, y = –27, z = –3), 4) a region in the left angular gyrus (AG)/IPS exhibiting an incongruency effect selective for sounds (AG/IPS; x = –30, y = –75, z = 42), 5) the AC/mPFC (x = 0, y = 18, z = 48), and 6) the left inferior frontal sulcus (IFS; x = –42, y = 12, z = 24), the last 2 showing non-selective incongruency effects. Stimuli entered as extrinsic inputs to STG and FG, modeled separately for picture–sound, picture–word, word–sound, and word–word stimuli to account for material-selective activation differences. Holding the number of parameters and the intrinsic and extrinsic connectivity structure constant, the 2 DCMs differed in where incongruency exerted its modulatory effects: in the bottom-up DCM, incongruency increased the forward connections from STG and FG to AG/IPS and STS/MTG in a material-dependent fashion; in the top-down DCM, it increased the connections between AC and IFS in a material-independent manner. Thus, the bottom-up model encodes a greater sensitivity of AG/IPS and STS/MTG to incongruent bottom-up inputs, whereas the top-down model attributes incongruency effects to top-down inputs. Comparing these models allowed us to distinguish between a bottom-up and a top-down mediation of incongruency effects.
The regions were selected using the maxima of the relevant contrasts from our random effects analysis. Region-specific time series (concatenated over the 2 sessions and adjusted for confounds) comprised the first eigenvariate of all voxels within a 4-mm radius centered on each peak identified in the random effects analysis.
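Summarizing a region by the first eigenvariate of the voxels in a small sphere can be sketched as follows. The sketch is loosely modeled on SPM's approach (singular value decomposition of the time-by-voxel data, with the arbitrary sign fixed so that voxel weights sum to a positive value), but the exact scaling, the confound adjustment, and the function names are assumptions.

```python
import numpy as np


def sphere_indices(coords_mm, center_mm, radius_mm=4.0):
    """Indices of voxels whose centers lie within radius_mm of center_mm."""
    dist = np.linalg.norm(np.asarray(coords_mm, float) - np.asarray(center_mm, float), axis=1)
    return np.flatnonzero(dist <= radius_mm)


def first_eigenvariate(Y):
    """First eigenvariate of a (time points x voxels) data matrix."""
    Yc = Y - Y.mean(axis=0)                          # remove each voxel's mean
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    sign = 1.0 if Vt[0].sum() >= 0 else -1.0         # SVD sign is arbitrary
    return sign * U[:, 0] * s[0] / np.sqrt(Y.shape[1])
```

With the 3 x 3 x 3 mm voxels used here, and assuming the peak lies on a voxel center, a 4-mm sphere contains only the peak voxel and its 6 face neighbors, because the nearest diagonal neighbors lie about 4.24 mm away.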
For each model, the subject-specific modulatory effects were entered into t-tests at the group level (see Fig. 5). This allowed us to summarize the consistent findings from the subject-specific DCMs using classical statistics.
Bayes factors (i.e., ratios of model evidences; Kass and Raftery 1995) were used for model comparison, that is, to decide whether the bottom-up or the top-down DCM was the better model (Penny et al. 2004). In brief, given the measured data y and 2 competing models, the Bayes factor is the ratio of the evidences of the 2 models. A Bayes factor of 1 indicates equal evidence for the 2 models; a Bayes factor above 3 is considered positive evidence for one of them. The model evidence depends not only on model fit but also on model complexity. Here, we limited ourselves to bottom-up and top-down models that were equated for the number of parameters (i.e., model complexity) and did not construct a third, more complex model endowed with both bottom-up and top-down effects.
Finally, a group analysis was implemented by taking the product of the subject-specific Bayes factors over subjects (this is equivalent to exponentiating the difference between the 2 models' log evidences summed over subjects). We also report the Bayes factor for each individual subject (see Fig. 5, right column) to provide an intuition of consistency over subjects. As the Bayes factors for some subjects were very large, we selected a cutoff of 8 to focus on this consistency in Figure 5.
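The Bayes factor arithmetic can be made concrete by working in log space, which avoids numerical overflow when individual Bayes factors are very large. The log evidences below are hypothetical numbers, not values from this study. The display cap mirrors the cutoff of 8 used for the per-subject plot.

```python
import math

POSITIVE_EVIDENCE = 3.0   # Bayes factor threshold for "positive" evidence


def log_bayes_factor(log_ev_1, log_ev_2):
    """Log Bayes factor of model 1 over model 2 from (approximate) log evidences."""
    return log_ev_1 - log_ev_2


def group_log_bayes_factor(log_ev_1, log_ev_2):
    """Sum of subject-wise log BFs, i.e., the log of the product of BFs."""
    return sum(log_bayes_factor(a, b) for a, b in zip(log_ev_1, log_ev_2))


def clipped_for_display(log_bfs, cutoff=8.0):
    """Subject-wise BFs with very large values capped at the cutoff."""
    return [cutoff if lbf >= math.log(cutoff) else math.exp(lbf) for lbf in log_bfs]


# Hypothetical log evidences for 4 subjects (bottom-up vs. top-down model).
bottom_up = [-1200.0, -980.5, -1105.2, -1010.0]
top_down = [-1203.0, -981.0, -1105.2, -1014.5]

if __name__ == "__main__":
    lbfs = [log_bayes_factor(a, b) for a, b in zip(bottom_up, top_down)]
    group = group_log_bayes_factor(bottom_up, top_down)
    print("subject BFs (capped):", clipped_for_display(lbfs))
    print("group BF:", math.exp(group), "positive:", math.exp(group) > POSITIVE_EVIDENCE)
```

In this toy example, one subject has equal evidence for both models (log BF of 0), yet the product over subjects still favors the first model.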
| Results |
|---|
In the following, we report 1) the behavioral results, 2) the fMRI results of the conventional analysis focussing on regionally selective activations, and 3) the DCM results providing insight into potential neural mechanisms that mediate the observed regional activations.
Behavioral Results
For performance accuracy, a 3-way ANOVA with the factors congruency (congruent vs. incongruentI vs. incongruentI+R), prime material (picture vs. written word), and target material (sound vs. spoken word) identified significant main effects of congruency (F(1.4, 21.7) = 10.5, P < 0.01) and target material (F(1, 16) = 32, P < 0.001) after Greenhouse–Geisser correction. In addition, there was a significant interaction between congruency and target material (F(2, 31) = 9.2, P = 0.001). For reaction times (RTs; correct trials only), a 3-way ANOVA identified main effects of congruency (F(1.8, 28.6) = 129.1, P < 0.001) and target material (F(1, 16) = 11.8, P < 0.01) following Greenhouse–Geisser correction. RTs were shorter for spoken words than for sounds. For RTs, the absence of any significant interaction of congruency with prime material (written words vs. pictures) or target material (spoken words vs. sounds) suggests that the prime duration (100 ms) allowed pictures and written words to elicit comparable priming effects irrespective of the target material (sounds or spoken words).
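The fractional degrees of freedom reported above result from the Greenhouse–Geisser correction, which multiplies both degrees of freedom by an estimate ε of the departure from sphericity. The sketch below computes ε for a single repeated-measures factor from the covariance of orthonormal contrasts among its levels. For a factor nested in a larger design, one would apply it to condition means averaged over the other factors; that simplification is our assumption.

```python
import numpy as np


def gg_epsilon_from_cov(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance of repeated measures."""
    k = cov.shape[0]
    centering = np.eye(k) - np.ones((k, k)) / k
    U, _, _ = np.linalg.svd(centering)
    M = U[:, : k - 1]                 # orthonormal contrasts, orthogonal to the mean
    S = M.T @ cov @ M
    return np.trace(S) ** 2 / ((k - 1) * np.trace(S @ S))


def gg_epsilon(data):
    """data: (n_subjects, k_levels) matrix of per-subject condition means."""
    return gg_epsilon_from_cov(np.cov(data, rowvar=False))


def corrected_dfs(eps, n_subjects, k_levels):
    """Greenhouse-Geisser corrected numerator and denominator df."""
    df1 = (k_levels - 1) * eps
    return df1, df1 * (n_subjects - 1)
```

With 17 subjects and 3 congruency levels, the uncorrected degrees of freedom are (2, 32). An ε of roughly 0.68 yields approximately (1.4, 21.7), matching the values reported for the congruency effect on accuracy. ε is bounded between 1/(k − 1) and 1 and equals 1 when sphericity holds.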
Post hoc comparisons (Bonferroni corrected) for accuracy and RTs revealed a significant incongruency effect of identity but not of response. Overall, these behavioral results suggest that incongruency may affect processes of stimulus recognition and categorization.
Conventional SPM Analysis
The conventional SPM analysis was performed in 2 steps. First, we identified regions that showed increased activation for incongruent > congruent stimuli (within the system of regions activated relative to fixation, see Materials and Methods). Second, within this system, pooling over prime material, we tested for incongruency effects that were 1) common to sounds and spoken words, 2) selective for spoken words, or 3) selective for sounds (the latter 2 tested as the interaction between congruent vs. incongruent and sounds vs. spoken words). For completeness, pooling over target material, we tested for incongruency effects that were selective for pictures or written words (i.e., the interaction between congruent vs. incongruent and pictures vs. written words). In other words, we exploited the factorial character of our experimental design and pooled over one factor to increase power when investigating the effect of the other factor.
Main Effect of Identity and Response Incongruency
Incongruent stimuli increased activation relative to congruent stimuli in the AC/mPFC, bilateral IFS, left insula, IPS/AG, MTG/STS, and right cerebellum. None of these regions showed an effect of response incongruency (P > 0.05 uncorrected at peak coordinates). In other words, activation in these areas did not depend on whether prime and target object required the same response. Instead, it was primarily driven by whether the visual prime and the auditory target referred to the same object. This suggests that the activation increases may be due, at least in part, to incongruencies at the level of object processing and categorization rather than only to response selection and preparation.
No increased activation was observed for congruent relative to incongruent trials within the system of regions activated relative to fixation (see Materials and Methods) (Table 2).
Modulatory Effect of Target Material: Incongruency Effects Selective for Spoken Words, Sounds, or Both
Within the incongruency system identified above, the medial prefrontal region and the left IFS exhibited incongruency effects common to sounds and spoken words. Critically, pooling over primes, we observed a significant interaction between incongruency and target material: the left MTG/STS showed an enhanced incongruency effect for spoken words relative to sounds. In contrast, the left AG (extending into IPS) showed an enhanced incongruency effect for sounds relative to spoken words. Following the rationale of this experiment, the incongruency effects in mPFC/IFS may relate to higher conceptual/decisional processes, those in AG/IPS to semantic processes, and those in STS/MTG to phonological processes. In addition, we observed an incongruency effect selective for sounds in a more dorsal medial prefrontal region. Although only correct trials were included in our fMRI analysis, we note that sound trials were still associated with a greater error probability. Hence, the increased mPFC activation for incongruent sound trials may be related to their inherent ambiguity (cf. recent studies associating mPFC/AC with error probability prediction rather than error detection per se, Brown and Braver 2005) (Tables 3 and 4, Figs 2 and 3).
Modulatory Effect of Prime Material (Written Words vs. Pictures)
For completeness, pooling over target material, we tested for incongruency effects that were modulated by prime material (i.e., the interaction between incongruent vs. congruent and pictures vs. written words). However, no region exhibited a significant interaction between congruency and prime material. The absence of a modulatory effect of prime material may be related to several factors: 1) the prime was presented very briefly (100 ms); 2) it was task and response irrelevant; and 3) at the time of target presentation (i.e., 200 ms after prime onset), both phonological and semantic information may be available irrespective of target material (cf. Rahman et al. 2003; Schiller et al. 2003; Moscoso del Prado et al. 2006).
Effect of Performance on fMRI Incongruency Effects
To further characterize the common incongruency effects in the mPFC/AC and left IFS, we investigated their relationship to subjects' performance. For this, we performed a second-level multiple regression analysis in which subject-specific behavioral interference effects (i.e., RT and accuracy differences for incongruent > congruent trials) served as predictors for the fMRI incongruency effects (i.e., increased activation for incongruent > congruent stimuli) in the mPFC/AC and left IFS. As RT and accuracy differences for incongruent relative to congruent trials were strongly negatively correlated over subjects (correlation coefficient = –0.7), we orthogonalized the accuracy regressor with respect to the RT regressor. Given our a priori interest in the role of the AC/mPFC and left IFS in incongruency effects, the results of this analysis are reported corrected for multiple comparisons within spheres (10 mm radius) centered on the peaks identified in the previous conjunction analysis (this does not bias our inference because the effects of RT and accuracy are orthogonal to the incongruency effects).
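The orthogonalization step can be sketched for a single region as follows. The actual analysis was voxelwise in SPM, and the per-subject values in the usage check are hypothetical. Once accuracy is orthogonalized with respect to RT, the variance the 2 predictors share is credited to RT. The RT coefficient therefore equals the coefficient from an RT-only model, and the accuracy coefficient reflects only the part of accuracy interference not explained by RT.

```python
import numpy as np


def orthogonalize(x, wrt):
    """Mean-center x and remove its projection onto the mean-centered wrt."""
    x = x - x.mean()
    w = wrt - wrt.mean()
    return x - (x @ w) / (w @ w) * w


def second_level_betas(fmri_effect, rt_interference, acc_interference):
    """Betas for [intercept, RT, accuracy orthogonalized w.r.t. RT]."""
    rt = rt_interference - rt_interference.mean()
    acc = orthogonalize(acc_interference, rt_interference)
    X = np.column_stack([np.ones_like(rt), rt, acc])
    betas, *_ = np.linalg.lstsq(X, fmri_effect, rcond=None)
    return betas
```

Because the 3 design columns are mutually orthogonal, adding the orthogonalized accuracy regressor leaves the RT estimate unchanged.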
RT interference positively predicted fMRI incongruency effects in the mPFC/AC (x = 0, y = 24, z = 48; z-score = 3.51; P(svc) = 0.04) and in the lateral prefrontal cortex (x = –45, y = 6, z = 24; z-score = 3.5; P(svc) = 0.04). In addition, incongruency effects on accuracy negatively predicted fMRI incongruency effects in the lateral prefrontal cortex (x = –51, y = 9, z = 24; z-score = 3.9; P(svc) = 0.01). In other words, strong fMRI incongruency effects were associated with relatively longer RTs and lower accuracy for incongruent relative to congruent trials. Thus, consistent with current theories that implicate the mPFC/AC and IFS circuitry in conflict monitoring and cognitive control, mPFC/AC and IFS activation may be associated with stronger interference, as indicated by longer processing times and less accurate performance on incongruent trials (Fig. 4).
Summary of the Results from the Conventional SPM Analysis
In summary, our results demonstrate that 1) the left MTG/STS shows an increased incongruency effect for spoken words relative to sounds, 2) the left AG/IPS exhibits an increased incongruency effect for sounds relative to spoken words, and 3) a medial prefrontal region and the left IFS are activated for incongruent relative to congruent stimuli for sounds as well as spoken words. Furthermore, the incongruency effects in mPFC/AC and left IFS were predicted by the behavioral interference effects across subjects.
DCM Analysis
The group-level analysis provided strong evidence for the bottom-up relative to the top-down model. This suggests that the incongruency effects may emerge in a material-dependent fashion (i.e., selective for spoken words or sounds) via modulation of forward connections from early auditory regions to STS/MTG and AG/IPS. In other words, the STS/MTG and AG/IPS responded more strongly to bottom-up inputs when these were incongruent. Figure 5 (right column) shows, for each subject, the Bayes factor of the bottom-up model relative to the top-down model to provide an intuition of consistency over subjects. A cutoff of 8 was used to emphasize that, despite intersubject variability in the magnitude of the Bayes factors, the bottom-up model provided a better explanation of the data than the top-down model in all subjects (apart from one subject with equal model evidences). As the 2 DCMs were equated for the number of modulatory effects and for intrinsic and extrinsic connectivity structure, the difference in model evidence reflects differences in model fit rather than model complexity.
The numbers by the connections are the change in coupling (i.e., responsiveness of the target region) induced by incongruency or material (sounds vs. spoken words) effects averaged across subjects. Note that in both models, 3 modulatory effects are significant across subjects (Fig. 5).
Discussion
This visuoauditory priming study demonstrates the effect of prior visual information on the recognition and categorization of environmental sounds and spoken words at the neural and behavioral levels. Subjects were slower and less accurate on incongruent relative to congruent trials. Consistent with this behavioral interference effect, incongruent relative to congruent visuoauditory trials increased activation in a large distributed neural system encompassing the AC/mPFC, IFS, AG/IPS, and MTG/STS. These effects were observed for incongruent trials irrespective of whether they additionally involved response incongruency. Critically, although the behavioral interference, as measured by longer RTs, was equivalent for sounds and spoken words, our functional imaging results revealed that it was mediated by distinct neural systems. Combining visuoauditory priming for spoken words and environmental sounds enabled us to test for the interaction between congruency and target material (i.e., sounds vs. spoken words) and to segregate the incongruency effects into 3 classes: visuoauditory incongruency effects were enhanced for 1) spoken words in the left anterior MTG/STS, 2) sounds in the left AG/IPS, and 3) both words and sounds in the mPFC/AC and left IFS.
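The factorial logic behind these 3 classes of effects can be made concrete with contrast weights over the 4 conditions of the 2 × 2 design (congruency × target material). The condition order, the contrast names, and the parameter estimates below are hypothetical; in an SPM analysis the same kind of weights would be applied to the condition regressors of the general linear model.

```python
# Hypothetical condition order for the 2 x 2 design (material x congruency).
CONDITIONS = ("word_congruent", "word_incongruent",
              "sound_congruent", "sound_incongruent")

CONTRASTS = {
    # Incongruent > congruent, pooled over spoken words and sounds.
    "incongruency_main_effect": (-1, 1, -1, 1),
    # Incongruency effect larger for spoken words than for sounds.
    "incongruency_x_words": (-1, 1, 1, -1),
    # Incongruency effect larger for sounds than for spoken words.
    "incongruency_x_sounds": (1, -1, -1, 1),
}

def contrast_value(weights, betas):
    """Weighted sum of condition estimates (c' * beta) for a zero-sum contrast."""
    if len(weights) != len(betas):
        raise ValueError("one weight per condition is required")
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * b for w, b in zip(weights, betas))

# Hypothetical regional estimates in CONDITIONS order: an incongruency
# effect confined to spoken words.
betas_region = (1.0, 3.0, 1.0, 1.0)
for name, weights in CONTRASTS.items():
    print(name, contrast_value(weights, betas_region))
```

With these made-up estimates, both the main effect and the words-selective interaction are positive while the sounds-selective interaction is negative, the pattern expected for a region like the left anterior MTG/STS. A region like the mPFC/AC or IFS would instead show a main effect without a reliable interaction.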
From a cognitive perspective, these distinct classes suggest that prior visual information modulates the categorization of complex auditory stimuli at multiple stages. Based on our initial rationale that processing auditory-visual stimuli relies more on phonology for spoken words and on semantics for environmental sounds, these regionally selective responses may implicate 1) the MTG/STS in phonological, 2) the AG/IPS in semantic and associated recognition processes, and 3) the mPFC/IFS in higher conceptual or "conflict monitoring" processes. In terms of neural mechanisms, our DCM results suggest that these incongruency effects may emerge in a material-dependent fashion, that is, selectively for spoken words or environmental sounds, via a greater influence of forward connections from early auditory regions to MTG/STS and AG/IPS.
The selective response enhancement in STS/MTG for incongruent spoken words is consistent with its established role in auditory speech processing (Mummery et al. 1999; Binder et al. 2000; Scott et al. 2000; Giraud and Price 2001; Price et al. 2003; Scott and Johnsrude 2003). Furthermore, activation in multiple STS regions has been shown for visuoauditory integration and congruency of 1) seen mouth movements and heard speech during speech reading (Calvert et al. 2000; Macaluso et al. 2004) as well as 2) spoken and written letters (van Atteveldt et al. 2004). Interestingly, when written and spoken phonemes are presented synchronously during passive listening and viewing, only congruent auditory-visual speech that allows successful binding is associated with increased STS activation relative to unimodal speech (Calvert et al. 2000; van Atteveldt et al. 2004). In contrast, in our visuoauditory priming paradigm, the task-irrelevant but incongruent visual prime induces behavioral interference and increases STS activation during categorization of the subsequent spoken word. Thus, different multisensory paradigms can implicate the same anatomical locus while engaging very different forms of multisensory interaction. During a passive viewing–listening task, both stimulus components are task relevant, enabling their integration into a coherent percept. In contrast, visuoauditory priming can be considered a selective attention task, in which task-irrelevant incongruent visual information needs to be suppressed or overcome by amplification of the task-relevant auditory information. Collectively, both types of visuoauditory interaction effects suggest that the anterior MTG/STS may be the locus of neuronal processes that underpin visuoauditory interactions or incongruency effects conveyed at the phonological level.
In the AG/IPS, the incongruency effect was selective for categorization of environmental sounds. As recognition and categorization of environmental sounds rely on interactions between perceptual and semantic processing, this suggests that the AG/IPS is sensitive to incongruency primarily at the level of semantic representations, converging with recent functional imaging results implicating the IPS in semantic rather than response conflict (van Veen and Carter 2005). Furthermore, the AG/IPS is part of a frontotemporoparietal semantic retrieval system that is generally activated for semantic relative to perceptual or phonological tasks (Vandenberghe et al. 1996; Noppeney and Price 2003, 2004; Binder et al. 2005; Sabsevitz et al. 2005). However, its precise role in semantic processing has remained elusive. For instance, only large, but not focal, parietal lesions are associated with semantic retrieval deficits as measured by standard neuropsychological tests (e.g., the Pyramids and Palm Trees test; Alexander et al. 1989). One interesting possibility that arises from our findings is that the AG/IPS is involved in controlling, accessing, and combining semantic information from multiple senses. This hypothesis needs to be investigated further by 1) using fMRI to compare visuoauditory priming with unimodal priming (the limited number of unimodal trials in our experiment did not allow this comparison) and with audiovisual priming (i.e., auditory prime and visual target), and 2) testing patients with left-lateralized parietal lesions on nonverbal cross-modal (e.g., sound–picture) matching or priming tasks.
In contrast to STS/MTG and AG/IPS, activation in mPFC/AC and left IFS was increased for both incongruent sounds and incongruent spoken words. The behavioral relevance of these effects was highlighted by their significant correlations with subjects' increases in RT and decreases in accuracy for incongruent relative to congruent trials. In other words, subjects who spent relatively more time on incongruent trials, and whose accuracy dropped more on them, exhibited stronger mPFC/AC and IFS incongruency effects. These results extend the role of the mPFC/AC–IFS circuitry in cognitive control processes, such as conflict monitoring or predicting error probability, to the multisensory domain (Duncan and Owen 2000; Botvinick et al. 2001; Paus 2001; Noppeney and Price 2002; Laurienti et al. 2003; Kerns et al. 2004; Brown and Braver 2005). Thus, the AC/mPFC and IFS may be engaged in evaluating and integrating higher level conceptual information abstracted from different stimulus materials (i.e., verbal vs. nonverbal) and modalities (auditory vs. visual).
The multiple incongruency effects raise the question of the level of the cortical hierarchy at which they emerge (Mesulam 1990; McIntosh 2000; Horwitz 2003). More specifically, are the incongruency effects mediated via sensitization of forward connections from early auditory regions to STS/MTG and AG/IPS, or do they emerge through greater top-down influences from the AC–IFS circuitry, indicating increased cognitive control? Bayesian model comparison provided strong evidence for the bottom-up model, in which the influence of early auditory regions on STS/MTG and AG/IPS is increased during incongruent trials. Critically, the forward connectivity is selectively modulated by the different classes of incongruency: phonological incongruency increases the forward connections to STS/MTG, and semantic incongruency those to AG/IPS. The proposed bottom-up mechanism converges with recent electroencephalography results demonstrating auditory-visual incongruency and category-specific effects for tools and animals as early as 100 ms poststimulus (Molholm et al. 2004; Hauk et al. 2006; Murray et al. 2006). Hence, the human brain may be able to distinguish rapidly between tools and animals and to detect higher level incongruencies between auditory and visual stimulus components. Collectively, these results are consistent with predictive coding hypotheses, in which prediction errors at all levels of a cortical hierarchy guide perceptual inference (Rao and Ballard 1999). In our case, the failure to suppress prediction error in the context of unpredictable or incongruent bottom-up cross-modal inputs is manifest as an increase in forward connectivity. Furthermore, the nature of the prediction error (phonological vs. semantic) determines where it is expressed (MTG/STS vs. AG/IPS).
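The predictive coding account can be illustrated with a deliberately simplified toy model; it is not the DCM analysed here. A higher level supplies a prediction of the auditory category derived from the visual prime, the forward signal is the prediction error (input minus prediction), and the representation is updated by gradient descent on the squared error, in the spirit of Rao and Ballard (1999). The one-hot category codes, the identity generative mapping, the learning rate, and the example categories are assumptions for illustration only.

```python
def squared_error(x, r):
    """Squared prediction error between input x and top-down prediction r."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

def infer(x, prior, lr=0.5, steps=3):
    """Iteratively update the representation r toward the input x.

    Assumes an identity generative mapping, so the top-down prediction
    equals r and the forward error is simply x - r. Returns the squared
    error before each update.
    """
    r = list(prior)
    history = []
    for _ in range(steps):
        history.append(squared_error(x, r))
        err = [xi - ri for xi, ri in zip(x, r)]
        r = [ri + lr * e for ri, e in zip(r, err)]
    return history

# Hypothetical one-hot codes over two categories: ("bear", "bell").
prime_bear = [1.0, 0.0]         # prediction induced by the visual prime
congruent_sound = [1.0, 0.0]    # auditory target "bear"
incongruent_sound = [0.0, 1.0]  # auditory target "bell"

print(infer(congruent_sound, prime_bear))    # [0.0, 0.0, 0.0]
print(infer(incongruent_sound, prime_bear))  # [2.0, 0.5, 0.125]
```

A congruent target is fully explained by the prime-derived prediction, so no error is passed forward. An incongruent target produces a large initial error that is only gradually explained away, a toy analogue of the increased forward signal; the sketch says nothing about which region expresses that error.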
Our findings suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the left anterior MTG/STS being involved in phonological, the AG/IPS in semantic, and the mPFC/IFS in higher conceptual processes. In terms of neural mechanisms, effective connectivity analyses suggest that the incongruency effects may emerge via a failure to suppress incongruent bottom-up inputs from early auditory regions to MTG/STS or AG/IPS. This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex, where the domain of the prediction error (phonological vs. semantic) determines its regional expression (MTG/STS vs. AG/IPS).
Funding
The Deutsche Forschungsgemeinschaft; the Max-Planck Society; and the Wellcome Trust.
Appendix
Example trials: Each stimulus (e.g., bear) was presented 4 times as a target in congruent pairs and 4 times as a target in incongruent pairs. Across subjects, each stimulus was counterbalanced across stimulus modalities and conditions.
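One way to satisfy the constraints stated in this example can be sketched as follows: each target contributes 4 congruent and 4 incongruent prime–target pairs, and the material in which a stimulus is presented as a target (spoken word or sound) alternates between consecutive subjects. The rotation rule, the choice of incongruent primes, and all stimulus names except bear are hypothetical; the published design may have used a different scheme.

```python
from collections import Counter

MATERIALS = ("spoken word", "sound")

def material_for(stimulus_index, subject_index):
    """Hypothetical rotation: a stimulus switches material between consecutive subjects."""
    return MATERIALS[(stimulus_index + subject_index) % len(MATERIALS)]

def target_trials(stimuli, subject_index, reps=4):
    """Build (visual prime, auditory target, material, condition) tuples."""
    trials = []
    for i, target in enumerate(stimuli):
        material = material_for(i, subject_index)
        others = [s for s in stimuli if s != target]  # candidate incongruent primes
        for k in range(reps):
            trials.append((target, target, material, "congruent"))
            trials.append((others[k % len(others)], target, material, "incongruent"))
    return trials

stimuli = ["bear", "bell", "dog", "hammer"]  # hypothetical item set
trials = target_trials(stimuli, subject_index=0)
counts = Counter((t[1], t[3]) for t in trials)
print(counts[("bear", "congruent")], counts[("bear", "incongruent")])  # 4 4
```

Across any 2 consecutive subjects, every stimulus then serves once as a spoken-word target and once as a sound target, while the number of congruent and incongruent presentations per target stays fixed.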
Acknowledgments
Conflict of Interest: None declared.
References
Alexander MP, Hiltbrunner B, Fischer RS. Distributed anatomy of transcortical aphasia. Arch Neurol (1989) 46:885–892.
Amedi A, von Kriegstein K, van Atteveldt NM, Beauchamp MS, Naumer MJ. Functional imaging of human crossmodal identification and object recognition. Exp Brain Res (2005) 166:559–571.
Badgaiyan RD, Schacter DL, Alpert NM. Auditory priming within and across modalities: evidence from positron emission tomography. J Cogn Neurosci (1999) 11:337–348.
Barraclough NE, Xiao D, Baker CI, Oram MW, Perrett DI. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci (2005) 17:377–391.
Beauchamp MS, Argall BD, Bodurka J, Duyn JH, Martin A. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat Neurosci (2004) 7:1190–1192.
Beauchamp MS, Lee KE, Argall BD, Martin A. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron (2004) 41:809–823.
Binder JR, Frost JA, Hammeke TA, Bellgowan PS, Springer JA, Kaufman JN, Possing ET. Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex (2000) 10:512–528.
Binder JR, Liebenthal E, Possing ET, Medler DA, Ward BD. Neural correlates of sensory and decision processes in auditory object identification. Nat Neurosci (2004) 7:295–301.
Binder JR, Westbury CF, McKiernan KA, Possing ET, Medler DA. Distinct brain systems for processing concrete and abstract concepts. J Cogn Neurosci (2005) 17:905–917.
Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychol Rev (2001) 108:624–652.
Brown JW, Braver TS. Learned predictions of error likelihood in the anterior cingulate cortex. Science (2005) 307:1118–1121.
Callan DE, Jones JA, Munhall K, Kroos C, Callan AM, Vatikiotis-Bateson E. Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information. J Cogn Neurosci (2004) 16:805–816.
Calvert GA. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb Cortex (2001) 11:1110–1123.
Calvert GA, Brammer MJ, Bullmore ET, Campbell R, Iversen SD, David AS. Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport (1999) 10:2619–2623.
Calvert GA, Campbell R, Brammer MJ. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol (2000) 10:649–657.
Calvert GA, Hansen PC, Iversen SD, Brammer MJ. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage (2001) 14:427–438.
Calvert GA, Lewis JW. Hemodynamic studies of audio-visual interactions. In: Calvert GA, Spence C, Stein BE, eds. The handbook of multisensory processes. (2004) Cambridge (MA): MIT Press. 483–502.
Chao LL, Haxby JV, Martin A. Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nat Neurosci (1999) 2:913–919.
Duncan J, Owen AM. Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci (2000) 23:475–483.
Evans AC, Collins DL, Milner B. An MRI-based stereotactic atlas from 250 young normal subjects. Soc Neurosci Abstr (1992).
Foxe JJ, Morocz IA, Murray MM, Higgins BA, Javitt DC, Schroeder CE. Multisensory auditory-somatosensory interactions in early cortical processing revealed by high-density electrical mapping. Brain Res Cogn Brain Res (2000) 10:77–83.
Friston KJ, Harrison L, Penny W. Dynamic causal modelling. Neuroimage (2003) 19:1273–1302.
Friston KJ, Holmes A, Worsley KJ, Poline JB, Frith CD, Frackowiak R. Statistical parametric mapping: a general linear approach. Hum Brain Mapp (1995) 2:189–210.
Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ. Multisubject fMRI studies and conjunction analyses. Neuroimage (1999) 10:385–396.
Friston KJ, Penny WD, Glaser DE. Conjunction revisited. Neuroimage (2005) 25:661–667.
Friston KJ, Price CJ. Generative models, brain function and neuroimaging. Scand J Psychol (2001) 42:167–177.
Fu KM, Johnston TA, Shah AS, Arnold L, Smiley J, Hackett TA, Garraghty PE, Schroeder CE. Auditory cortical neurons respond to somatosensory stimulation. J Neurosci (2003) 23:7510–7515.
Fuster JM, Bodner M, Kroger JK. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature (2000) 405:347–351.
Ghazanfar AA, Maier JX, Hoffman KL, Logothetis NK. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J Neurosci (2005) 25:5004–5012.
Ghazanfar AA, Schroeder CE. Is neocortex essentially multisensory? Trends Cogn Sci (2006) 10:278–285.
Gibson JR, Maunsell JH. Sensory modality specificity of neural activity related to memory in visual cortex. J Neurophysiol (1997) 78:1263–1275.
Giraud AL, Price CJ. The constraints functional neuroimaging places on classical models of auditory word processing. J Cogn Neurosci (2001) 13:754–765.
Gonzalo D, Büchel C. Crossmodal associative learning modulates fusiform face area's response to sound. (2003) Geneva: 3rd Annual Meeting International Multi-Sensory Research Forum.
Gottfried JA, Dolan RJ. The nose smells what the eye sees: crossmodal visual facilitation of human olfactory perception. Neuron (2003) 39:375–386.
Gottfried JA, Smith AP, Rugg MD, Dolan RJ. Remembrance of odors past: human olfactory cortex in cross-modal recognition memory. Neuron (2004) 42:687–695.
Grill-Spector K, Henson R, Martin A. Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn Sci (2006) 10:14–23.
Hauk O, Shtyrov Y, Pulvermuller F. The sound of actions as reflected by mismatch negativity: rapid activation of cortical sensory-motor networks by sounds associated with finger and tongue movements. Eur J Neurosci (2006) 23:811–821.
Henson RN. Neuroimaging studies of priming. Prog Neurobiol (2003) 70:53–81.
Henson RN, Rugg MD. Neural response suppression, haemodynamic repetition effects, and behavioural priming. Neuropsychologia (2003) 41:263–270.
Horwitz B. The elusive concept of brain connectivity. Neuroimage (2003) 19:466–470.
Humphreys GW, Forde EM. Hierarchies, similarity, and interactivity in object recognition: "category-specific" neuropsychological deficits. Behav Brain Sci (2001) 24:453–476.
Ikeda M, Patterson K, Graham KS, Ralph MA, Hodges JR. A horse of a different colour: do patients with semantic dementia recognise different versions of the same object as the same? Neuropsychologia (2006) 44:566–575.
Josephs O, Deichmann R, Turner R. Trajectory measurement and generalized reconstruction in rectilinear EPI. ISMRM Meeting (2000) 151:7.
Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc (1995) 90:773–795.
Kayser C, Petkov CI, Augath M, Logothetis NK. Integration of touch and sound in auditory cortex. Neuron (2005) 48:373–384.
Kerns JG, Cohen JD, MacDonald AW 3rd, Cho RY, Stenger VA, Carter CS. Anterior cingulate conflict monitoring and adjustments in control. Science (2004) 303:1023–1026.
Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT. Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res (2004) 158:405–414.
Laurienti PJ, Wallace MT, Maldjian JA, Susi CM, Stein BE, Burdette JH. Cross-modal sensory processing in the anterior cingulate and medial prefrontal cortices. Hum Brain Mapp (2003) 19:213–223.
Lehmann S, Murray MM. The role of multisensory memories in unisensory object discrimination. Brain Res Cogn Brain Res (2005) 24:326–334.
Lewis JW, Brefczynski JA, Phinney RE, Janik JJ, DeYoe EA. Distinct cortical pathways for processing tool versus animal sounds. J Neurosci (2005) 25:5148–5158.
Lewis JW, Wightman FL, Brefczynski JA, Phinney RE, Binder JR, DeYoe EA. Human brain regions involved in recognizing environmental sounds. Cereb Cortex (2004) 14:1008–1021.
Macaluso E, George N, Dolan R, Spence C, Driver J. Spatial and temporal factors during processing of audiovisual speech: a PET study. Neuroimage (2004) 21:725–732.
McIntosh AR. Towards a network theory of cognition. Neural Netw (2000) 13:861–870.
Mesulam MM. Large-scale neurocognitive networks and distributed processing for attention, language, and memory. Ann Neurol (1990) 28:597–613.
Molholm S, Ritter W, Javitt DC, Foxe JJ. Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex (2004) 14:452–465.
Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Brain Res Cogn Brain Res (2002) 14:115–128.
Moscoso del Prado MF, Hauk O, Pulvermuller F. Category specificity in the processing of color-related and form-related words: an ERP study. Neuroimage (2006) 29:29–37.
Mummery CJ, Ashburner J, Scott SK, Wise RJ. Functional neuroimaging of speech perception in six normal and two aphasic subjects. J Acoust Soc Am (1999) 106:449–457.
Murray MM, Camen C, Gonzalez Andino SL, Bovet P, Clarke S. Rapid brain discrimination of sounds of objects. J Neurosci (2006) 26:1293–1302.
Murray MM, Foxe JJ, Wylie GR. The brain uses single-trial multisensory memories to discriminate without awareness. Neuroimage (2005) 27:473–478.
Neely JH. Semantic priming and retrieval from lexical memory: roles of inhibitionless spreading activation and limited-capacity attention. J Exp Psychol Gen (1977) 106:226–254.
Noppeney U, Price C. Retrieval of abstract semantics. Neuroimage (2004) 22:164–170.
Noppeney U, Price CJ. A PET study of stimulus- and task-induced semantic processing. Neuroimage (2002) 15:927–935.
Noppeney U, Price CJ. Functional imaging of the semantic system: retrieval of sensory-experienced and verbally-learnt knowledge. Brain Lang (2003) 84:120–133.
Noppeney U, Price CJ, Penny WD, Friston KJ. Two distinct neural mechanisms for category-selective responses. Cereb Cortex (2006) 16:437–445.
Nyberg L, Habib R, McIntosh AR, Tulving E. Reactivation of encoding-related brain activity during memory retrieval. Proc Natl Acad Sci USA (2000) 97:11120–11124.
Olson IR, Gatenby JC, Gore JC. A comparison of bound and unbound audio-visual information processing in the human cerebral cortex. Brain Res Cogn Brain Res (2002) 14:129–138.
Paus T. Primate anterior cingulate cortex: where motor control, drive and cognition interface. Nat Rev Neurosci (2001) 2:417–424.
Penny WD, Stephan KE, Mechelli A, Friston KJ. Comparing dynamic causal models. Neuroimage (2004) 22:1157–1172.
Plaut DC, McClelland JL, Seidenberg MS, Patterson K. Understanding normal and impaired word reading: computational principles in quasi-regular domains. Psychol Rev (1996) 103:56–115.
Potter MC, Faulconer BA. Time to understand pictures and words. Nature (1975) 253:437–438.
Price CJ, Winterburn D, Giraud AL, Moore CJ, Noppeney U. Cortical localisation of the visual and auditory word form areas: a reconsideration of the evidence. Brain Lang (2003) 86:272–286.
Rahman RA, van Turennout M, Levelt WJ. Phonological encoding is not contingent on semantic feature retrieval: an electrophysiological study on object naming. J Exp Psychol Learn Mem Cogn (2003) 29:850–860.
Raij T, Uutela K, Hari R. Audiovisual integration of letters in the human brain. Neuron (2000) 28:617–625.
Rao RP, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci (1999) 2:79–87.
Rogers TT, Lambon Ralph MA, Garrard P, Bozeat S, McClelland JL, Hodges JR, Patterson K. Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychol Rev (2004) 111:205–235.
Sabsevitz DS, Medler DA, Seidenberg M, Binder JR. Modulation of the semantic system by word imageability. Neuroimage (2005) 27:188–200.
Saito DN, Yoshimura K, Kochiyama T, Okada T, Honda M, Sadato N. Cross-modal binding and activated attentional networks during audio-visual speech integration: a functional MRI study. Cereb Cortex (2005) 15:1750–1760.
Schiller NO, Bles M, Jansma BM. Tracking the time course of phonological encoding in speech production: an event-related brain potential study. Brain Res Cogn Brain Res (2003) 17:819–831.
Schroeder CE, Foxe J. Multisensory contributions to low-level, unisensory processing. Curr Opin Neurobiol (2005) 15:454–458.
Schroeder CE, Foxe JJ. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Res Cogn Brain Res (2002) 14:187–198.
Scott SK, Blank CC, Rosen S, Wise RJ. Identification of a pathway for intelligible speech in the left temporal lobe. Brain (2000) 123(Pt 12):2400–2406.
Scott SK, Johnsrude IS. The neuroanatomical and functional organization of speech perception. Trends Neurosci (2003) 26:100–107.
Stein BE, Meredith MA. Merging of the senses. (1993) Cambridge (MA): MIT Press.
Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain. (1988) Stuttgart (Germany): Thieme.
Tanabe HC, Honda M, Sadato N. Functionally segregated neural substrates for arbitrary audiovisual paired-association learning. J Neurosci (2005) 25:6409–6418.
Taylor KI, Moss HE, Stamatakis EA, Tyler LK. Binding crossmodal object features in perirhinal cortex. Proc Natl Acad Sci USA (2006) 103:8239–8244.
van Atteveldt N, Formisano E, Goebel R, Blomert L. Integration of letters and speech sounds in the human brain. Neuron (2004) 43:271–282.
van Veen V, Carter CS. Separating semantic conflict and response conflict in the Stroop task: a functional MRI study. Neuroimage (2005) 27:497–504.
Vandenberghe R, Price C, Wise R, Josephs O, Frackowiak RS. Functional anatomy of a common semantic system for words and pictures. Nature (1996) 383:254–256.
Wright TM, Pelphrey KA, Allison T, McKeown MJ, McCarthy G. Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cereb Cortex (2003) 13:1034–1043.