Cerebral Cortex Advance Access originally published online on July 7, 2007
Cerebral Cortex 2008 18(3):598-609; doi:10.1093/cercor/bhm091
© The Author 2007. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

The Effect of Prior Visual Information on Recognition of Speech and Sounds

Uta Noppeney1,2, Oliver Josephs2, Julia Hocking2, Cathy J. Price2 and Karl J. Friston2

1 Max-Planck-Institute for Biological Cybernetics, Tuebingen, Germany, 2 Wellcome Department of Imaging Neuroscience, Institute of Neurology, London, UK

Address correspondence to Uta Noppeney, Max-Planck-Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany. Email: uta.noppeney{at}tuebingen.mpg.de.


    Abstract
To identify and categorize complex stimuli such as familiar objects or speech, the human brain integrates information that is abstracted at multiple levels from its sensory inputs. Using cross-modal priming for spoken words and sounds, this functional magnetic resonance imaging study identified 3 distinct classes of visuoauditory incongruency effects: visuoauditory incongruency effects were selective for 1) spoken words in the left superior temporal sulcus (STS), 2) environmental sounds in the left angular gyrus (AG), and 3) both words and sounds in the lateral and medial prefrontal cortices (IFS/mPFC). From a cognitive perspective, these incongruency effects suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels, with the STS being involved in phonological, AG in semantic, and mPFC/IFS in higher conceptual processing. In terms of neural mechanisms, effective connectivity analyses (dynamic causal modeling) suggest that these incongruency effects may emerge via greater bottom-up effects from early auditory regions to intermediate multisensory integration areas (i.e., STS and AG). This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (middle temporal gyrus/STS vs. AG/intraparietal sulcus).

Key Words: cross-modal priming • dynamic causal modeling • effective connectivity • multisensory integration • predictive coding • semantics


    Introduction
To form a coherent and unified percept, the human brain combines information from multiple senses (Stein and Meredith 1993). At the behavioral level, multisensory integration of congruent information facilitates detection, identification, and categorization of objects or novel events in our environment. Electrophysiological and functional magnetic resonance imaging (fMRI) studies in human and nonhuman primates have started investigating where, when, and how the human brain integrates different types of sensory information at multiple levels of the cortical hierarchy. Multisensory convergence effects have been found in a distributed subcortical and cortical neural system encompassing presumptive unimodal (or early) sensory areas (Calvert et al. 1999; Foxe et al. 2000; Molholm et al. 2002; Schroeder and Foxe 2002; Fu et al. 2003; Kayser et al. 2005) and higher order association areas such as the superior temporal sulcus and intraparietal sulcus (IPS), the anterior cingulate (AC), and the prefrontal cortex (for review, see Calvert and Lewis 2004; Amedi et al. 2005; Schroeder and Foxe 2005; Ghazanfar and Schroeder 2006). It has been proposed that these various integration sites may support the integration of different stimulus features or parameters that are abstracted at multiple levels from the sensory inputs. In particular, recognition of complex audiovisual stimuli such as familiar objects (Gottfried and Dolan 2003; Laurienti et al. 2003, 2004; Molholm et al. 2004; Beauchamp, Argall, et al. 2004; Beauchamp, Lee, et al. 2004), actions (Barraclough et al. 2005), or speech (Calvert et al. 2000; Raij et al. 2000; Olson et al. 2002; Wright et al. 2003; Callan et al. 2004; Macaluso et al. 2004; van Atteveldt et al. 2004; Ghazanfar et al. 2005; Saito et al. 2005) may involve audiovisual integration at multiple processing stages ranging from early sensory to phonological, semantic, and higher conceptual (or decisional) processes.

Several experimental paradigms and analyses have been used to characterize audiovisual interactions. Classically, multisensory integration areas have been identified by superimposition of auditory and visual activations (e.g., using implicit masking or conjunction analyses, Friston et al. 2005), audiovisual interaction effects, and congruency manipulations (Calvert 2001; Calvert et al. 2001). Complementary insights into the variety of audiovisual interactions have been obtained from visuoauditory matching (Taylor et al. 2006), recognition (Nyberg et al. 2000; Gottfried et al. 2004; Lehmann and Murray 2005; Murray et al. 2005), association learning (Gibson and Maunsell 1997; Fuster et al. 2000; Gonzalo and Büchel 2003; Tanabe et al. 2005), and priming (Badgaiyan et al. 1999) paradigms. Despite a degree of convergence in the results, these diverse paradigms are likely to highlight distinct aspects of multisensory processes: visuoauditory matching tasks require explicit access to unimodal percepts, multisensory interactions involve the integration of sensory features into a unified percept, and recognition paradigms invoke additional memory components.

In the current study, we employed immediate visuoauditory priming (for review, see Henson 2003; Henson and Rugg 2003; Grill-Spector et al. 2006) to investigate the effect of prior visual information on categorization of complex stimuli such as environmental sounds and spoken words in terms of behavioral interference/facilitation and associated activation changes.

Categorization of spoken words and environmental sounds (i.e., source sounds such as a cat's meowing) both engage phonological, semantic, and higher conceptual processes. However, they do so to different degrees: recognition and categorization of spoken words or speech (i.e., verbal stimuli) relies primarily on the interaction between perceptual and phonological processes, that is, processing of speech sounds (Potter and Faulconer 1975; Plaut et al. 1996; Binder et al. 2000, 2004). By contrast, recognition and categorization of environmental sounds (i.e., nonverbal stimuli) is accomplished primarily through interaction of perceptual and semantic processes (Humphreys and Forde 2001; Lewis et al. 2004; Rogers et al. 2004; Ikeda et al. 2006). However, this is a continuous rather than categorical distinction. For instance, auditory word recognition may also activate semantic representations related to the meaning of the word. Conversely, sound object recognition may involve implicit name retrieval. Furthermore, categorization of sounds or words will involve higher level conceptual or decisional processes that do not depend on the particular stimulus format but are elicited irrespective of stimulus material (verbal, nonverbal) or modality (auditory, visual, etc.).

Incongruent prior visual information will interfere with, and thus place more demands on, the processes involved in sound and speech recognition. Hence, categorization of auditory stimuli that are preceded by incongruent visual stimuli may be associated primarily with phonological incongruency for spoken words (e.g., the spoken word cat) and semantic incongruency for sounds (e.g., the meowing sound of a cat). Both incongruent spoken words and incongruent sounds may elicit higher level conceptual (or decisional) incongruency effects.

Combining visuoauditory priming for environmental sounds and spoken words may thus enable us to dissociate visuoauditory incongruency effects that may emerge at the phonological, semantic, and higher conceptual level. At the neuronal level, these incongruency effects are thought to be associated with activation increases for incongruent trials, possibly reflecting a prediction error signal (Rao and Ballard 1999; Friston and Price 2001), in regions sustaining phonological, semantic, or higher conceptual processes.

These differential contributions of phonology, semantics, and conceptual (or decision) elements to categorization of sounds and spoken words provide the rationale for our visuoauditory priming paradigm: subjects were presented with a brief (100 ms) visual prime (i.e., a picture or a written word) that was followed by a congruent or incongruent auditory target (i.e., a sound or a spoken word) after an additional 100 ms. Both visual primes and auditory targets could be either verbal (i.e., written words and spoken words) or nonverbal (i.e., pictures and sounds). Subjects passively attended to the visual prime and categorized the auditory targets (i.e., the spoken words and sounds) according to their weight (heavier than 4 kg?).

Using this fully balanced multifactorial design (see Fig. 1), we first identified regions that were influenced by visuoauditory (in)congruency. Within these regions, we investigated whether the (in)congruency effects depended on the target material and were different for spoken words and sounds. This allowed us to segregate incongruency effects into 3 classes: visuoauditory incongruency effects that were 1) selective for spoken words, 2) selective for sounds, or 3) common to spoken words and sounds. Following our initial rationale, we related these 3 types of visuoauditory incongruency effects to multisensory interactions at the 1) phonological, 2) semantic, and 3) conceptual/decisional level.


Figure 1. Study design and example stimuli. (A) 2 x 2 x 3 factorial design with the factors:
  1. Congruency: (C) congruent identity and response (e.g., cat paired with meow), (II) incongruent identity and congruent response (e.g., razor paired with bee, both items < 4 kg), (II+R) incongruent identity and incongruent response (e.g., car paired with owl, only one item < 4 kg),
  2. Prime material: written words, pictures, and
  3. Target material: spoken words, sounds.

(B) Example run and timing of 3 trials from the 3 levels of (in)congruency.

 
Using dynamic causal modeling (DCM; Friston et al. 2003) with Bayesian model selection (Penny et al. 2003), we then investigated the neural mechanisms underlying these visuoauditory incongruency effects. In particular, we asked whether the incongruency effects can be better understood as a bottom-up error signal or as top-down effects from a general "cognitive control device." Hence, we compared 2 alternative models that implemented these 2 competing neural mechanisms in a 3-level cortical hierarchy. In the first, bottom-up model, the incongruency effects emerge in a material-dependent fashion (i.e., selective for sounds or spoken words) via changes in forward connections from early auditory to intermediate multisensory areas. This model embodies the idea of predictive coding, whereby the human brain learns to predict stimulus attributes on successive exposures to congruent stimuli (=priming) and fails to suppress prediction error in the context of unpredictable or incongruent bottom-up visuoauditory input, which is manifest in an increase in forward connectivity. In the second, top-down model, the incongruency effects emerge irrespective of stimulus material through interactions among higher cognitive control regions and propagate down the cortical hierarchy to lower areas. Here, higher cognitive control regions such as the AC/medial prefrontal cortex (mPFC) and the lateral prefrontal cortex (IFS) may act as a general "conflict monitoring and cognitive control device" (Duncan and Owen 2000; Botvinick et al. 2001; Paus 2001; Kerns et al. 2004; Brown and Braver 2005) that modulates activation in intermediate multisensory convergence areas.


    Materials and Methods
Subjects

Seventeen healthy right-handed English native speakers (5 females, median age 25) gave informed consent to participate in the study. The study was approved by the joint ethics committee of the Institute of Neurology and University College London Hospital, London, UK.

Experimental Design

The paradigm was a 2-choice forced semantic categorization of auditory stimuli that were preceded by visual stimuli. The activation conditions conformed to a 3 x 2 x 2 factorial design manipulating

  1. Congruency (3 levels): 1) congruent identity and response (=congruent), 2) incongruent identity and congruent response (=incongruencyI), 3) incongruent identity and incongruent response (=incongruencyI+R),
  2. Prime material (2 levels): written words, pictures (i.e., verbal vs. nonverbal), and
  3. Target material (2 levels): spoken words, sounds (i.e., verbal vs. nonverbal).

At the beginning of each trial, a visual prime (i.e., written words or color pictures) was presented for 100 ms followed by the auditory target (i.e., spoken words or sounds) after an additional 100 ms. A very short prime–target asynchrony (200 ms) was selected because we were interested in automatic priming and aimed to reduce any strategic components (see Neely 1977). This rapid successive presentation was perceived as "nearly synchronous" by subjects. The trial onset asynchrony was 3.25 s. Subjects passively attended to the visual primes and performed a semantic decision on the auditory targets (Is the target stimulus heavier than 4 kg?). Fifty percent of the stimuli weighed more than 4 kg and 50% weighed less. Altogether, there were 64 stimuli: 32 animals and 32 tools (length, mean ± standard deviation, of sounds: 0.8 ± 0.2 s; spoken words: 0.76 ± 0.2 s). These 2 distinct categories were selected to enable incongruent pairings between categories and thus induce strong and reliable incongruency effects. Therefore, category-selective activations that have been characterized by numerous previous studies (Chao et al. 1999; Lewis et al. 2004, 2005; Noppeney et al. 2006) are difficult to evaluate (i.e., half of the compound trials are mixtures of both categories) and not the focus of this communication.

Fifty percent of the trials were identity congruent, that is, prime and target referred to the same object (e.g., a picture of a dog followed by the barking sound of a dog). The remaining 50% of trials were identity incongruent (i.e., visual prime and auditory target referred to different stimuli). In half of the identity incongruent trials (i.e., 25% of the total trials), both, prime and target, weighed less than 4 kg or both weighed more than 4 kg (e.g., a picture of an elephant followed by the sound of a car). In the other half of the identity incongruent trials (i.e., 25% of the total trials), only one of the objects weighed more (or less) than 4 kg (e.g., a picture of a fly followed by the sound of a car). In summary, 50% of the trials were identity congruent, 25% identity incongruent and response congruent, and 25% identity incongruent and response incongruent. This allowed us to dissociate the effect of identity incongruency from response incongruency.

Each stimulus (e.g., bear, see Appendix) was presented 16 times, 8 times as prime (i.e., 4 times as picture and 4 times as written word) and 8 times as target (i.e., 4 times as sound and 4 times as spoken word), amounting to 512 cross-modal trials (i.e., 64 x 8 = 512 trials). In the congruent trials, each stimulus was presented once in each of the following pairings: 1) written word–spoken word, 2) written word–sound, 3) picture–spoken word, and 4) picture–sound. Similarly, in the incongruent trials, each stimulus was equally often presented in each modality pairing. However, here a target stimulus (e.g., bear) was presented with 4 different primes (see Appendix). Presenting the stimuli only once in each pairing and thus changing the surface features ensured that subjects did not engage in prime–target association learning. Furthermore, it ensured that the stimuli were rotated and fully counterbalanced across conditions within and between subjects.
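The trial bookkeeping described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' randomization code: the stimulus count, the number of target presentations, and the 50/25/25 split are taken from the text, whereas the even split of each incongruent level across the 4 modality pairings and all names are assumptions made for illustration.

```python
# Trial bookkeeping for the cross-modal design.
# Numbers come from the text; the even split of each incongruency level
# across modality pairings is an ASSUMPTION, and all names are hypothetical.
N_STIMULI = 64
TARGET_PRESENTATIONS = 8  # each stimulus: 4 times as sound + 4 times as spoken word
PAIRINGS = [
    "written word-spoken word",
    "written word-sound",
    "picture-spoken word",
    "picture-sound",
]


def trial_counts():
    """Return total trials, trials per congruency level, and trials per design cell."""
    total = N_STIMULI * TARGET_PRESENTATIONS
    per_level = {
        "congruent": total // 2,        # 50% identity congruent
        "incongruent_I": total // 4,    # 25% identity incongruent, response congruent
        "incongruent_I+R": total // 4,  # 25% identity and response incongruent
    }
    per_cell = {
        (level, pairing): n // len(PAIRINGS)
        for level, n in per_level.items()
        for pairing in PAIRINGS
    }
    return total, per_level, per_cell


total, per_level, per_cell = trial_counts()
print(total, per_level)
```

Under these assumptions the 12 cells of the factorial design are unbalanced by construction: each congruent cell holds twice as many trials as each incongruent cell.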

Additionally, 48 intramodal visual trials (i.e., picture–picture, picture–written word, written word–picture, written word–written word) were included to maintain subjects' attention to the visual primes that were response irrelevant. Fifty percent of the trials required a yes response. Yes/no responses to all conditions were indicated (as quickly and as accurately as possible) by a 2-choice key press. The activation conditions were interleaved with 6 s fixation. The stimuli and order of conditions were randomized.

Functional Magnetic Resonance Imaging

A 3-T Siemens Allegra system was used to acquire both T1 anatomical volume images and T2*-weighted axial echo-planar images with blood oxygenation level–dependent contrast (GE-EPI, Cartesian k-space sampling, echo time = 30 ms, repetition time = 2.47 s, 38 axial slices, acquired sequentially in descending direction, matrix 64 x 64, spatial resolution 3 x 3 x 3.4 mm3 voxels, interslice gap 1.4 mm, slice thickness 2.0 mm). To minimize Nyquist ghost artifacts, a generalized reconstruction algorithm was used for data processing (Josephs et al. 2000). There were 2 sessions with a total of 473 volume images per session. The first 6 volumes were discarded to allow for T1 equilibration effects (Table 1).


Table 1 Behavioural data: accuracy and reaction times

 
Conventional SPM Analysis

The data were analyzed with statistical parametric mapping (using SPM2 software from the Wellcome Department of Imaging Neuroscience, London; www.fil.ion.ucl.ac.uk/spm; Friston et al. 1995). Scans from each subject were realigned using the first as a reference, spatially normalized into Montreal Neurological Institute standard space (Talairach and Tournoux 1988; Evans et al. 1992), resampled to 3 x 3 x 3 mm3 voxels, and spatially smoothed with a Gaussian kernel of 8 mm full width at half maximum. The time series in each voxel was high-pass filtered to 1/128 Hz and globally normalized with proportional scaling. The fMRI experiment was modeled in an event-related fashion using regressors obtained by convolving each event-related unit impulse with a canonical hemodynamic response function and its first temporal derivative. In addition to modeling the 12 conditions in our 2 x 2 x 3 factorial design (only correct trials included), the statistical model included intramodal trials, errors, and non-responses. Nuisance covariates included the realignment parameters (to account for residual motion artifacts). Condition-specific effects for each subject were estimated according to the general linear model and passed to a second-level analysis as contrasts. This involved creating contrast images averaged over all cross-modal conditions > fixation (averaged over the 2 sessions) for each subject and entering them into a second-level one-sample t-test. In addition, the response for each of the 12 conditions (summed over the 2 sessions) was estimated and entered into a second-level analysis of variance (ANOVA). This ANOVA modeled the 12 effects in our 2 x 2 x 3 factorial design.
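To illustrate how event-related regressors of this kind are built, the sketch below convolves unit impulses at the event onsets with a hemodynamic response function and with its temporal derivative, then samples the result at each scan. It is not SPM2 code: the double-gamma shape parameters (6 and 16, undershoot ratio 1/6) are a commonly used approximation assumed here, and the function names, the 0.1 s grid, and the example onsets are hypothetical. Only the repetition time (2.47 s) and the trial onset asynchrony (3.25 s) come from the text.

```python
import math

import numpy as np


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma response (ASSUMED parameters, not read from SPM2)."""
    pos = np.clip(np.asarray(t, dtype=float), 0.0, None)
    peak = pos ** (peak_shape - 1) * np.exp(-pos) / math.gamma(peak_shape)
    under = pos ** (under_shape - 1) * np.exp(-pos) / math.gamma(under_shape)
    return peak - ratio * under


def make_regressors(onsets_s, n_scans, tr_s, dt=0.1):
    """Return an (n_scans, 2) array: canonical regressor and temporal-derivative regressor."""
    n_fine = int(round(n_scans * tr_s / dt))
    stick = np.zeros(n_fine)  # unit impulses on a fine time grid
    idx = np.round(np.asarray(onsets_s, dtype=float) / dt).astype(int)
    stick[idx[idx < n_fine]] = 1.0

    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    dhrf = np.gradient(hrf, dt)  # numerical first temporal derivative

    # Sample the fine-grid predictions at the acquisition time of each scan.
    scan_idx = np.round(np.arange(n_scans) * tr_s / dt).astype(int)
    canonical = np.convolve(stick, hrf)[:n_fine][scan_idx]
    derivative = np.convolve(stick, dhrf)[:n_fine][scan_idx]
    return np.column_stack([canonical, derivative])


# Three hypothetical trials spaced by the trial onset asynchrony of 3.25 s.
X = make_regressors([0.0, 3.25, 6.5], n_scans=40, tr_s=2.47)
print(X.shape)
```

In a full design matrix, one such pair of columns would be built per condition (the 12 cross-modal conditions plus intramodal trials, errors, and non-responses), alongside the realignment parameters.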

Inferences were made at the second level to allow a random effects analysis and inferences at the population level (Friston et al. 1999).

The random effects ANOVA analysis tested for the effects of incongruency. Pooling over picture and written word primes, we tested for incongruency effects that were selective for 1) spoken words, 2) sounds (i.e., the interaction between congruent vs. incongruent and sounds vs. spoken words), or 3) common to sounds and spoken words.
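These tests can be written as contrast vectors over the 12 cells of the factorial design. The sketch below is an illustration under stated assumptions rather than the contrasts actually entered into SPM2: the ordering of the cells (congruency varying slowest, target material fastest) and the equal weighting of the two incongruent levels against the congruent level are assumptions, and all names are hypothetical.

```python
import numpy as np

# ASSUMED cell ordering: congruency (slowest) x prime material x target material (fastest).
CONGRUENCY = ["congruent", "incongruent_I", "incongruent_I+R"]
PRIME = ["written word", "picture"]
TARGET = ["spoken word", "sound"]


def contrast(congruency_w, prime_w, target_w):
    """Build a 12-element contrast vector as a Kronecker product of factor weights."""
    return np.kron(np.kron(congruency_w, prime_w), target_w)


# ASSUMPTION: both incongruent levels are averaged and compared with the congruent level.
incongruent_vs_congruent = np.array([-1.0, 0.5, 0.5])
pool = np.array([1.0, 1.0])  # pooling over the levels of a factor

main_incongruency = contrast(incongruent_vs_congruent, pool, pool)
words_only = contrast(incongruent_vs_congruent, pool, [1.0, 0.0])      # simple main effect, spoken words
sounds_only = contrast(incongruent_vs_congruent, pool, [0.0, 1.0])     # simple main effect, sounds
words_gt_sounds = contrast(incongruent_vs_congruent, pool, [1.0, -1.0])  # interaction
sounds_gt_words = -words_gt_sounds

print(main_incongruency)
print(words_gt_sounds)
```

With this coding the interaction contrast is orthogonal to the main effect of incongruency, which is the property that the search volume constraints described in the next section rely on.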

Search Volume Constraints

The search space (i.e., volume of interest) was constrained using orthogonal contrasts: the search space for the main and simple main effects of (in)congruency was limited to voxels that were activated for cross-modal stimuli > fixation at a threshold of P < 0.01 uncorrected (extent threshold > 15 voxels). The search space for the interaction effects was limited to voxels that were activated for cross-modal stimuli > fixation at P < 0.01 uncorrected (extent threshold > 15 voxels) and exhibited a main effect of congruency (i.e., incongruent > congruent stimuli at P < 0.001, uncorrected; extent threshold > 15 voxels). To identify conceptual (or decisional) congruency effects that were common for sounds and spoken word targets, each effect was tested within a search volume mutually constrained by the other contrast (see Friston et al. 2005). This approach is equivalent to a (conjunction-null) conjunction analysis (i.e., a logical AND). Unless otherwise stated, we only report activations that are significant (P < 0.05) corrected for the search volume.
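The logical AND underlying this conjunction can be sketched as follows. This is a simplified illustration, assuming that both effects are thresholded at the same statistic value; it ignores the extent thresholds and the search-volume correction used in the actual analysis, and the function name and example values are hypothetical.

```python
import numpy as np


def conjunction_mask(t_sounds, t_words, threshold):
    """Voxels where BOTH effects exceed the threshold (minimum-statistic form of a logical AND)."""
    t_sounds = np.asarray(t_sounds, dtype=float)
    t_words = np.asarray(t_words, dtype=float)
    return np.minimum(t_sounds, t_words) > threshold


# Hypothetical t-values for the incongruency effect at three voxels.
t_sounds = np.array([4.0, 4.0, 1.0])
t_words = np.array([5.0, 1.0, 5.0])
mask = conjunction_mask(t_sounds, t_words, threshold=3.0)

# Thresholding the minimum is the same as intersecting the two thresholded maps.
assert np.array_equal(mask, (t_sounds > 3.0) & (t_words > 3.0))
print(mask)
```

Only the first voxel survives, because it is the only one at which the effect is present for both sounds and spoken words.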

Effective Connectivity Analysis: DCM

DCM treats the brain as a dynamic input-state-output system. The inputs correspond to conventional stimulus functions encoding experimental manipulations. The state variables are neuronal activities, and the outputs are the regional hemodynamic responses measured with fMRI. The idea is to model changes in the states, which cannot be observed directly, using the known inputs and outputs. Critically, changes in the states of one region depend on the states (i.e., activity) of others. This dependency is parameterized by effective connectivity. There are 3 types of parameters in a DCM: 1) input parameters which describe how much brain regions respond to experimental stimuli, 2) intrinsic parameters that characterize effective connectivity among regions, and 3) modulatory parameters that characterize changes in effective connectivity caused by experimental manipulation. This third set of parameters, the modulatory effects, allows us to explain fMRI incongruency effects by changes in coupling among brain areas. Importantly, this coupling (effective connectivity) is expressed at the level of neuronal states. DCM employs a forward model, relating neuronal activity to fMRI data that can be inverted during the model fitting process. Put simply, the forward model is used to predict outputs using the inputs. The parameters are adjusted (using gradient descent) so that the predicted and observed outputs match. This adjustment corresponds to the model fitting.
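The 3 parameter sets can be made concrete with the bilinear state equation on which DCM for fMRI is based (Friston et al. 2003). The notation below is the conventional one and is not spelled out in this text: z denotes the vector of neuronal states (one per region), u_j the j-th input, A the intrinsic connectivity, B^(j) the modulatory change in connectivity induced by input j, C the input parameters, and λ the hemodynamic forward model with parameters θ_h.

```latex
\begin{aligned}
\dot{z} &= \Bigl(A + \sum_{j} u_j \, B^{(j)}\Bigr)\, z + C\,u
  && \text{(neuronal state equation)} \\
y &= \lambda(z, \theta_h) + \varepsilon
  && \text{(hemodynamic forward model and observation error)}
\end{aligned}
```

In this notation, an incongruency effect that is mediated by a change in coupling corresponds to a nonzero entry in the matrix B^(j) associated with the incongruency input.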

For each subject, 2 DCMs (Friston et al. 2003) were constructed that entailed our 2 alternative hypotheses. In the first "bottom-up model," the incongruency effects emerge in a material-dependent fashion (i.e., selective for sounds or spoken words) via changes in forward connections from early auditory to intermediate multisensory areas. In the second "top-down model," they emerge irrespective of stimulus material through interactions among higher cognitive control regions and propagate down the cortical hierarchy to lower areas. Here, higher cognitive control regions such as the AC/mPFC and the lateral prefrontal cortex (IFS) may act as a general conflict monitoring and cognitive control device (Duncan and Owen 2000; Botvinick et al. 2001; Paus 2001; Kerns et al. 2004; Brown and Braver 2005) that modulates activation in intermediate multisensory convergence areas.

Each DCM (Fig. 5) included 6 regions that formed a 3-level cortical hierarchy: 1) a left superior temporal area that was activated by cross-modal stimuli relative to fixation (superior temporal gyrus [STG]; x = –63, y = –24, z = 9), 2) a left fusiform region that was activated by cross-modal stimuli relative to fixation (fusiform gyrus [FG]; x = –45, y = –60, z = –21), 3) a region in the left superior temporal sulcus (STS) exhibiting an incongruency effect that was selective for spoken words (STS/middle temporal gyrus [MTG]; x = –66, y = –27, z = –3), 4) a region in the left angular gyrus (AG)/IPS exhibiting an incongruency effect selective for sounds (AG/IPS; x = –30, y = –75, z = 42), 5) the AC/mPFC (x = 0, y = 18, z = 48), and 6) left inferior frontal sulcus (IFS; x = –42, y = 12, z = 24) showing non-selective incongruency effects. The effects of stimuli entered as extrinsic input to STG and FG separately for picture–sound, picture–word, word–sound, and word–word stimuli to account for material-selective activation differences. Holding the number of parameters and the intrinsic and extrinsic connectivity structure constant, the 2 DCMs differed in where congruency effects were exerted: In the bottom-up DCM, the incongruency factor increased the forward connections from STG and FG to AG/IPS and STS/MTG in a material-dependent fashion. In the top-down DCM, it increased the connections between AC and IFS in a material-independent manner. Thus, these models encode a greater sensitivity of AG/IPS and STS/MTG to either incongruent bottom-up inputs or incongruent top-down inputs. Comparing these models allowed us to distinguish between a bottom-up and top-down mediation of incongruency effects.
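How a modulatory effect on a forward connection translates into a larger response in the target region can be illustrated with a toy simulation. The sketch below is not the 6-region model described above and does not include the hemodynamic forward model: it integrates the bilinear neuronal state equation for a hypothetical 2-region network (an "early" region driving an "intermediate" region) with simple Euler steps, and every connection strength in it is an arbitrary assumption chosen for illustration.

```python
import numpy as np


def simulate(modulation_on, duration_s=20.0, dt=0.01):
    """Euler integration of dz/dt = (A + u_mod * B) z + C * u_drive for a toy 2-region network."""
    # All values below are ARBITRARY assumptions, not estimates from the study.
    A = np.array([[-1.0, 0.0],   # region 0 (early): self-decay only
                  [0.4, -1.0]])  # region 1 (intermediate): forward input from region 0
    B = np.array([[0.0, 0.0],
                  [0.3, 0.0]])   # modulation strengthens the forward connection 0 -> 1
    C = np.array([1.0, 0.0])     # the driving input enters the early region only

    u_drive = 1.0                             # sustained stimulus input
    u_mod = 1.0 if modulation_on else 0.0     # e.g. incongruent vs congruent context
    z = np.zeros(2)
    for _ in range(int(round(duration_s / dt))):
        dz = (A + u_mod * B) @ z + C * u_drive
        z = z + dt * dz
    return z


z_congruent = simulate(modulation_on=False)
z_incongruent = simulate(modulation_on=True)
print(z_congruent, z_incongruent)
```

In this toy network the early region responds identically in both contexts, whereas the intermediate region responds more strongly when the forward connection is modulated, which is the pattern that the bottom-up model attributes to incongruent trials.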

The regions were selected using the maxima of the relevant contrasts from our random effects analysis. Region-specific time series (concatenated over the 2 sessions and adjusted for confounds) comprised the first eigenvariate of all voxels within a 4-mm radius centered on each peak identified in the random effects analysis.
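The extraction of a regional time series as a first eigenvariate can be sketched with a singular value decomposition. This is a generic illustration, not SPM2's own routine: the sign convention and the scaling by the square root of the number of voxels are assumptions, as are the function names. The helper that enumerates voxel offsets assumes peaks lying on the 3 x 3 x 3 mm3 grid described under Conventional SPM Analysis.

```python
import numpy as np


def sphere_offsets(radius_mm=4.0, voxel_mm=3.0):
    """Integer voxel offsets whose centers lie within radius_mm of the peak voxel."""
    reach = int(np.floor(radius_mm / voxel_mm))
    steps = range(-reach, reach + 1)
    return [
        (i, j, k)
        for i in steps for j in steps for k in steps
        if voxel_mm * np.sqrt(i * i + j * j + k * k) <= radius_mm
    ]


def first_eigenvariate(data):
    """First eigenvariate of a (time x voxels) array via SVD (scaling is an ASSUMPTION)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    component = u[:, 0] * s[0]
    if vt[0].sum() < 0:  # fix the arbitrary sign so that voxel weights are mostly positive
        component = -component
    return component / np.sqrt(data.shape[1])


# Hypothetical region: three voxels sharing one time course with different gains.
time_course = np.sin(np.linspace(0.0, 6.0, 50))
region = np.outer(time_course, [1.0, 2.0, 0.5])
summary = first_eigenvariate(region)
print(len(sphere_offsets()), summary.shape)
```

On a 3-mm grid, a 4-mm radius captures the peak voxel and its 6 face neighbors, so each regional time series summarizes 7 voxels under these assumptions.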

For each model, the subject-specific modulatory effects were entered into t-tests at the group level (see Fig. 4). This allowed us to summarize the consistent findings from the subject-specific DCMs using classical statistics.

Bayes factors (=the ratio of the model evidences, Kass and Raftery 1995) were used for model comparison, that is, to decide whether the bottom-up or top-down DCM was the better model (Penny et al. 2004). In brief, given the measured data y and 2 competing models, the Bayes factor is the ratio of the evidences of the 2 models. A Bayes factor of one represents equal evidence for the 2 models. A Bayes factor above 3 is considered positive evidence for one of the 2 models. The model evidence depends not only on model fit but also on model complexity. Here, we have limited ourselves to the bottom-up and top-down models that were equated for the number of parameters, that is, model complexity, and did not design a third, more complex model endowed with both bottom-up and top-down effects.

Finally, a group analysis was implemented by taking the product of the subject-specific Bayes factors over subjects (this is equivalent to the exponentiated sum of the log model evidences of each subject-specific DCM). However, we also report the Bayes factors for each individual subject (see Fig. 5, right column) to provide an intuition of consistency over subjects. As the Bayes factors for some subjects were very large, we have selected a cutoff of 8 to focus on the consistency across subjects in Figure 5.
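The relation between subject-specific and group Bayes factors can be sketched numerically. In the example below, the log evidences are hypothetical values chosen for illustration (they are not results from this study), and the function names are assumptions. The group Bayes factor is computed from the summed differences in log evidence between the 2 models, which is the same as the product of the subject-specific Bayes factors.

```python
import math


def bayes_factor(log_evidence_a, log_evidence_b):
    """Bayes factor in favor of model A over model B from their log evidences."""
    return math.exp(log_evidence_a - log_evidence_b)


def group_bayes_factor(log_evidences_a, log_evidences_b):
    """Group Bayes factor: exponentiated sum of per-subject log-evidence differences."""
    return math.exp(sum(a - b for a, b in zip(log_evidences_a, log_evidences_b)))


# HYPOTHETICAL log evidences for three subjects (not data from the study).
bottom_up = [-1200.0, -980.5, -1101.2]
top_down = [-1202.0, -980.0, -1105.2]

subject_bfs = [bayes_factor(a, b) for a, b in zip(bottom_up, top_down)]
group_bf = group_bayes_factor(bottom_up, top_down)

# The product of the subject-specific Bayes factors equals the group Bayes factor.
assert math.isclose(math.prod(subject_bfs), group_bf, rel_tol=1e-9)
print(subject_bfs, group_bf)
```

Working with sums of log evidences rather than products of ratios is also numerically safer when, as noted above, some subject-specific Bayes factors are very large.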


    Results
In the following, we report 1) the behavioral results, 2) the fMRI results of the conventional analysis focussing on regionally selective activations, and 3) the DCM results providing insight into potential neural mechanisms that mediate the observed regional activations.

Behavioral Results

For performance accuracy, a 3-way ANOVA with congruency (congruent vs. incongruentI vs. incongruentI+R), prime material (picture vs. written word), and target material (sound vs. spoken word) identified a significant main effect of congruency (F(1.4, 21.7) = 10.5, P < 0.01) and of target material (F(1, 16) = 32, P < 0.001) after Greenhouse–Geisser correction. In addition, there was a significant interaction effect between congruency and target material (F(2, 31) = 9.2, P = 0.001). For reaction times (RTs) (limited to correct trials only), a 3-way ANOVA identified main effects of congruency (F(1.8, 28.6) = 129.1, P < 0.001) and target material (F(1, 16) = 11.8, P < 0.01) following Greenhouse–Geisser correction. RTs were shorter for spoken words than sounds. The absence of any significant interactions of congruency with prime (written words vs. pictures) or target material (spoken words vs. sounds) suggests that the prime duration (100 ms) allowed pictures and written words to elicit comparable priming effects irrespective of the target material (sounds or spoken words).

Post hoc comparisons (Bonferroni corrected) for accuracy and RTs revealed a significant incongruency effect of identity but not of response. Overall, these behavioral results suggest that incongruency may affect processes of stimulus recognition and categorization.

Conventional SPM Analysis

The conventional SPM analysis was performed in 2 steps: First, we identified regions that showed increased activation for incongruent > congruent stimuli (within the system of regions activated relative to fixation, see Materials and Methods). Second, within this system, pooling over prime, we tested for incongruency effects that were 1) common to sounds and spoken words, 2) selective for spoken words, or 3) selective for sounds (i.e., the interaction between congruent vs. incongruent and sounds vs. spoken words). For completeness, pooling over target, we tested for incongruency effects that were selective for pictures or written words (i.e., the interaction between congruent vs. incongruent and pictures vs. written words). In other words, we used the factorial character of our experimental design and pooled over one factor to increase the power when investigating the effect of the other factor.

Main Effect of Identity and Response Incongruency

Incongruent stimuli increased activations relative to congruent stimuli in the AC/mPFC, bilateral IFS, left insula, IPS/AG and MTG/STS, and the right cerebellum. None of the regions showed an effect of response incongruency (P > 0.05 uncorrected at peak coordinates). In other words, the activation in those areas did not depend on whether prime and target object required the same response but was primarily driven by whether visual prime and auditory target referred to the same object. This suggests that the activation increases might at least in part be due to incongruencies at the level of object processing and categorization rather than only response selection and preparation.

No increased activation was observed for congruent relative to incongruent trials within the system of regions activated relative to fixation (see Materials and Methods) (Table 2).


Table 2 Visuoauditory congruency effects averaged over sounds and words

 
Modulatory Effect of Target Material: Incongruency Effects Selective for Spoken Words, Sounds, or Both

Within the incongruency system identified above, the medial prefrontal region and the left IFS exhibited incongruency effects common for sounds and spoken words. Critically, pooling over primes, we observed a significant interaction between incongruency and target material: the left MTG/STS showed an enhanced incongruency effect for spoken words relative to sounds. In contrast, the left AG (extending into IPS) showed an increased incongruency effect for sounds relative to spoken words. Following the rationale of this experiment, the incongruency effects in mPFC/IFS may relate to higher conceptual/decisional processes, in AG/IPS to semantic processes, and in STS/MTG to phonological processes. In addition, we observed an incongruency effect selective for sounds in a more dorsal medial prefrontal region. Although only correct trials were included in our fMRI analysis, we note that sound trials were still associated with greater error probability. Hence, the increased mPFC activation for incongruent sound trials may be related to their inherent ambiguity (cf., recent studies associating mPFC/AC with error probability prediction rather than error detection per se, Brown and Braver 2005) (Tables 3 and 4, Figs 2 and 3).


Figure 2. Row 1: Increased activations for incongruent relative to congruent stimuli separately for sounds (left) and spoken words (right) are rendered on a template of the whole brain. Height threshold: P < 0.05 corrected for multiple comparisons within the search space. Row 2: Congruency effects are rendered on a template of the whole brain: red = common for sounds and spoken words, green = sounds > spoken words, blue = spoken words > sounds. Height threshold: P < 0.001 uncorrected. Extent threshold > 10 voxels (for illustration purposes).

 


Figure 3. Left: Increased activations for incongruent relative to congruent visuoauditory stimuli on axial and coronal slices of a mean EPI image created by averaging the subjects' normalized EPI images. Height threshold: P < 0.001 uncorrected for illustration purposes. Extent threshold: >1 voxel. Common for sounds and spoken words (Row 1 + 2). Interactions: Sounds > Spoken words (Row 3). Spoken words > Sounds (Row 4). Right: Parameter estimates for Congruent (c, grey) and Incongruent (i, black) visuoauditory trials relative to fixation. Prime: Pictures or Written Words. Targets: Sounds (S) or Spoken Words (W). The bar graphs represent the size of the effect in nondimensional units (corresponding to percent whole-brain mean). These effects are activations pooled (i.e., summed) over appropriate conditions. Row 1: x = –42, y = 12, z = 24. Row 2: x = 0, y = 18, z = 48. Row 3: x = 30, y = –75, z = 42. Row 4: x = –66, y = –27, z = –3.

 


Table 3 Visuoauditory congruency effects for sounds and words

 


Table 4 Interactions: target-dependent visuoauditory congruency effects

 
Modulatory Effect of Prime Material (Written Words vs. Pictures)

For completeness, pooling over target material, we tested for incongruency effects that were modulated by prime material (i.e., the interaction between incongruent vs. congruent and pictures vs. written words). However, no regions exhibited a significant interaction effect between congruency and prime material. The absence of a significant modulatory effect of prime material may be related to several factors. 1) The prime was presented very briefly (100 ms). 2) It was task and response irrelevant. 3) At the time of target presentation (i.e., 200 ms post-prime presentation), both phonological and semantic information may be available irrespective of target material (cf., Rahman et al. 2003; Schiller et al. 2003; Moscoso del Prado et al. 2006).

Effect of Performance on fMRI Incongruency Effects

To further characterize the common incongruency effects in the mPFC/AC and left IFS, we investigated their relationship to subjects' performance measures. For this, we performed a second-level multiple regression analysis, in which subject-specific behavioral interference effects (i.e., RT and accuracy differences for incongruent > congruent) served as predictors for the fMRI incongruency effects in the mPFC/AC and left IFS (i.e., increased activation for incongruent > congruent stimuli). As RT and accuracy differences for incongruent relative to congruent trials were strongly negatively correlated over subjects (correlation coefficient = –0.7), we orthogonalized the accuracy with respect to the RT regressors. Given our a priori interests in the role of AC/mPFC and left IFS in incongruency effects, the results of this analysis are reported corrected for multiple comparisons within spheres (10 mm radius) centered on the peaks identified in the previous conjunction analysis (this does not bias our inference because the effects of RT and accuracy are orthogonal to the incongruency effects).
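The orthogonalization step can be illustrated with a small regression sketch. Everything here is an assumption made for illustration: the function names, the synthetic data, the number of subjects, and the use of ordinary least squares stand in for the actual second-level SPM model. Orthogonalizing accuracy with respect to RT means that variance shared by the two predictors is attributed to RT, and the accuracy regressor carries only what RT cannot explain.

```python
import numpy as np


def orthogonalize(x, wrt):
    """Remove from x the component linearly explained by wrt (after mean-centering both)."""
    x = x - x.mean()
    wrt = wrt - wrt.mean()
    return x - wrt * (x @ wrt) / (wrt @ wrt)


def fit_second_level(fmri_effect, rt_diff, acc_diff):
    """Least-squares fit of the fMRI incongruency effect on RT and orthogonalized accuracy."""
    rt = rt_diff - rt_diff.mean()
    acc_orth = orthogonalize(acc_diff, rt_diff)
    design = np.column_stack([np.ones_like(rt), rt, acc_orth])
    beta, *_ = np.linalg.lstsq(design, fmri_effect, rcond=None)
    return beta, design


# Synthetic, made-up data purely for illustration (not the study's values)
rng = np.random.default_rng(0)
n_subjects = 16  # hypothetical
rt_diff = rng.normal(50.0, 20.0, n_subjects)                      # ms, incongruent - congruent
acc_diff = -0.001 * rt_diff + rng.normal(0.0, 0.01, n_subjects)   # proportion correct
fmri_effect = 0.02 * rt_diff + rng.normal(0.0, 0.5, n_subjects)   # arbitrary units

beta, design = fit_second_level(fmri_effect, rt_diff, acc_diff)
print("intercept, RT slope, accuracy slope:", beta)
print("RT . accuracy_orth =", design[:, 1] @ design[:, 2])
```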

RT interference positively predicted fMRI incongruency effects in the mPFC/AC (x = 0, y = 24, z = 48; z-score = 3.51; P(svc) = 0.04) and in the lateral prefrontal cortex (x = –45, y = 6, z = 24; z-score = 3.5; P(svc) = 0.04). In addition, incongruency effects on accuracy negatively predicted fMRI incongruency effects in the lateral prefrontal cortex (x = –51, y = 9, z = 24; z-score = 3.9; P(svc) = 0.01). In other words, strong fMRI incongruency effects are associated with relatively longer RTs and lower accuracy for incongruent relative to congruent trials. Thus, consistent with current theories that implicate the mPFC/AC, IFS circuitry in conflict monitoring and cognitive control processes, mPFC/AC and IFS activation may be associated with stronger interference as indicated by longer processing times and less accurate performance on incongruent trials (Fig. 4).


Figure 4. Effects of subjects' behavioral interference effects (RT and accuracy) on fMRI incongruency effects in AC/mPFC and left IFS. Left: Regional incongruency effects that were predicted by RT and accuracy (proportion correct) differences between incongruent and congruent trials across subjects on coronal and axial slices of a mean EPI image created by averaging the subjects' normalized EPI images. Height threshold: P < 0.001 uncorrected for illustration purposes (see Materials and Methods for further details). Extent threshold: >40 voxels. Right: Scatter plots depict the regression of the regional (adjusted) fMRI signal to RT (ms) and accuracy (proportion correct) interference (see Results for further details). The ordinate represents (adjusted) fMRI signal, the abscissa represents each subject's mean interference effect (after mean correction), that is, RT and accuracy differences for incongruent versus congruent stimuli averaged over all types of compound stimuli.

 
Summary of the Results from the Conventional SPM Analysis

In summary, our results demonstrate that 1) the left MTG/STS shows an increased incongruency effect for spoken words relative to sounds, 2) the left AG/IPS exhibits an increased incongruency effect for sounds relative to spoken words, and 3) a medial prefrontal region and the left IFS are activated for incongruent relative to congruent stimuli for sounds as well as spoken words. Furthermore, the incongruency effects in mPFC/AC and left IFS were predicted by the behavioral interference effects across subjects.

DCM Analysis

At the group level, strong evidence was provided for the bottom-up relative to the top-down model suggesting that the incongruency effects may emerge in a material-dependent fashion (i.e., selective for spoken words or sounds) via modulation of forward connections from early auditory regions to STS/MTG and AG/IPS. In other words, the STS/MTG and AG/IPS showed a greater response to bottom-up inputs when they were incongruent. Figure 5 (right column) shows the Bayes factors (relative likelihood of the bottom-up model, relative to the top-down model) for each subject, to provide an intuition of consistency over subjects. A cutoff of 8 was used to focus on the fact that—despite intersubject variability in the magnitude of the Bayes factors—the bottom-up model provided a better explanation of the data than the top-down model in all subjects (apart from one showing equal model evidences). As the 2 DCMs were equated for the number of modulatory effects and intrinsic as well as extrinsic connectivity structure, the difference in model evidence is only due to model fit but not model complexity.
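As a rough illustration of the model comparison, a Bayes factor is the ratio of two model evidences, which is most conveniently computed from log evidences. The sketch below uses made-up log evidences for three hypothetical subjects; how the evidences were approximated in the study is not shown here, and summing log Bayes factors over subjects assumes independent subjects and the same model for everyone (a fixed-effects assumption made for this sketch).

```python
import math


def bayes_factor(log_evidence_a, log_evidence_b):
    """Bayes factor of model a relative to model b, from log model evidences."""
    return math.exp(log_evidence_a - log_evidence_b)


def group_log_bayes_factor(log_evidences_a, log_evidences_b):
    """Sum of per-subject log Bayes factors (assumes independent subjects)."""
    return sum(a - b for a, b in zip(log_evidences_a, log_evidences_b))


# Made-up log evidences for three hypothetical subjects
log_ev_bottom_up = [-1200.0, -1350.5, -1010.0]
log_ev_top_down = [-1203.0, -1351.5, -1010.0]

per_subject = [bayes_factor(a, b)
               for a, b in zip(log_ev_bottom_up, log_ev_top_down)]
group_log_bf = group_log_bayes_factor(log_ev_bottom_up, log_ev_top_down)

print("per-subject Bayes factors:", per_subject)
print("group log Bayes factor:", group_log_bf)
```

A value above 1 favors the first model (here, bottom-up), and a value of exactly 1 corresponds to equal model evidences, as for the one subject mentioned above.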


Figure 5. Dynamic causal models. Left: bottom-up DCM, incongruency effects are mediated via forward connections selective for verbal (spoken words) and nonverbal (sounds) material. Middle: top-down DCM, incongruency effects are mediated by interactions between mPFC/AC and IFS. Values are the across subject mean (standard deviation) of changes in connection strength (at P < 0.05 in bold). These parameters quantify how experimental manipulations change the values of intrinsic connections. In dynamic systems, the strength of a coupling can be thought of as a rate constant or the reciprocal of the time constant. Typically, regional activity has a time constant on the order of 1–2 s (rate of 1–0.5 s–1). Therefore, a modulatory effect of 0.05 s–1 corresponds to a 5–10% increase in coupling. AC = anterior cingulate/medial prefrontal cortex; IFS = inferior frontal sulcus; STS = superior temporal sulcus; AG = angular gyrus; STG = superior temporal gyrus; FG = fusiform gyrus. Black: intrinsic connections; Purple: extrinsic input; Green: modulatory effects. I Stimuli = all incongruent stimuli, I Words = incongruent spoken words, I Sounds = incongruent sounds. Right: Bar chart of Bayes factors for bottom-up relative to top-down model for each subject. A cutoff of 8 was selected to focus on the fact that despite intersubject variability, all subjects (apart from one) consistently showed Bayes factors in favor of the bottom-up model.

 
The numbers by the connections are the change in coupling (i.e., responsiveness of the target region) induced by incongruency or material (sounds vs. spoken words) effects averaged across subjects. Note that in both models, 3 modulatory effects are significant across subjects (Fig. 5).
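The conversion from a modulatory effect to a percent change in coupling, as described in the legend of Figure 5, can be checked with a few lines of arithmetic. The function name is hypothetical, and the sketch simply treats the baseline coupling as the reciprocal of the regional time constant, as the legend does.

```python
def percent_change_in_coupling(modulation_per_s, time_constant_s):
    """Modulatory effect relative to the baseline rate (1 / time constant), in percent."""
    baseline_rate = 1.0 / time_constant_s
    return 100.0 * modulation_per_s / baseline_rate


# Time constants of 1 s and 2 s correspond to rates of 1 and 0.5 per second
fast = percent_change_in_coupling(0.05, 1.0)
slow = percent_change_in_coupling(0.05, 2.0)
print(fast, slow)
```

With a modulatory effect of 0.05 per second, the two time constants give changes of about 5% and 10%, matching the range quoted in the legend.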


    Discussion
This visuoauditory priming study demonstrates the effect of prior visual information on recognition and categorization of environmental sounds and spoken words at the neural and behavioral level. Subjects spent more time and were less accurate for incongruent relative to congruent trials. Consistent with the behavioral interference effect, incongruent relative to congruent visuoauditory trials increased activation in a large distributed neural system encompassing the AC/mPFC, IFS, AG/IPS, and MTG/STS. These effects were observed for incongruent trials irrespective of additional response incongruency. Critically, while the behavioral interference—as measured by longer RTs—was equivalent for sounds and spoken words, our functional imaging results revealed that they were mediated by distinct neural systems. Combining visuoauditory priming for spoken words and environmental sounds enabled us to test for the interaction between congruency and target material (i.e., sounds vs. spoken words) and segregate the incongruency effects into 3 classes: visuoauditory incongruency effects were enhanced for 1) spoken words in the left anterior MTG/STS, 2) sounds in the left AG/IPS, and 3) both words and sounds, in the mPFC/AC and left IFS.

From a cognitive perspective, these distinct classes suggest that prior visual information modulates categorization of complex auditory stimuli at multiple stages. Based on our initial rationale that processing auditory-visual stimuli relies more on phonology for spoken words and semantics for environmental sounds, these regionally selective responses may implicate 1) the MTG/STS in phonological, 2) the AG/IPS in semantic and associated recognition processes, and 3) mPFC/IFS in higher conceptual or "conflict monitoring" processes. In terms of neural mechanisms, our DCM results suggest that these incongruency effects may emerge in a material-dependent fashion, that is, selective for spoken words and environmental sounds via a greater influence of forward connections from early auditory regions to MTG/STS and AG/IPS.

The selective response enhancement in STS/MTG for incongruent spoken words is consistent with its established role in auditory speech processing (Mummery et al. 1999; Binder et al. 2000; Scott et al. 2000; Giraud and Price 2001; Price et al. 2003; Scott and Johnsrude 2003). Furthermore, activation in multiple STS regions has been shown for visuoauditory integration and congruency of 1) seen mouth movements and heard speech during speech reading (Calvert et al. 2000; Macaluso et al. 2004) as well as 2) spoken and written letters (van Atteveldt et al. 2004). Interestingly, when presenting written and spoken phonemes synchronously during passive listening and viewing, only congruent auditory-visual speech that allows successful binding is associated with increased STS activation relative to unimodal speech (Calvert et al. 2000; van Atteveldt et al. 2004). In contrast, in our visuoauditory priming paradigm, the task-irrelevant but incongruent visual prime induces behavioral interference and increases STS activation for categorization of the subsequent spoken word. This demonstrates that different multisensory paradigms can point to the same anatomical locus, but nevertheless be very distinct in their multisensory interaction. During a passive viewing–listening task, both stimulus components are task relevant enabling integration into a coherent percept. In contrast, visuoauditory priming can be considered a selective attention task, where task-irrelevant incongruent visual information needs to be suppressed or overcome by amplification of the task-relevant auditory information. Collectively, both types of visuoauditory interaction effects suggest that the anterior MTG/STS region may be the locus of neuronal processes that underpin visuoauditory interactions or incongruency effects that are conveyed phonologically.

In the AG/IPS, the incongruency effect was selective for categorization of environmental sounds. As recognition and categorization of environmental sounds is accomplished through interactions between perceptual and semantic processing, this speaks to a role in congruency primarily at the level of semantic representations and converges with recent functional imaging results implicating the IPS in semantic rather than response conflict (van Veen and Carter 2005). Furthermore, the AG/IPS is part of a frontotemporoparietal semantic retrieval system that is generally activated for semantic relative to perceptual or phonological tasks (Vandenberghe et al. 1996; Noppeney and Price 2003, 2004; Binder et al. 2005; Sabsevitz et al. 2005). However, its role in semantic processing has been elusive. Thus, only large but not focal parietal lesions are associated with semantic retrieval deficits as measured by standard neuropsychological tests (e.g., Pyramids and Palm Tree test, Alexander et al. 1989). One interesting possibility that arises from our findings is that the AG/IPS is involved in controlling, accessing, and combining semantic information from multiple senses. This hypothesis needs to be investigated further by 1) comparing visuoauditory priming to unimodal (the limited number of unimodal trials in our experiment did not allow that comparison) and audiovisual (i.e., auditory prime and visual target) priming using fMRI and 2) testing patients with left-lateralized parietal lesions on nonverbal cross-modal (e.g., sound–picture) matching or priming tasks.

In contrast to STS/MTG and AG/IPS, activation in mPFC/AC and left IFS was increased for both incongruent sounds and incongruent spoken words. The behavioral relevance of these effects was highlighted by their significant correlations with subjects' increases in RT and decreases in accuracy for incongruent relative to congruent trials. In other words, subjects who spent relatively more time on incongruent trials and showed larger accuracy reductions for them exhibited stronger mPFC/AC and IFS incongruency effects. These results extend the role of the mPFC/AC–IFS circuitry in cognitive control processes such as conflict monitoring or predicting error probability to the multisensory domain (Duncan and Owen 2000; Botvinick et al. 2001; Paus 2001; Noppeney and Price 2002; Laurienti et al. 2003; Kerns et al. 2004; Brown and Braver 2005). Thus, the AC/mPFC and IFS may be engaged in evaluating and integrating higher level conceptual information abstracted from different stimulus materials (i.e., verbal vs. nonverbal) and modalities (auditory vs. visual).

The multiple incongruency effects raise the question at which level of the cortical hierarchy they emerge (Mesulam 1990; McIntosh 2000; Horwitz 2003). More specifically, are the incongruency effects mediated via sensitization to forward connections from early auditory regions to STS/MTG and AG/IPS, or do they emerge through greater top-down influences from the AC–IFS circuitry indicating increased cognitive control? Bayesian model comparison provided strong evidence for the bottom-up model where the influence of early auditory regions on STS and AG/IPS is increased during incongruent trials. Critically, the forward connectivity is selectively modulated by the different classes of incongruency: phonological incongruency increases the forward connections to STS/MTG, semantic incongruency to AG/IPS. The proposed bottom-up mechanism converges with recent electroencephalography results demonstrating auditory-visual incongruency and category-specific effects for tools and animals as early as 100 ms poststimulus (Molholm et al. 2004; Hauk et al. 2006; Murray et al. 2006). Hence, the human brain may be able to distinguish rapidly between tools and animals and detect higher level incongruencies between auditory and visual stimulus components. Collectively, these results are consistent with predictive coding hypotheses, in which prediction errors at all levels of a cortical hierarchy guide perceptual inference (Rao and Ballard 1999). In our case, the failure to suppress prediction error, in the context of unpredictable or incongruent bottom-up cross-modal inputs, is manifest as an increase in forward connectivity. Furthermore, the nature of the prediction error (phonological vs. semantic) determines where it is expressed (MTG/STS vs. AG/IPS).

Our findings suggest that prior visual information influences the neural processes underlying speech and sound recognition at multiple levels with the left anterior MTG/STS being involved in phonological, the AG/IPS in semantic, and the mPFC/IFS in higher conceptual processes. In terms of neural mechanisms, effective connectivity analyses indicate that the incongruency effects emerge via a failure to suppress incongruent, bottom-up inputs from early auditory regions to MTG/STS or AG/IPS. This is consistent with a predictive coding perspective on hierarchical Bayesian inference in the cortex where the domain of the prediction error (phonological vs. semantic) determines its regional expression (MTG/STS vs. AG/IPS).


    Funding
The Deutsche Forschungsgemeinschaft; the Max-Planck Society; and the Wellcome Trust.


    Appendix
Example trials: Each stimulus (e.g., bear) was presented 4 times as a target in congruent pairs and 4 times as a target in incongruent pairs. Across subjects, each stimulus was counterbalanced across stimulus modalities and conditions.


    Acknowledgments
 
Conflict of Interest: None declared.


    References
Alexander MP, Hiltbrunner B, Fischer RS. Distributed anatomy of transcortical aphasia. Arch Neurol (1989) 46:885–892.

Amedi A, von Kriegstein K, van Atteveldt NM, Beauchamp MS, Naumer MJ. Functional imaging of human crossmodal identification and object recognition. Exp Brain Res (2005) 166:559–571.

Badgaiyan RD, Schacter DL, Alpert NM. Auditory priming within and across modalities: evidence from positron emission tomography. J Cogn Neurosci (1999) 11:337–348.

Barraclough NE, Xiao D, Baker CI, Oram MW, Perrett DI. Integration of visual and auditory information by superior temporal sulcus neurons responsive to the sight of actions. J Cogn Neurosci (2005) 17:377–391.

Beauchamp MS, Argall BD, Bodurka J, Duyn JH, Martin A. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat Neurosci (2004) 7:1190–1192.

Beauchamp MS, Lee KE, Argall BD, Martin A. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron (2004) 41:809–823.

Binder JR, Frost JA, Hammeke TA, Bellgowan PS, Springer JA, Kaufman JN, Possing ET. Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex (2000) 10:512–528.

Binder JR, Liebenthal E, Possing ET, Medler DA, Ward BD. Neural correlates of sensory and decision processes in auditory object identification. Nat Neurosci (2004) 7:295–301.

Binder JR, Westbury CF, McKiernan KA, Possing ET, Medler DA. Distinct brain systems for processing concrete and abstract concepts. J Cogn Neurosci (2005) 17:905–917.

Botvinick MM, Braver TS, Barch DM, Carter CS, Cohen JD. Conflict monitoring and cognitive control. Psychol Rev (2001) 108:624–652.

Brown JW, Braver TS. Learned predictions of error likelihood in the anterior cingulate cortex. Science (2005) 307:1118–1121.

Callan DE, Jones JA, Munhall K, Kroos C, Callan AM, Vatikiotis-Bateson E. Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information. J Cogn Neurosci (2004) 16:805–816.

Calvert GA. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb Cortex (2001) 11:1110–1123.

Calvert GA, Brammer MJ, Bullmore ET, Campbell R, Iversen SD, David AS. Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport (1999) 10:2619–2623.

Calvert GA, Campbell R, Brammer MJ. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol (2000) 10:649–657.

Calvert GA, Hansen PC, Iversen SD, Brammer MJ. Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage (2001) 14:427–438.

Calvert GA, Lewis JW. Hemodynamic studies of audio-visual interactions. In: The handbook of multi-sensory processes. Calvert GA, Spence C, Stein BE, eds. (2004) Cambridge (MA): MIT Press. 483–502.

Chao LL, Haxby JV, Martin A. Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nat Neurosci (1999) 2:913–919.

Duncan J, Owen AM. Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci (2000) 23:475–483.

Evans AC, Collins DL, Milner B. An MRI-based stereotactic atlas from 250 young normal subjects. Soc Neurosci Abstr. (1992).

Foxe JJ, Morocz IA, Murray MM, Higgins BA, Javitt DC, Schroeder CE. Multisensory auditory-somatosensory interactions in early cortical processing revealed by high-density electrical mapping. Brain Res Cogn Brain Res (2000) 10:77–83.

Friston KJ, Harrison L, Penny W. Dynamic causal modelling. Neuroimage (2003) 19:1273–1302.

Friston KJ, Holmes A, Worsley KJ, Poline JB, Frith CD, Frackowiak R. Statistical parametric mapping: a general linear approach. Hum Brain Mapp (1995) 2:189–210.

Friston KJ, Holmes AP, Price CJ, Buchel C, Worsley KJ. Multisubject fMRI studies and conjunction analyses. Neuroimage (1999) 10:385–396.

Friston KJ, Penny WD, Glaser DE. Conjunction revisited. Neuroimage (2005) 25:661–667.

Friston KJ, Price CJ. Generative models, brain function and neuroimaging. Scand J Psychol (2001) 42:167–177.

Fu KM, Johnston TA, Shah AS, Arnold L, Smiley J, Hackett TA, Garraghty PE, Schroeder CE. Auditory cortical neurons respond to somatosensory stimulation. J Neurosci (2003) 23:7510–7515.

Fuster JM, Bodner M, Kroger JK. Cross-modal and cross-temporal association in neurons of frontal cortex. Nature (2000) 405:347–351.

Ghazanfar AA, Maier JX, Hoffman KL, Logothetis NK. Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. J Neurosci (2005) 25:5004–5012.

Ghazanfar AA, Schroeder CE. Is neocortex essentially multisensory? Trends Cogn Sci (2006) 10:278–285.

Gibson JR, Maunsell JH. Sensory modality specificity of neural activity related to memory in visual cortex. J Neurophysiol (1997) 78:1263–1275.

Giraud AL, Price CJ. The constraints functional neuroimaging places on classical models of auditory word processing. J Cogn Neurosci (2001) 13:754–765.

Gonzalo D, Büchel C. Crossmodal associative learning modulates fusiform face's areas response to sound (2003) Geneva: 3rd Annual Meeting International Multi-Sensory Research Forum.

Gottfried JA, Dolan RJ. The nose smells what the eye sees: crossmodal visual facilitation of human olfactory perception. Neuron (2003) 39:375–386.

Gottfried JA, Smith AP, Rugg MD, Dolan RJ. Remembrance of odors past: human olfactory cortex in cross-modal recognition memory. Neuron (2004) 42:687–695.

Grill-Spector K, Henson R, Martin A. Repetition and the brain: neural models of stimulus-specific effects. Trends Cogn Sci (2006) 10:14–23.

Hauk O, Shtyrov Y, Pulvermuller F. The sound of actions as reflected by mismatch negativity: rapid activation of cortical sensory-motor networks by sounds associated with finger and tongue movements. Eur J Neurosci (2006) 23:811–821.

Henson RN. Neuroimaging studies of priming. Prog Neurobiol (2003) 70:53–81.

Henson RN, Rugg MD. Neural response suppression, haemodynamic repetition effects, and behavioural priming. Neuropsychologia (2003) 41:263–270.

Horwitz B. The elusive concept of brain connectivity. Neuroimage (2003) 19:466–470.

Humphreys GW, Forde EM. Hierarchies, similarity, and interactivity in object recognition: "category-specific" neuropsychological deficits. Behav Brain Sci (2001) 24:453–476.

Ikeda M, Patterson K, Graham KS, Ralph MA, Hodges JR. A horse of a different colour: do patients with semantic dementia recognise different versions of the same object as the same? Neuropsychologia (2006) 44:566–575.

Josephs O, Deichmann R, Turner R. Trajectory measurement and generalized reconstruction in rectilinear EPI. ISMRM Meeting (2000) 151:7.

Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc (1995) 90:773–795.

Kayser C, Petkov CI, Augath M, Logothetis NK. Integration of touch and sound in auditory cortex. Neuron (2005) 48:373–384.

Kerns JG, Cohen JD, MacDonald AW 3rd, Cho RY, Stenger VA, Carter CS. Anterior cingulate conflict monitoring and adjustments in control. Science (2004) 303:1023–1026.

Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT. Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res (2004) 158:405–414.

Laurienti PJ, Wallace MT, Maldjian JA, Susi CM, Stein BE, Burdette JH. Cross-modal sensory processing in the anterior cingulate and medial prefrontal cortices. Hum Brain Mapp (2003) 19:213–223.

Lehmann S, Murray MM. The role of multisensory memories in unisensory object discrimination. Brain Res Cogn Brain Res (2005) 24:326–334.

Lewis JW, Brefczynski JA, Phinney RE, Janik JJ, DeYoe EA. Distinct cortical pathways for processing tool versus animal sounds. J Neurosci (2005) 25:5148–5158.

Lewis JW, Wightman FL, Brefczynski JA, Phinney RE, Binder JR, DeYoe EA. Human brain regions involved in recognizing environmental sounds. Cereb Cortex (2004) 14:1008–1021.

Macaluso E, George N, Dolan R, Spence C, Driver J. Spatial and temporal factors during processing of audiovisual speech: a PET study. Neuroimage (2004) 21:725–732.

McIntosh AR. Towards a network theory of cognition. Neural Netw (2000) 13:861–870.

Mesulam MM. Large-scale neurocognitive networks and distributed processing for attention, language, and memory. Ann Neurol (1990) 28:597–613.

Molholm S, Ritter W, Javitt DC, Foxe JJ. Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex (2004) 14:452–465.

Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Brain Res Cogn Brain Res (2002) 14:115–128.

Moscoso del Prado MF, Hauk O, Pulvermuller F. Category specificity in the processing of color-related and form-related words: an ERP study. Neuroimage (2006) 29:29–37.

Mummery CJ, Ashburner J, Scott SK, Wise RJ. Functional neuroimaging of speech perception in six normal and two aphasic subjects. J Acoust Soc Am (1999) 106:449–457.

Murray MM, Camen C, Gonzalez Andino SL, Bovet P, Clarke S. Rapid brain discrimination of sounds of objects. J Neurosci (2006) 26:1293–1302.

Murray MM, Foxe JJ, Wylie GR. The brain uses single-trial multisensory memories to discriminate without awareness. Neuroimage (2005) 27:473–478.

Neely JH. Semantic priming and retrieval from lexical memory: roles of inhibitionless spreading activation and limited-capacity attention. J Exp Psychol Gen (1977) 106:226–254.

Noppeney U, Price C. Retrieval of abstract semantics. Neuroimage (2004) 22:164–170.

Noppeney U, Price CJ. A PET study of stimulus- and task-induced semantic processing. Neuroimage (2002) 15:927–935.

Noppeney U, Price CJ. Functional imaging of the semantic system: retrieval of sensory-experienced and verbally-learnt knowledge. Brain Lang (2003) 84:120–133.

Noppeney U, Price CJ, Penny WD, Friston KJ. Two distinct neural mechanisms for category-selective responses. Cereb Cortex (2006) 16:437–445.

Nyberg L, Habib R, McIntosh AR, Tulving E. Reactivation of encoding-related brain activity during memory retrieval. Proc Natl Acad Sci USA (2000) 97:11120–11124.

Olson IR, Gatenby JC, Gore JC. A comparison of bound and unbound audio-visual information processing in the human cerebral cortex. Brain Res Cogn Brain Res (2002) 14:129–138.

Paus T. Primate anterior cingulate cortex: where motor control, drive and cognition interface. Nat Rev Neurosci (2001) 2:417–424.

Penny WD, Stephan KE, Mechelli A, Friston KJ. Comparing dynamic causal models. Neuroimage (2004) 22:1157–1172.

Plaut DC, McClelland JL, Seidenberg MS, Patterson K. Understanding normal and impaired word reading: computational principles in quasi-regular domains. Psychol Rev (1996) 103:56–115.

Potter MC, Faulconer BA. Time to understand pictures and words. Nature (1975) 253:437–438.

Price CJ, Winterburn D, Giraud AL, Moore CJ, Noppeney U. Cortical localisation of the visual and auditory word form areas: a reconsideration of the evidence. Brain Lang (2003) 86:272–286.

Rahman RA, van Turennout M, Levelt WJ. Phonological encoding is not contingent on semantic feature retrieval: an electrophysiological study on object naming. J Exp Psychol Learn Mem Cogn (2003) 29:850–860.

Raij T, Uutela K, Hari R. Audiovisual integration of letters in the human brain. Neuron (2000) 28:617–625.[CrossRef][Web of Science][Medline]

Rao RP, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci (1999) 2:79–87.[CrossRef][Web of Science][Medline]

Rogers TT, Lambon Ralph MA, Garrard P, Bozeat S, McClelland JL, Hodges JR, Patterson K. Structure and deterioration of semantic memory: a neuropsychological and computational investigation. Psychol Rev (2004) 111:205–235.[CrossRef][Web of Science][Medline]

Sabsevitz DS, Medler DA, Seidenberg M, Binder JR. Modulation of the semantic system by word imageability. Neuroimage (2005) 27:188–200.[CrossRef][Web of Science][Medline]

Saito DN, Yoshimura K, Kochiyama T, Okada T, Honda M, Sadato N. Cross-modal binding and activated attentional networks during audio-visual speech integration: a functional MRI study. Cereb Cortex (2005) 15:1750–1760.[Abstract/Free Full Text]

Schiller NO, Bles M, Jansma BM. Tracking the time course of phonological encoding in speech production: an event-related brain potential study. Brain Res Cogn Brain Res (2003) 17:819–831.[CrossRef][Medline]

Schroeder CE, Foxe J. Multisensory contributions to low-level, ‘unisensory’ processing. Curr Opin Neurobiol (2005) 15:454–458.[CrossRef][Web of Science][Medline]

Schroeder CE, Foxe JJ. The timing and laminar profile of converging inputs to multisensory areas of the macaque neocortex. Brain Res Cogn Brain Res (2002) 14:187–198.[CrossRef][Medline]

Scott SK, Blank CC, Rosen S, Wise RJ. Identification of a pathway for intelligible speech in the left temporal lobe. Brain (2000) 123(Pt 12):2400–2406.[Abstract/Free Full Text]

Scott SK, Johnsrude IS. The neuroanatomical and functional organization of speech perception. Trends Neurosci (2003) 26:100–107.[CrossRef][Web of Science][Medline]

Stein BE, Meredith MA. Merging of the senses. (1993) Cambridge (MA): MIT Press.

Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain. (1988) Stuttgart (Germany): Thieme.

Tanabe HC, Honda M, Sadato N. Functionally segregated neural substrates for arbitrary audiovisual paired-association learning. J Neurosci (2005) 25:6409–6418.[Abstract/Free Full Text]

Taylor KI, Moss HE, Stamatakis EA, Tyler LK. Binding crossmodal object features in perirhinal cortex. Proc Natl Acad Sci USA (2006) 103:8239–8244.[Abstract/Free Full Text]

van Atteveldt N, Formisano E, Goebel R, Blomert L. Integration of letters and speech sounds in the human brain. Neuron (2004) 43:271–282.[CrossRef][Web of Science][Medline]

van Veen V, Carter CS. Separating semantic conflict and response conflict in the Stroop task: a functional MRI study. Neuroimage (2005) 27:497–504.[CrossRef][Web of Science][Medline]

Vandenberghe R, Price C, Wise R, Josephs O, Frackowiak RS. Functional anatomy of a common semantic system for words and pictures. Nature (1996) 383:254–256.[CrossRef][Medline]

Wright TM, Pelphrey KA, Allison T, McKeown MJ, McCarthy G. Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cereb Cortex (2003) 13:1034–1043.[Abstract/Free Full Text]




This article has been cited by other articles:


S. Werner and U. Noppeney
Superadditive Responses in Superior Temporal Sulcus Predict Audiovisual Benefits in Object Categorization
Cereb Cortex, November 18, 2009; (2009) bhp248v1.
[Abstract] [Full Text] [PDF]


S. Sadaghiani, J. X. Maier, and U. Noppeney
Natural, Metaphoric, and Linguistic Auditory Direction Signals Have Distinct Influences on Visual Motion Processing
J. Neurosci., May 20, 2009; 29(20): 6490 - 6499.
[Abstract] [Full Text] [PDF]

