Cross-modal processing of auditory and visual symbol representations in the temporo-parietal cortex

doi:10.21203/rs.3.rs-8699439/v1

2026 · doi:10.21203/rs.3.rs-8699439/v1

Zhiwei Chen, Jan Kurzawski, Logan Dowdle, Francesco Gentile, Dora Gozukara, and 1 more

Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8699439/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 7

Abstract

Numeracy and literacy are fundamental cognitive skills that rely on associating visual symbols with their spoken representations. Prior research has identified the posterior temporal-parietal cortex as a key neural region for the cross-modal transformation of these audio-visual alphanumeric symbols. However, the modality-dependent and cross-modal cortical activation patterns underlying these transformations remain unclear.
In this slow event-related 3T fMRI experiment, twenty-one participants were presented with auditory or visual letters and numbers while performing a passive listening/viewing task. We found overlapping activation across auditory cortical regions for auditory letters/numbers and across ventral visual regions for visual letters/numbers. In particular, activity in superior temporal cortical regions such as A5/A4/Parabelt exhibited high reliability for auditory stimuli, whereas activity in occipital and ventral temporal cortical regions such as V3/V4/PH demonstrated high reliability for visual stimuli. The temporo-parieto-occipital junction (TPOJ) showed overlapping responses with similar amplitudes for both auditory and visual stimuli. Despite this global similarity in responses, multivariate analysis revealed that the right TPOJ successfully differentiated between visual and auditory stimuli. Our findings reinforce the TPOJ’s role in the cross-modal processing of symbolic representations and may have implications for developmental learning difficulties such as dyslexia, where cross-modal integration may pose a challenge for acquiring reading fluency.

Subjects: Biological sciences/Neuroscience; Biological sciences/Psychology; Social science/Psychology
Keywords: audio-visual integration, MVPA, STG

Introduction

Numeracy and literacy are fundamental skills that individuals typically acquire at an early age [1] and have a profound impact on academic achievement [2, 3]. The identification of single letters and numbers forms a foundational step in numeracy and literacy [4, 5] and in both cases involves setting up associations between visual symbols and their corresponding spoken language representations. In most alphabetic languages, letters correspond to phonemes in their spoken form, although with cross-linguistic variation in the consistency of the correspondences [6].
The ability to form efficient associations between letters and speech sounds is crucial to the development of fluent reading skills [6, 7]. Likewise, the capacity for automatic association of auditory and visual representations of numbers plays an essential role in both everyday functioning and mathematical competence [8]. At the cortical level, auditory speech sound representations are processed in the superior temporal cortex (STC) [9-11], while the visual recognition of letters engages the left ventral occipito-temporal cortex [12-14]. Although less studied, auditory number words activate regions such as the superior and middle temporal gyri [15, 16], while visual recognition of Arabic numerals primarily involves the ventral occipito-temporal cortex including the posterior inferior temporal gyrus (pITG) [15, 17, 18]. The neural mechanisms underlying the cross-modal integration of auditory and visual letter and number representations remain only partially understood. Previous studies investigating audiovisual (AV) integration have generally presented auditory and visual stimuli concurrently, comparing bimodal responses to unimodal ones (e.g., (AV > V) ∩ (AV > A)) or contrasting congruent versus incongruent stimuli. Across these studies, the superior temporal cortex/gyrus (STC/STG) is consistently observed to be involved in the audiovisual integration process [19-21]. In line with this, a substantial literature has established the STC as critical for integrating letter symbols and speech sounds [5, 22-24]. Accordingly, the STC seems to contribute to the neural representation of letters not only in spoken but also in written form.
Consistent with the proposed role of the STC in integrating letter and speech sound information, developmental work demonstrates that early readers show increased STC activation during the processing of congruent as compared to incongruent or unlearned audiovisual letter–speech sound pairs, and that the magnitude of this enhancement predicts literacy skills [25, 26]. Children with reading problems due to developmental dyslexia instead exhibit a reduced modulatory effect of letter and sound congruency on cortical responses in the planum temporale, Heschl’s sulcus and superior temporal sulcus (STS) [22]. Prereaders at risk for dyslexia exhibited audiovisual integration effects following a brief artificial letter–speech sound training, with outcomes modulated by individual learning rate [27]. Participants classified as fast learners demonstrated stronger congruency effects for trained artificial letter–speech sound pairs, particularly in the right superior temporal gyrus (STG) and the left inferior temporal cortex [27]. These findings highlight the STC as a critical hub for multisensory integration of symbolic information. However, it remains unclear whether the STC also plays a comparable role in integrating auditory and visual number representations. In numerical cognition, a large body of research has found that the intraparietal sulcus (IPS) plays an important role in number quantity processing [16, 28-30]. Some evidence does suggest involvement of the left posterior STG in symbolic number processing [29]. Holloway and colleagues (2010) asked participants to compare the magnitude of both visual symbolic and nonsymbolic numerical stimuli in an fMRI study, and found that symbolic Arabic numbers elicited significantly greater activation in the left posterior STG compared to nonsymbolic stimuli.
Although previous studies have consistently implicated the posterior STG as a critical region in visual–auditory integration, the reported anatomical labels and their stereotactic coordinates vary across studies. Based on their reported coordinates, most of these regions align with, or are located near, TPOJ1 (Temporo-Parieto-Occipital Junction 1) as defined in the Glasser atlas [31]. Table 1 lists the anatomical labels and Talairach coordinates of the superior temporal and temporo-parietal regions reported across studies (MNI coordinates were converted to Talairach space). The TPOJ evidently lies at the intersection of auditory and visual processing; however, it is likely that this area exhibits distinct modality-specific activation profiles in addition to activity modulations due to audio-visual (in)congruency.

Table 1. Talairach Coordinates of Superior Temporal Integration Sites* Reported from Previous Studies

Study                                 Stimuli                              Region         x           y           z
Glasser atlas (Glasser et al., 2016)                                       Left TPOJ1     [-46, -60]  [-35, -59]  [1, 11]
                                                                           Right TPOJ1    [43, 58]    [-35, -54]  [3, 18]
Calvert et al., 2000                  Audiovisual speech                   Left STS       -53         -48         9
Raij et al., 2000                     Letters and speech sounds            Left STS       -53         -31         0
                                                                           Right STS      48          -31         6
Sekiyama et al., 2003                 Audiovisual speech                   Left STS       -56         -49         9
van Atteveldt et al., 2004            Letters and speech sounds            Left STS       -54         -48         9
                                                                           Right STG      52          -33         18
Blau et al., 2010                     Letters and speech sounds            Left STS       -56         -33         4
                                                                           Right STS      58          -33         3
Karipidis et al., 2017                Artificial Letter and speech sounds  Right STG/MTG  62          -47         12
                                                                           Left STG       -57         50          17
Romanovska et al., 2021               Letters and ambiguous speech sounds  Left STG       -58         -31         13
                                                                           Left STG       -53         -29         14
                                                                           Left STG       -59         -30         13
Beck et al., 2023                     Letters and speech sounds            Left STG       -51         -37         16
                                                                           Right STG      45          -37         13

* The definition of integration site varied across studies, e.g., peak, centre of mass

The current study aims to explore how the brain processes cross-modal symbols, minimising potential task-related or attentional
biases. We presented single letters and numbers in visual or auditory modalities and asked participants to view/listen to these stimuli without further task requirements. Using a univariate GLM, we observed consistent and overlapping responses to auditory letters and numbers in STC regions, and to visual letters and numbers in the ventral visual cortex. These analyses further showed overlapping responses with similar amplitudes for both auditory and visual symbols within TPOJ1. The majority of voxels in TPOJ1 showed no significant difference in activity between the two modalities. To explore whether the voxels activated across modalities within the TPOJ1 region differentiated between modalities on a more fine-grained level, we employed a multivariate approach and found high decoding accuracy for distinguishing visual and auditory symbols. This decoding accuracy remained high even after removing voxels that showed stronger responses to one modality than the other. Our results indicate that the TPOJ1 region integrates both auditory and visual symbols for letters and numbers. Understanding this integration process is crucial, particularly for conditions such as dyslexia, where the formation of cross-modal letter-speech sound representations may be impaired.

Materials and Methods

Participants

The fMRI study included 25 participants recruited from Maastricht University, the Netherlands. Data from four participants were excluded due to a dyslexia diagnosis (n = 1), not being a native Dutch speaker (n = 1), or not completing the entire scanning session (n = 2). Data analysis was conducted on 21 participants (aged 19-52 years, mean = 25.52 ± 7.04 years, 14 female). All participants were right-handed native Dutch speakers, reported no neurological disorders, and had normal or corrected-to-normal vision and normal hearing status.
This study was approved by the Ethical Committee of the Faculty of Psychology and Neuroscience at Maastricht University (approval code: ERCPN-OZL205 1703 2019) and performed in accordance with relevant guidelines and regulations. Participants provided written informed consent before participation and received course credits or vouchers as compensation.

Stimuli

In this experiment, we presented three monosyllabic visual letters and their corresponding auditory letter names (‘d’ - /dee/, ‘v’ - /vee/, ‘z’ - /zet/) as well as three monosyllabic visual and auditory numbers (‘3’ - /drie/, ‘5’ - /vijf/, ‘6’ - /zes/). The auditory stimuli were recorded in a soundproof room by a native Dutch female speaker, with a sampling rate of 44100 Hz. To introduce some acoustic variability in the auditory stimuli, different recordings of each number or letter from the same speaker were used. The recordings were processed with PRAAT (6.0.36) software [Boersma, 2001]. They were bandpass filtered (80-10500 Hz), smoothed (30 Hz) and resampled to 22050 Hz, and their duration was equalised to 400 milliseconds. The visual stimuli, both lowercase letters and Arabic numbers, were presented in Arial font, as white text on a black background, with a size of 0.9° × 0.9°.

Experimental procedure

To prevent higher-order cognitive processes often triggered by arithmetic or letter tasks, participants were instructed to focus on the stimuli without providing explicit responses. The fMRI experiment was divided into six runs using a slow event-related design (Fig. 1). Each run consisted of four blocks, each corresponding to one of the four conditions: visual-number (NV), auditory-number (NA), visual-letter (LV), and auditory-letter (LA). Prior to the start of each block, the text on the screen changed to indicate the next condition (e.g., Auditory Number / Visual Number). The sequence of these four conditions was balanced using a Latin square design.
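The Latin square balancing of block order can be illustrated with a short sketch. The text does not state which Latin square was used or how the six runs were mapped onto its four rows, so the cyclic construction and the wrap-around assignment of rows to runs below are assumptions made purely for illustration; only the four condition labels come from the text.

```python
# Minimal sketch of a Latin square for block order (assumed cyclic construction).
CONDITIONS = ["NV", "NA", "LV", "LA"]  # condition labels as named in the text


def cyclic_latin_square(items):
    """Return an n x n square in which every item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def run_orders(items, n_runs):
    """Assign one row of the square to each run, wrapping around (an assumption)."""
    square = cyclic_latin_square(items)
    return [square[run % len(square)] for run in range(n_runs)]


orders = run_orders(CONDITIONS, n_runs=6)
for run_index, order in enumerate(orders, start=1):
    print(f"run {run_index}: {' -> '.join(order)}")
```

With four conditions and six runs, a single square cannot be repeated a whole number of times, so some row-to-run assignment such as the wrap-around shown here (or counterbalancing across participants) would be needed; the text does not say which was used.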
Within each block, participants were presented with 12 stimuli, including four repetitions of each of the three letters or numbers. Overall, each participant was presented with 288 stimuli throughout the whole experiment. All stimuli were presented in a pseudo-randomized order to ensure that the same letter or number was not presented consecutively. Each stimulus lasted 400 ms. In the visual conditions, a white fixation cross on a black background was shown between two stimuli, with a jittered interval of 8.6, 10.4, or 12.2 seconds; in the auditory conditions, the fixation cross was presented continuously. Stimuli in both modalities were presented during the 500 ms silent gap between two consecutive scans.

fMRI Data Acquisition

Functional and anatomical image acquisition was performed on a whole-body 3T Magnetom Prisma scanner (Siemens Medical System) using a 64-channel head coil at the Maastricht Brain Imaging Center. Six functional runs were collected with 2 × 2 × 2 mm³ resolution (TA = 1300 ms, TR = 1800 ms, TE = 30 ms, silent gap = 500 ms, FOV = 224 × 224 mm). Each run comprised 373 volumes, with each volume consisting of 60 slices. High-resolution anatomical images (voxel size of 1 × 1 × 1 mm³) were obtained using a T1-weighted three-dimensional MPRAGE sequence (TR = 1800 ms, TE = 30 ms, 256 sagittal slices) that was acquired after the third run.

Data analysis

The data analysis was conducted using fMRIPrep 23.2.1 [32], Matlab R2020b, Freesurfer 7.4.1 [33], AFNI and FSL.

(f)MRI preprocessing

The functional and anatomical data were preprocessed in the fMRIPrep pipeline. fMRIPrep is a robust and efficient tool that integrates several essential toolboxes for preprocessing [32]. The preprocessing of anatomical MRI data involved skull-stripping and tissue segmentation, followed by spatial normalisation of T1-weighted images onto the MNI152NLin2009cAsym template.
Subsequently, Freesurfer 7.4.1 [33] was employed to reconstruct surfaces from the T1-weighted structural images, aligning the individual surfaces with the standard fsaverage surface. The preprocessing of functional MRI data involved head-motion correction by mcflirt in FSL (a linear, rigid-body registration approach), and slice timing correction by 3dTshift in AFNI (with the first slice as reference). Following these procedures, functional data were aligned with the T1-weighted anatomical images and projected onto the aligned individual fsaverage surface maps. Then Glasser’s HCP-MMP1.0 atlas [31] was used to divide the whole brain into 181 subregions for each hemisphere.

Estimation of noise regressors

We used GLMdenoise, which is a method for reducing noise in task-based fMRI analyses [34, 35]. It identifies, directly from the data, spatially correlated noise that may come from physiological, instrumental, motion-related, or neural sources. Here, we used GLMdenoise on the original, unsmoothed surface data in order to determine the noise regressors to be used in the subsequent surface GLM (detailed below). These participant-specific regressors were then used as nuisance regressors in a general linear model (GLM).

Univariate analyses

We first conducted a GLM for each participant using the surface data in AFNI, which included the regressors for the task separated by run, polynomials to remove low-frequency drift, and the regressors estimated by GLMdenoise. The output of this GLM is, for each participant, a set of per-run betas for each condition (LA/LV/NA/NV). To correct for multiple comparisons in the univariate analyses, a surface-based Monte Carlo simulation approach was employed. Specifically, residuals from a GLM based on unsmoothed data were utilized to estimate the smoothness of each participant’s data.
Subsequently, the surface-projected time series data for each participant were spatially smoothed to achieve a total smoothness of 3 mm Full Width at Half Maximum (FWHM). These smoothed surface data were then reanalyzed through a second, identical GLM to estimate the per-run responses to auditory and visual conditions. For each hemisphere in fsaverage space, 2500 iterations of simulated noise, smoothed to 3 mm, were conducted to establish the null distribution of cluster sizes. The critical value (cluster area) required to achieve pFWE < 0.05 was selected from these distributions and applied to the results of the GLM derived from the smoothed data. For the multivariate analysis we used the unsmoothed data (detailed below).

Reliability analysis

To assess the stability of voxel-level BOLD responses, we calculated the reliability of voxel-level beta estimates within all 181 subregions under auditory and visual conditions. First, the beta values corresponding to voxels within each subregion from both hemispheres (e.g. the beta values from left A1 + right A1) were extracted. Then for each condition (auditory: Number Auditory [NA] & Letter Auditory [LA], visual: Number Visual [NV] & Letter Visual [LV]), trials (each single stimulus has a value in each run, giving 36 values in total for each condition) were randomly divided into two subsets, and the correlation between these two subsets was examined. This process was repeated 1000 times, and the mean correlation coefficient was obtained as the individual reliability for that condition within each subregion, subsequently averaged across participants. Furthermore, the reliabilities were ranked for each subregion within condition, and the top five subregions with the most reliable voxel-level responses were selected.

Group Univariate Analysis

The single participant betas from the final smoothed model for each modality and condition (Aud/Vis, Letter/Number) were analyzed at the group level using one-sample t-tests.
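The split-half procedure described in the Reliability analysis section above can be sketched as follows. This is one plausible reading, not the authors' code: it assumes that the betas for one condition and subregion form a trials × voxels array, that each random half is averaged across trials, and that the two averaged voxel patterns are then correlated. The function name, the array layout and the synthetic data are all hypothetical.

```python
import numpy as np


def split_half_reliability(betas, n_splits=1000, seed=0):
    """Mean split-half correlation of voxel patterns.

    betas: array of shape (n_trials, n_voxels), e.g. 36 trials per condition.
    Each iteration randomly splits the trials into two halves, averages each
    half over trials, and correlates the two mean patterns across voxels.
    """
    rng = np.random.default_rng(seed)
    n_trials = betas.shape[0]
    half = n_trials // 2
    correlations = np.empty(n_splits)
    for i in range(n_splits):
        order = rng.permutation(n_trials)
        first = betas[order[:half]].mean(axis=0)
        second = betas[order[half:2 * half]].mean(axis=0)
        correlations[i] = np.corrcoef(first, second)[0, 1]
    return float(correlations.mean())


# Synthetic illustration (not real data): a stable voxel pattern plus trial noise
# should be far more reliable than pure noise.
rng = np.random.default_rng(1)
pattern = rng.normal(size=200)                      # hypothetical 200-voxel subregion
stable_betas = pattern + rng.normal(size=(36, 200))
noise_betas = rng.normal(size=(36, 200))
stable_r = split_half_reliability(stable_betas)
noise_r = split_half_reliability(noise_betas)
print(f"stable region: {stable_r:.2f}, noise-only region: {noise_r:.2f}")
```

Under this reading, the per-participant values would then be averaged across participants and ranked across subregions, as the text describes.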
The vertex-level significance threshold was set at p < 0.01, with any cluster area larger than 79 mm² corresponding to a cluster-corrected pFWE < 0.05 threshold as determined by the Monte Carlo simulations detailed above. To examine category-specific regions, we conducted t-tests comparing responses to letters versus numbers within each modality. The activation maps for the two modalities were subsequently overlaid to define the regions activated in both modalities. To precisely define the regions of interest (ROIs), the percentage of overlapping voxels was calculated for the 181 subregions of the right hemisphere, where the overlap was located. For each subregion, the voxels activated for both modalities were extracted, and their count was divided by the total number of voxels in that subregion. The subregions were then ranked, and the top five subregions exhibiting the greatest overlap percentage were selected. To minimize potential bias arising from modality-specific activation patterns within these ROIs, a vertex-based t-test was performed to assess whether any portion of the overlapping regions exhibited significantly stronger activation in one modality relative to the other.

Multivariate analysis

After defining the ROIs that were activated in both modalities from the univariate result, a further multivariate analysis was applied to test whether the auditory and visual co-activated (overlapping) voxels within these ROIs present the same pattern of activity across the different modalities. For each participant and each ROI, the beta values for auditory and visual stimuli were extracted from the co-activated voxels for further analysis. Using leave-one-run-out cross-validation, a model was trained to discriminate between the response patterns associated with auditory and visual conditions using five runs of data, and subsequently tested on the left-out run. The average accuracy across the six folds was taken as the participant’s accuracy for this subregion.
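The leave-one-run-out scheme above can be sketched in a few lines. The text does not name the classifier, so a simple nearest-centroid rule is swapped in here purely for illustration; the array layout (two auditory and two visual beta patterns per run), the function names and the synthetic data are likewise assumptions. The synthetic patterns are built so that both modalities have the same mean amplitude and differ only in spatial layout, which is the situation the multivariate analysis is meant to probe.

```python
import numpy as np


def leave_one_run_out_accuracy(X, y, runs):
    """Train on all runs but one, test on the held-out run, average over folds.

    X: (n_samples, n_voxels) beta patterns; y: modality labels (0 = auditory,
    1 = visual); runs: run index of each sample.
    Classifier: nearest class centroid (Euclidean), an illustrative stand-in.
    """
    fold_accuracies = []
    for test_run in np.unique(runs):
        train = runs != test_run
        test = ~train
        classes = np.unique(y[train])
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        # Distance from every held-out pattern to every class centroid.
        dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        predicted = classes[dists.argmin(axis=1)]
        fold_accuracies.append(np.mean(predicted == y[test]))
    return float(np.mean(fold_accuracies))


def make_synthetic_patterns(n_runs=6, n_voxels=50, noise_sd=1.0, seed=0):
    """Two modalities with identical mean amplitude but different voxel layout."""
    rng = np.random.default_rng(seed)
    aud_pattern = rng.normal(size=n_voxels)
    vis_pattern = rng.permutation(aud_pattern)  # same values, shuffled over voxels
    X, y, runs = [], [], []
    for run in range(n_runs):
        for label, pattern in ((0, aud_pattern), (1, vis_pattern)):
            for _ in range(2):  # e.g. one letter beta and one number beta per run
                X.append(pattern + rng.normal(scale=noise_sd, size=n_voxels))
                y.append(label)
                runs.append(run)
    return np.array(X), np.array(y), np.array(runs)


X, y, runs = make_synthetic_patterns()
accuracy = leave_one_run_out_accuracy(X, y, runs)
amplitude_gap = abs(X[y == 0].mean() - X[y == 1].mean())
print(f"decoding accuracy: {accuracy:.2f}, mean amplitude difference: {amplitude_gap:.3f}")
```

Holding out whole runs, rather than individual samples, keeps training and test data from sharing run-specific noise, which is the usual motivation for this fold structure in fMRI decoding.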
Subsequently, a t-test of the accuracy values for all participants against the two-class chance level of 0.5 was conducted. A significance threshold was established at p < 0.0167 (calculated as 0.05/3 ≈ 0.0167), since we selected 3 ROIs from the right hemisphere. The mean accuracy across participants was regarded as the accuracy for each ROI. To confirm that the MVPA result was not driven by modality-related activation preference, we conducted MVPA for the co-activated voxels, restricting it to the voxels that were indistinguishable (not significantly different) across modalities.

Results

Reliability of BOLD responses across conditions

To assess overall data quality, we first conducted reliability analyses across trials to identify the regions showing the most consistent responses to the auditory and visual letters and numbers. As illustrated in Figure 2a, the temporal regions demonstrate a high split-half correlation of voxel-level beta estimates for the auditory stimuli. The reliabilities of each subregion were ranked, leading to the selection of the top five subregions, which included A5 (Auditory 5 Complex), A4 (Auditory 4 Complex), PBelt (ParaBelt Complex), LBelt (Lateral Belt Complex), and TPOJ1 (the nomenclature of these and the other subregions follows that of the Glasser atlas). For the visual stimuli, Figure 2b shows that the posterior occipitotemporal regions exhibit a significant split-half correlation. The top five subregions identified as having the highest voxel-level reliability in response to visual stimuli are V4t (Area V4t), LO2 (Lateral Occipital 2), PIT (Posterior InferoTemporal complex), PH (Area PH), and FST (Area FST).

GLM results

The GLM results for both auditory and visual modalities (Figure 3) are in alignment with the reliability findings.
Specifically, the temporal region exhibits significant activation in response to auditory stimuli, while the ventral occipitotemporal region demonstrates significant activation in response to visual stimuli, with a statistical threshold set at vertex p < 0.01 and cluster-level family-wise error (pFWE) < 0.05. Notably, the auditory and visual activation maps intersect in the right posterior temporal region and the bilateral V1 periphery region. Here we focus on TPOJ1 as a critical region for audio-visual integration. Based on the percentage of activity overlap, the five subregions with the highest overlap are the right hemisphere TPOJ1, STV (Superior Temporal Visual Area), PSL (PeriSylvian Language Area), STSvp (Area STSv posterior), and STSdp (Area STSd posterior). A vertex-based t-test between auditory and visual responses found that a small portion of the overlapping voxels (9.44%) in the right TPOJ1 was significantly more active during auditory presentation (p < 0.01), while the remaining overlapping voxels showed no significant difference between modalities (Figure 4). The same analysis comparing responses to letters versus numbers did not reveal any regions showing significantly stronger activation for either category (p > 0.01) in either the auditory or visual modality.

Multivariate analysis

Following the identification of overlapping regions across both modalities, we performed a classification analysis to determine whether the auditory and visual co-activated voxels in TPOJ1 (despite a comparable percentage of activity in both modalities), as well as in the STV and PSL, which also exhibited a relatively large portion (above 30%) of similarly co-activated voxels, nonetheless retained modality-specific information. This test is important because overlap in univariate activation does not necessarily imply shared neural representations; regions may respond to both modalities yet encode them in distinguishable activity patterns.
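For reference, the univariate overlap percentage used to rank subregions (co-activated voxels divided by all voxels in the subregion) amounts to a simple masked count. The sketch below assumes one boolean significance map per modality and one integer atlas label per vertex; the function names and the toy arrays are hypothetical and do not reproduce the reported values.

```python
import numpy as np


def overlap_percentage(aud_sig, vis_sig, labels):
    """Percentage of vertices per subregion that are significant in both maps."""
    both = aud_sig & vis_sig
    return {int(r): float(100.0 * both[labels == r].mean()) for r in np.unique(labels)}


def top_regions(percentages, k=5):
    """Subregion labels ranked by overlap percentage, highest first."""
    return sorted(percentages, key=percentages.get, reverse=True)[:k]


# Toy example: ten vertices assigned to three hypothetical subregions.
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3])
aud_sig = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
vis_sig = np.array([1, 1, 0, 0, 1, 1, 0, 0, 1, 1], dtype=bool)

percentages = overlap_percentage(aud_sig, vis_sig, labels)
print(percentages)                  # {1: 50.0, 2: 25.0, 3: 0.0}
print(top_regions(percentages, 2))  # [1, 2]
```

A high percentage on this measure says only that both modalities exceed threshold at the same vertices, which is why a pattern-level test is still needed to ask whether the two modalities are encoded differently there.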
Results showed that classification accuracy of auditory versus visual conditions for the co-activated voxels in TPOJ1 (63.89%) was significantly above 50% (chance level, p < 0.0167, Bonferroni-corrected for the 3 selected overlapping regions in the right hemisphere). Given that some of the co-activated voxels exhibited stronger auditory responses, it was important to determine whether the decoding results reflected more than merely these univariate biases. Therefore, we removed auditory-dominant voxels within TPOJ1 and examined whether classification accuracy persisted. Even after removing these voxels, the classification accuracy remained significantly above chance level (63.69%). Similarly, in the STV (59.59%) and PSL (62.76%) the decoding accuracies were significantly above chance level both before and after removing the auditory-dominant voxels.

Discussion

In this experiment we examined how the brain processes and integrates symbolic representations underlying literacy and numeracy across auditory and visual modalities. Based on previous work we expected that areas at the junction of the visual and auditory streams, the TPOJ, would serve as a key site for this cross-modal integration. In line with our expectations, we replicated overlapping univariate responses to both modalities in the right TPOJ1 and neighboring regions. However, using multivoxel pattern analysis we also found that the patterns of activity in the right TPOJ1 and neighboring regions clearly distinguished between auditory and visual inputs. We conclude that while the overall response may appear shared, the pattern of responses in the TPOJ is not. In our study, passive listening to and viewing of very short stimuli elicited highly reliable auditory and visual cortical activity, indicating stable voxel-level responses across trials.
Modality specific activation of the STC and the ventral occipitotemporal cortex aligns with previous neuroimaging findings on auditory and visual letter [5, 20, 36, 37] and number [16, 38, 39] processing. Results from the univariate analysis revealed that the visual and auditory symbols elicited overlapping activation in the right posterior STC, particularly within the right TPOJ1 which exhibited the highest proportion of co-activated voxels. The overlapping responses point to the TPOJ1 as a key site where the brain transforms separate sensory inputs into the shared codes that support reading and number processing. The t-test results indicated that a small proportion (9.44%) of TPOJ1 co-activated voxels showed significantly greater activation in the auditory condition. This suggests that a subset of the co-activated voxels might be more sensitive to the auditory stimuli. However, this finding should be interpreted with caution, as both the univariate analysis and the t-test were conducted at the group level. Group-level analyses may be influenced by inter-individual variability in brain anatomy; consequently, the anatomical boundaries of TPOJ1 and the co-activated voxels may vary across participants and may not precisely correspond to those defined by the atlas used in this study. Because univariate analysis averages responses across voxels to obtain a mean regional activation level, it cannot determine whether the apparent similarity in TPOJ1 activity to auditory and visual symbols reflects a shared activation pattern or different patterns of relative activity levels across voxels. We addressed this question using multivoxel pattern analysis. In this analysis, the co-activated voxels in TPOJ1 exhibited above-chance accuracy in distinguishing response patterns elicited by auditory versus visual symbols, even when considering only those voxels that were statistically indistinguishable across modalities in the univariate t-test. 
Despite variations in terminology across studies, these findings are consistent with previous evidence of audiovisual integration for symbols within the TPOJ1 region in adults [5, 24, 40] as well as in 6-7 year-old beginning readers [27]. Moreover, previous studies have shown that, compared with typical readers, individuals with dyslexia—who often experience difficulties in linking auditory and visual stimuli—exhibit a weaker activation in the TPOJ1 during audiovisual integration of letters [41], or show a lack of multisensory enhancement for congruent as compared to incongruent letter-speech sound pairs [22, 23]. Other studies have shown that the TPOJ may represent a more general multimodal region [42, 43], associated with various high-level functions, such as language, calculation, visuo-spatial recognition, working memory, and face and object recognition [29, 44-52]. We observed that the overlap of activation across modalities was restricted to the right hemisphere. Others showed that audiovisual integration takes place in both hemispheres [5, 20, 36, 53] or only the left hemisphere [40, 41, 54]. Audiovisual letter processing is typically found to be relatively left lateralized in expert readers [36, 55, 56]. In 8-11 year-old typical readers, the activity of the left STG in response to letters paired with ambiguous speech sounds exhibited a non-linear developmental trajectory across longitudinal sessions [57]. As further discussed below, the right lateralization of the overlapping responses in our study may relate to the specific type of stimuli and paradigm employed. Another unexpected result was the absence of category specific activation for letters and numbers. 
Previous literature suggests that the ventral occipito-temporal cortex is particularly sensitive to strings of letters, especially for meaningful words [12-14], while the intraparietal sulcus (IPS) is a region that is closely associated with calculation or sequential processing of numbers [16, 28-30, 38, 39]. Our results did not reveal selective activation of these regions for letters or numbers in either modality. The right lateralization and the absence of category-specific regions may be attributed to specific requirements of the experimental tasks, which prompt participants to engage specific perceptual and/or cognitive processing strategies. For example, the lateralisation of the language network varies with the language sub-process being engaged [58]. In our paradigm, the stimuli were presented unimodally in separate blocks and participants performed a passive task without behavioral response requirement. As a result, the task likely did not elicit higher-level cognitive or (verbal) working memory processing but instead maintained participants at a primarily perceptual level of processing. By contrast, in previous studies, auditory and visual stimuli were often presented simultaneously [5, 40]. Moreover, participants were typically required to perform various linguistic or perceptual tasks, such as target symbol detection [36, 53], syllable identification [54], or audiovisual recalibration [41]. These tasks may engage participants in phonological identification or semantic processing of the presented letter/speech sound stimuli. Our finding of right-lateralized overlap aligns with evidence for experience-dependent plasticity in the lateralization of brain activity during letter-sound integration. Beck et al. (2023) compared multisensory letter-sound integration in sighted and blind participants. They reported that blind participants exhibited a congruency effect predominantly in the right STC.
This suggests that, in the absence of visual experience, visually impaired participants may rely less on the typical left-hemisphere language network. The absence of category-specific activation may be due to the fact that both letters and numbers were processed primarily at the perceptual level. Thus, under brief presentation and in a non-linguistic and non-calculation task, both numbers and letters may be processed primarily as more general symbolic representations, directly linking sounds and visual forms. Under these conditions, participants may have processed the letters and numbers primarily as abstract symbols rather than as specific concepts directly associated with quantity, phonology or semantics. Conclusion In this study, we presented unimodal letters and numbers and applied both univariate and multivariate analyses to investigate the modality-dependent and cross-modal cortical processing of spoken and written alphanumeric symbols. Our results revealed overlapping right TPOJ1 responses for auditory and visual stimuli, with comparable response amplitudes for both modalities. A classification analysis demonstrated reliably distinguishable patterns of activation for auditory and visual stimuli across these overlapping voxels, even when excluding a subsample of auditory-dominant voxels. The results imply that TPOJ1 may play a pivotal role in mediating visual–auditory integration. Future research could employ diverse task paradigms to further delineate the neural substrates underlying the representation of abstract symbolic concepts for numeracy and literacy. Declarations Acknowledgements We thank Daniel Ansari for his advice during the conceptualization of the experiment and Agustin Lage-Castellanos for his involvement in the initial analysis steps. We thank all the participants for their involvement in this study. 
Author Contribution Zhiwei Chen: Data collection, formal analysis and investigation, original draft preparation, review and editing; Jan W. Kurzawski: Methodology, data collection, formal analysis and investigation, original draft preparation, review and editing, supervision; Logan T. Dowdle: Methodology, data collection, formal analysis and investigation, original draft preparation, review and editing, supervision; Francesco Gentile: Conceptualization, data collection, supervision; Dora Gozukara: Data collection; Milene Bonte: Conceptualization, Methodology, original draft preparation, review and editing, supervision. Funding This work was supported by Maastricht Brain Imaging Center (MBIC) funding, the Netherlands Organization for Scientific Research (NWO, Vidi 452-16-004 and Vici #VI.C.221.025 to MB), and the China Scholarship Council (CSC 202107720051 to ZC). Data availability The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. Competing interests The authors declare no competing interests. References Duncan, G.J., et al., School readiness and later achievement. Developmental psychology, 2007. 43 (6): p. 1428. Cunha, F., et al., Interpreting the evidence on life cycle skill formation. Handbook of the Economics of Education, 2006. 1 : p. 697-812. Entwisle, D., et al., First Grade and Educational Attainment by Age 22: A New Story. American Journal of Sociology, 2005. 110 (5): p. 1458-1502. Liberman, A.M., The relation of speech to reading and writing , in Speech and reading . 2017, Routledge. p. 17-31. van Atteveldt, N., et al., Integration of letters and speech sounds in the human brain. Neuron, 2004. 43 (2): p. 271. Caravolas, M., et al., Common patterns of prediction of literacy development in different alphabetic orthographies. Psychological science, 2012. 23 (6): p. 678-686. 
Blomert, L., The neural signature of orthographic–phonological binding in successful and failing reading development. Neuroimage, 2011. 57 (3): p. 695-703. Sasanguie, D. and B. Reynvoet, Adults' arithmetic builds on fast and automatic processing of arabic digits: Evidence from an audiovisual matching paradigm. PloS one, 2014. 9 (2): p. e87739. Binder, J.R., et al., Human temporal lobe activation by speech and nonspeech sounds. Cerebral cortex, 2000. 10 (5): p. 512-528. Dehaene, S., et al., Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nature Reviews Neuroscience, 2015. 16 (4): p. 234-244. Yi, H.G., M.K. Leonard, and E.F. Chang, The encoding of speech sounds in the superior temporal gyrus. Neuron, 2019. 102 (6): p. 1096-1110. Cohen, L., et al., Language-specific tuning of visual cortex? Functional properties of the Visual Word Form Area. Brain, 2002. 125 (Pt 5): p. 1054-69. Cohen, L., et al., Visual word recognition in the left and right hemispheres: anatomical and functional correlates of peripheral alexias. Cerebral cortex, 2003. 13 (12): p. 1313-1333. Dehaene, S., et al., The visual word form area: a prelexical representation of visual words in the fusiform gyrus. Neuroreport, 2002. 13 (3): p. 321-325. Dehaene, S. and L. Cohen, Number Processing. Mathematical cognition, 1996. 1 (1): p. 83-120. Eger, E., et al., A supramodal number representation in human intraparietal cortex. Neuron, 2003. 37 (4): p. 719-726. Shum, J., et al., A brain area for visual numerals. Journal of Neuroscience, 2013. 33 (16): p. 6709-6715. Yeo, D.J., et al., The “Inferior Temporal Numeral Area” distinguishes numerals from other character categories during passive viewing: A representational similarity analysis. Neuroimage, 2020. 214 : p. 116716. Beauchamp, M.S., et al., Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nature neuroscience, 2004. 7 (11): p. 1190-1192. Beck, J., G. Dzięgiel-Fivet, and K. 
Jednoróg, Similarities and differences in the neural correlates of letter and speech sound integration in blind and sighted readers. NeuroImage, 2023. 278 : p. 120296. Gao, C., et al., Audiovisual integration in the human brain: a coordinate-based meta-analysis. Cerebral Cortex, 2023. 33 (9): p. 5574-5584. Blau, V., et al., Deviant processing of letters and speech sounds as proximate cause of reading failure: a functional magnetic resonance imaging study of dyslexic children. Brain, 2010. 133 (3): p. 868-879. Blau, V., et al., Reduced neural integration of letters and speech sounds links phonological and reading deficits in adult dyslexia. Current biology, 2009. 19 (6): p. 503-508. Raij, T., K. Uutela, and R. Hari, Audiovisual integration of letters in the human brain. Neuron, 2000. 28 (2): p. 617-625. Karipidis, I.I., et al., Developmental trajectories of letter and speech sound integration during reading acquisition. Frontiers in psychology, 2021. 12 : p. 750491. Wang, F., et al., Development of print-speech integration in the brain of beginning readers with varying reading skills. Frontiers in human neuroscience, 2020. 14 : p. 289. Karipidis, I.I., et al., Neural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia. Human Brain Mapping, 2017. 38 (2): p. 1038-1055. Holloway, I.D., et al., Semantic and perceptual processing of number symbols: evidence from a cross-linguistic fMRI adaptation study. Journal of cognitive neuroscience, 2013. 25 (3): p. 388-400. Holloway, I.D., G.R. Price, and D. Ansari, Common and segregated neural pathways for the processing of symbolic and nonsymbolic numerical magnitude: An fMRI study. Neuroimage, 2010. 49 (1): p. 1006-1017. Pinel, P., et al., Modulation of parietal activation by semantic distance in a number comparison task. Neuroimage, 2001. 14 (5): p. 1013-1026. Glasser, M.F., et al., A multi-modal parcellation of human cerebral cortex. Nature, 2016. 536 (7615): p. 171-178. 
Esteban, O., et al., fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature methods, 2019. 16 (1): p. 111-116. Fischl, B., FreeSurfer. Neuroimage, 2012. 62 (2): p. 774-781. Charest, I., N. Kriegeskorte, and K.N. Kay, GLMdenoise improves multivariate pattern analysis of fMRI data. NeuroImage, 2018. 183 : p. 606-616. Kay, K.N., et al., GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Frontiers in neuroscience, 2013. 7 : p. 247. Raij, T., K. Uutela, and R. Hari, Audiovisual integration of letters in the human brain. Neuron, 2000. 28 (2): p. 617-25. Rothlein, D. and B. Rapp, The similarity structure of distributed neural responses reveals the multiple representations of letters. NeuroImage, 2014. 89 : p. 331-44. Dehaene, S., Varieties of numerical abilities. Cognition, 1992. 44 (1-2): p. 1-42. Dehaene, S. and L. Cohen, Towards an anatomical and functional model of number processing. Mathematical cognition, 1995. 1 (1): p. 83-120. Calvert, G.A., R. Campbell, and M.J. Brammer, Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current biology : CB, 2000. 10 (11): p. 649-57. Romanovska, L., R. Janssen, and M. Bonte, Cortical responses to letters and ambiguous speech vary with reading skills in dyslexic and typically reading children. NeuroImage: Clinical, 2021. 30 : p. 102588. Biceroglu, H. and A. Karadag, Neuroanatomical aspects of the temporo-parieto-occipital junction and new surgical strategy to preserve the associated tracts in junctional lesion surgery: fiber separation technique. Turk Neurosurg, 2019. 29 (6): p. 864-874. De Benedictis, A., et al., Anatomo-functional study of the temporo-parieto-occipital region: dissection, tractographic and brain mapping evidence from a neurosurgical perspective. Journal of anatomy, 2014. 225 (2): p. 132-151. Deprez, S., et al., The functional neuroanatomy of multitasking: combining dual tasking with a short term memory task. 
Neuropsychologia, 2013. 51 (11): p. 2251-2260. Duffau, H., M.T. De Schotten, and E. Mandonnet, White matter functional connectivity as an additional landmark for dominant temporal lobectomy. Journal of Neurology, Neurosurgery & Psychiatry, 2008. 79 (5): p. 492-495. Duffau, H., et al., Intra-operative mapping of the subcortical visual pathways using direct electrical stimulations. Acta Neurochirurgica, 2004. 146 (3). Fehr, T., C. Code, and M. Herrmann, Common brain regions underlying different arithmetic operations as revealed by conjunct fMRI–BOLD activation. Brain research, 2007. 1172 : p. 93-102. Rosenberg-Lee, M., et al., Functional dissociations between four basic arithmetic operations in the human posterior parietal cortex: a cytoarchitectonic mapping study. Neuropsychologia, 2011. 49 (9): p. 2592-2608. Sakurai, Y., M. Asami, and T. Mannen, Alexia and agraphia with lesions of the angular and supramarginal gyri: evidence for the disruption of sequential processing. Journal of the neurological sciences, 2010. 288 (1-2): p. 25-33. Tavor, I., et al., Separate parts of occipito-temporal white matter fibers are associated with recognition of faces and places. Neuroimage, 2014. 86 : p. 123-130. Zhen, Z., H. Fang, and J. Liu, The hierarchical brain network for face recognition. PloS one, 2013. 8 (3): p. e59886. Ojemann, G.A., The neurobiology of language and verbal memory: observations from awake neurosurgery. International Journal of Psychophysiology, 2003. 48 (2): p. 141-146. Blau, V., et al., Deviant processing of letters and speech sounds as proximate cause of reading failure: a functional magnetic resonance imaging study of dyslexic children. Brain, 2010. 133 (3): p. 868-879. Sekiyama, K., et al., Auditory-visual speech perception examined by fMRI and PET. Neuroscience Research, 2003. 47 (3): p. 277-287. Xu, W., et al., Rapid changes in brain activity during learning of grapheme-phoneme associations in adults. NeuroImage, 2020. 220 : p. 117058. 
Xu, W., et al., Audiovisual processing of Chinese characters elicits suppression and congruency effects in MEG. Frontiers in Human Neuroscience, 2019. 13 : p. 18. Romanovska, L., R. Janssen, and M. Bonte, Longitudinal changes in cortical responses to letter-speech sound stimuli in 8–11 year-old children. npj Science of Learning, 2022. 7 (1): p. 2. Bonandrini, R., E. Gornetti, and E. Paulesu, A meta-analytical account of the functional lateralization of the reading network. cortex, 2024. 177 : p. 363-384. Status: Under Review Version 1 posted Reviews received at journal 09 Apr, 2026 Reviewers agreed at journal 12 Feb, 2026 Reviewers invited by journal 09 Feb, 2026 Editor assigned by journal 09 Feb, 2026 Editor invited by journal 03 Feb, 2026 Submission checks completed at journal 02 Feb, 2026 First submitted to journal 02 Feb, 2026 
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8699439","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":584273512,"identity":"506ff144-b26d-46b7-9c88-2fb674343b87","order_by":0,"name":"Zhiwei Chen","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABTklEQVRIie3NsUrDQBjA8TsOkuWj80nUvMKVQB0sBt8kR6BdUql0U8GUwLmcuhYq+AqubpZAuwRdCx2qdHVQAhIhiL1UakpUcBPJnyPH3cfvglBZ2V+MYh8hli18v7jCPlGblh2Y+hD/G0KyMWTEWRKWPbJCPvb5RKMLMseKfA5WidkPBH1qI3NLPx0ejI92bNsIu/F+Eu5VqDt8Ttop941unuCLgVjrMVS9lreNiTd0uazwwOg5YUejDbcPjHF/fZAnhHIxAYYcNvZqE08jDgAWBJyQC4gsghShPE80RVJFpo+1jvd2bOfIXYyTIgFFUPYXqJGWCLFcEl0SBEVCKT95lYxWryLPMlpnIy4jHBjQaM6JsAgwyxKrxOw1ZyxJ6yYbRdXYezm0dak/xFDf5pcBmeEk3dw4p+4NKkSLV8t2xQ/Dr7N/LcrKysr+We/A8HQnPTcBkQAAAABJRU5ErkJggg==","orcid":"","institution":"Maastricht University","correspondingAuthor":true,"prefix":"","firstName":"Zhiwei","middleName":"","lastName":"Chen","suffix":""},{"id":584273513,"identity":"2d7a889c-47ff-493b-b372-050ee9a0bbc7","order_by":1,"name":"Jan Kurzawski","email":"","orcid":"","institution":"Maastricht University","correspondingAuthor":false,"prefix":"","firstName":"Jan","middleName":"","lastName":"Kurzawski","suffix":""},{"id":584273514,"identity":"e2ae2f39-89ca-4013-be10-2d61eec95709","order_by":2,"name":"Logan Dowdle","email":"","orcid":"","institution":"Maastricht 
University","correspondingAuthor":false,"prefix":"","firstName":"Logan","middleName":"","lastName":"Dowdle","suffix":""},{"id":584273515,"identity":"626794b8-dbce-4ed3-90c3-f99fb0768262","order_by":3,"name":"Francesco Gentile","email":"","orcid":"","institution":"Maastricht University","correspondingAuthor":false,"prefix":"","firstName":"Francesco","middleName":"","lastName":"Gentile","suffix":""},{"id":584273516,"identity":"7b0e159a-bb82-42ee-b3a1-9bd702d5b200","order_by":4,"name":"Dora Gozukara","email":"","orcid":"","institution":"Radboud University Nijmegen","correspondingAuthor":false,"prefix":"","firstName":"Dora","middleName":"","lastName":"Gozukara","suffix":""},{"id":584273517,"identity":"7f8bcf1c-f4e5-45f8-b8ee-e509603981f3","order_by":5,"name":"Milene Bonte","email":"","orcid":"","institution":"Maastricht University","correspondingAuthor":false,"prefix":"","firstName":"Milene","middleName":"","lastName":"Bonte","suffix":""}],"badges":[],"createdAt":"2026-01-26 10:54:43","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8699439/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8699439/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":101792692,"identity":"9548eff0-deb8-46ba-8294-71a6a3ad2aaf","added_by":"auto","created_at":"2026-02-03 16:14:28","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":100458,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe fMRI paradigm.\u003c/strong\u003e The experiment consisted of 6 runs, each containing 4 blocks. Each block contained 12 stimuli from one of the four conditions (NV/NA/LV/LA). Each stimulus was presented for 400ms. There was an 8.6 - 12.2 seconds jittering between each two stimuli. The fixation was presented between two stimuli in the visual conditions and was constantly presented in the auditory condition. 
TA = 1300ms, TR = 1800ms, TE = 30ms, silent-gap = 500ms.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8699439/v1/fd324cb7734a57acd0bfce24.png"},{"id":101792695,"identity":"9395a5f7-9ebe-4fbe-a4cf-49ad4ba8e19f","added_by":"auto","created_at":"2026-02-03 16:14:28","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":307468,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSplit half reliability of cortical responses to the auditory and visual stimuli \u003c/strong\u003ea) Auditory regions including A5, A4, PBelt and others, along with the multimodal TPOJ1 area exhibit a high split-half reliability (based on Pearson correlation coefficient) of voxel-level responses to auditory letters and numbers. The top five auditory ROIs are plotted in the center at the bottom, showing the mean and standard error of the reliability values across participants (n=21). b) Visual regions including V4, LO2, PIT, PH and FST and others, exhibit a high split-half reliability (correlation) of voxel-level responses to visual letters and numbers. 
The top five auditory ROIs are plotted at the centre bottom, showing the mean and standard error of reliability values across participants (n=21).\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8699439/v1/accfaa9c7928eb8ec7383208.png"},{"id":101792696,"identity":"819999e7-5f95-4283-b408-36a081534047","added_by":"auto","created_at":"2026-02-03 16:14:28","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":350701,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSpatial overlap between regions activated by the auditory and visual letters and numbers.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCortical activation elicited by auditory and visual stimuli (vertex p \u0026lt; 0.01, cluster pFWE\u0026lt;0.05). Auditory (pink) and visual (blue) activity overlaps mainly in the right hemisphere, producing the purple regions. The percentage of overlap in the right hemisphere was determined by calculating the ratio of the number of activated voxels in a given condition to the total number of voxels within that ROI. The top five ROIs are shown in the bar plot (inset, top left).\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-8699439/v1/a1d1cebdeedd485e9607928d.png"},{"id":101881446,"identity":"de14b4eb-722c-463c-b29b-60ee546eb59e","added_by":"auto","created_at":"2026-02-04 15:12:10","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":280784,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSpatial overlap GLM maps and the t-test map around the TPOJ1 region. \u003c/strong\u003eThe spatial overlap between the auditory and visual responses and relative differences between them (p \u0026lt; 0.01). 
A small portion of the overlapping auditory and visual voxels in the TPOJ1 showed stronger (orange) activity to the auditory stimuli (p \u0026lt; 0.01) and the remaining overlapping voxels presented no significant difference between modalities.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-8699439/v1/42d04a4b6e30edb105058097.png"},{"id":101943272,"identity":"5e6c6651-17b4-4baa-a2a3-a04077eae8e6","added_by":"auto","created_at":"2026-02-05 09:41:29","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1782445,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8699439/v1/bc09f3b8-ce52-4dae-ad0a-605f2946eb6d.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Cross-modal processing of auditory and visual symbol representations in the temporo-parietal cortex","fulltext":[{"header":"Introduction","content":"\u003cp\u003eNumeracy and literacy are fundamental skills that individuals typically acquire at an early age [1] and have a profound impact on academic achievement [2, 3]. The identification of single letters and numbers forms a foundational step in numeracy and literacy [4, 5] and in both cases involves setting up associations between visual symbols and their corresponding spoken language representations.\u003c/p\u003e\n\u003cp\u003eIn most alphabetic languages, letters correspond to phonemes in their spoken form, although with cross-linguistic variation in the consistency of the correspondences [6]. The ability to form efficient associations between letters and speech sounds is crucial to the development of fluent reading skills [6, 7]. Likewise, the capacity for automatic association of auditory and visual representations of numbers plays an essential role in both everyday functioning and mathematical competence [8]. 
At the cortical level, auditory speech sound representations are processed in the superior temporal cortex （STC）[9-11]，while the visual recognition of letters engages the left ventral occipito-temporal cortex [12-14]. Although less studied, auditory number words activate regions such as the superior and middle temporal gyri [15, 16], while visual recognition of Arabic numerals primarily involves the ventral occipito-temporal cortex including the posterior inferior temporal gyrus (pITG) [15, 17, 18].\u003c/p\u003e\n\u003cp\u003eThe neural mechanisms underlying the cross-modal integration of auditory and visual letter and number representations remain only partially understood. Previous studies investigating audiovisual (AV) integration have generally presented auditory and visual stimuli concurrently, comparing bimodal responses to unimodal ones (e.g., (AV \u0026gt; V) \u0026cap; (AV \u0026gt; A)) or contrasting congruent versus incongruent stimuli. Across these studies, the superior temporal cortex/gyrus (STC/STG) is consistently observed to be involved in the audiovisual integration process [19-21]. In line with this, a substantial literature has established the STC as critical for integrating letter symbols and speech sounds [5, 22-24]. Accordingly, the STC seems to contribute to the neural representation of letters not only in spoken but also in written form. Consistent with the proposed role of the STC in integrating letter and speech sound information, developmental work demonstrates that early readers show increased STC activation during the processing of congruent as compared to incongruent or unlearned audiovisual letter\u0026ndash;speech sound pairs, and that the magnitude of this enhancement predicts literacy skills [25, 26]. 
Children with reading problems due to developmental dyslexia instead exhibit a reduced modulatory effect of letter and sound congruency on cortical responses in the planum temporale, Heschl\u0026rsquo;s sulcus and superior temporal sulcus (STS) [22]. Prereaders at-risk for dyslexia exhibited audiovisual integration effects following a brief artificial letter\u0026ndash;speech sound training, with outcomes modulated by individual learning rate [27]. Participants classified as fast learners demonstrated stronger congruency effects for trained artificial letter\u0026ndash;speech sound pairs, particularly in the right superior temporal gyrus (STG) and the left inferior temporal cortex\u0026nbsp;[27].\u003c/p\u003e\n\u003cp\u003eThese findings highlight the STC as a critical hub for multisensory integration of symbolic information. However, it remains unclear whether the STC also plays a comparable role in integrating auditory and visual number representations. In numerical cognition, a large amount of research has found that the intraparietal sulcus (IPS) plays an important role in number quantity processing [16, 28-30]. Some evidence does suggest involvement of the left posterior STG in symbolic number processing [29]. Holloway and colleagues (2010) asked participants to compare the magnitude of both visual symbolic and nonsymbolic numerical stimuli in an fMRI study, and found that symbolic Arabic numbers elicited significantly greater activation in the left posterior STG compared to nonsymbolic stimuli.\u003c/p\u003e\n\u003cp\u003eAlthough previous studies have consistently implicated the posterior STG as a critical region in visual\u0026ndash;auditory integration, the reported anatomical labels and their stereotactic coordinates vary across studies. Based on their reported coordinates, most of these regions align with, or are located near, TPOJ1 (Temporo-Parieto-Occipital Junction 1) as defined in the Glasser atlas [31]. 
Table 1 lists the anatomical labels and Talairach coordinates of the superior temporal and temporo-parietal regions reported across studies (MNI coordinates were converted to Talairach space). It is apparent that the TPOJ lies at the intersection of auditory and visual processing, however it is likely that this area exhibits distinct modality-specific activation profiles in addition to activity modulations due to audio-visual (in)congruency.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1.\u0026nbsp;\u003c/strong\u003eTalairach Coordinates of Superior Temporal Integration Sites* Reported from Previous Studies\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"608\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eStudy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eStimuli\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u003cstrong\u003eRegion\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"3\"\u003e\n \u003cp\u003e\u003cstrong\u003eTalairach coordinates\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ex\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ey\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ez\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eGlasser atlas\u0026nbsp;\u003c/p\u003e\n \u003cp\u003e(Glasser et al., 2016)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft TPOJ1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[-46, 
-60]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[-35,-59]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[1,\u0026nbsp;11]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRight TPOJ1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[43,\u0026nbsp;58]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[-35,\u0026nbsp;-54]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[3,\u0026nbsp;18]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eCalvert et al., 2000\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eAudiovisual speech\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-53\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eRaij et al., 2000\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eLetters and speech sounds\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-53\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-31\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRight STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-31\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eSekiyama et al., 2003\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eAudiovisual speech\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-56\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-49\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003evan Atteveldt et al., 2004\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eLetters and speech sounds\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRight STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e52\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-33\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e18\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eBlau et al., 2010\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eLetters and speech sounds\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-56\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-33\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRight STS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e58\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-33\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eKaripidis et al., 2017\u003c/p\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eArtificial Letter and speech sounds\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRight STG/MTG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e62\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-47\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e17\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\"\u003e\n \u003cp\u003eRomanovska et al., 2021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"3\"\u003e\n \u003cp\u003eLetters and ambiguous speech sounds\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-58\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-31\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-53\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-29\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e14\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-30\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eBeck et al., 2023\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\"\u003e\n \u003cp\u003eLetters and speech sounds\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLeft STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-37\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eRight STG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e45\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e-37\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u0026nbsp;* The definition of integration site varied across studies, e.g., peak, centre of mass\u003c/p\u003e\n\u003cp\u003eThe current study aims to explore how the brain processes cross-modal symbols, minimising potential task-related or attentional biases. We presented single letters and numbers in visual or auditory modalities and asked participants to view/listen to these stimuli without further task requirements. Using a univariate GLM, we observed consistent and overlapping responses to auditory letters and numbers in STC regions, and visual letters and numbers in the ventral visual cortex. These analyses further showed overlapping responses with similar amplitudes for both auditory and visual symbols within TPOJ1. The majority of voxels in TPOJ1 showed no significant difference in activity between the two modalities. 
To explore whether the voxels activated across modalities within the TPOJ1 region differentiated between modalities on a more fine-grained level, we employed a multivariate approach and found high decoding accuracy for distinguishing visual and auditory symbols. This decoding accuracy remained high even after removing voxels that showed stronger responses to one modality than the other. Our results indicate that the TPOJ1 region integrates both auditory and visual symbols for letters and numbers. Understanding this integration process is crucial, particularly for conditions such as dyslexia, where the formation of cross-modal letter-speech sound representations may be impaired.\u0026nbsp;\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003e\u003cstrong\u003eParticipants\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe fMRI study included 25 participants recruited from Maastricht University, the Netherlands. Data from 4 participants were excluded due to a dyslexia diagnosis (n=1), not being a native Dutch speaker (n=1), or not completing the entire scanning session (n=2). Data analysis was conducted on 21 participants (aged 19-52 years, mean = 25.52 \u0026plusmn; 7.04 years, 14 female). All participants were right-handed native Dutch speakers, reported no neurological disorders, and had normal or corrected-to-normal vision and normal hearing.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThis study was approved by the Ethical Committee of the Faculty of Psychology and Neuroscience at Maastricht University (approval code: ERCPN-OZL205 1703 2019) and performed in accordance with relevant guidelines and regulations. 
Participants provided written informed consent before participation and received course credits or vouchers as compensation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStimuli\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn this experiment, we presented three monosyllabic visual letters and their corresponding auditory letter names (\u0026lsquo;d\u0026rsquo; - /dee/, \u0026lsquo;v\u0026rsquo; - /vee/, \u0026lsquo;z\u0026rsquo; - /zet/) as well as three monosyllabic visual and auditory numbers (\u0026lsquo;3\u0026rsquo; - /drie/, \u0026lsquo;5\u0026rsquo; - /vijf/, \u0026lsquo;6\u0026rsquo; - /zes/). The auditory stimuli were recorded in a soundproof room by a native Dutch female speaker, with a sampling rate of 44100Hz. To introduce some acoustic variability in the auditory stimuli, different recordings of each number or letter from the same speaker were used. The recordings were processed with PRAAT (6.0.36) software [Boersma, 2001]. They were bandpass filtered (80-10500Hz), smoothed (30Hz) and resampled to 22050Hz, and their duration was equalised to 400 milliseconds. The visual stimuli, both lowercase letters and Arabic numbers, were presented in Arial font, as white text on a black background, with a size of 0.9\u0026deg; \u0026times; 0.9\u0026deg;.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExperimental procedure\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo avoid engaging the higher-order cognitive processes often triggered by arithmetic or letter tasks, participants were instructed to focus on the stimuli without providing explicit responses. The fMRI experiment was divided into six runs using a slow event-related design (Fig. 1). Each run consisted of four blocks, each corresponding to one of the four conditions: visual-number (NV), auditory-number (NA), visual-letter (LV), and auditory-letter (LA). Prior to the start of each block, the text on the screen changed to indicate the next condition (e.g., Auditory Number / Visual Number). 
The sequence of these four conditions was balanced using a Latin square design. Within each block, participants were presented with 12 stimuli, including four repetitions of each of the three letters or numbers. Overall, each participant was presented with 288 stimuli throughout the whole experiment. All stimuli were presented in a pseudo-randomized order to ensure that the same letter or number was not presented consecutively. Each trial lasted 400ms. In the visual conditions, a white fixation cross on a black background was shown between two stimuli, with a jittered interval of 8.6, 10.4, or 12.2 seconds. The stimuli for both modalities were presented during a 500ms silent gap. In contrast, the fixation cross was presented continuously during the auditory conditions, and the sounds were presented during the silent gap between two consecutive scans.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003efMRI Data Acquisition\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFunctional and anatomical image acquisition was performed on a whole-body 3T Magnetom Prisma scanner (Siemens Medical System) using a 64-channel head coil at the Maastricht Brain Imaging Center. Six functional runs were collected with 2 \u0026times; 2 \u0026times; 2mm\u003csup\u003e3\u003c/sup\u003e resolution (TA = 1300ms, TR = 1800ms, TE= 30ms, silent-gap = 500ms, FOV = 224 \u0026times; 224mm). Each run comprised 373 volumes, with each volume consisting of 60 slices. 
High-resolution anatomical images (voxel size of 1\u0026times;1\u0026times;1mm\u003csup\u003e3\u003c/sup\u003e) were obtained using a T1-weighted three-dimensional MPRAGE sequence (TR = 1800ms, TE = 30ms, 256 sagittal slices) that was acquired after the third run.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData analysis\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe data analysis was conducted using fMRIPrep 23.2.1 [32], Matlab R2020b, Freesurfer 7.4.1 [33], AFNI and FSL.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(f)MRI preprocessing\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe functional and anatomical data were preprocessed with the fMRIPrep pipeline. fMRIPrep is a robust and efficient tool that integrates several essential toolboxes for preprocessing [32].\u003c/p\u003e\n\u003cp\u003eThe preprocessing of anatomical MRI data involved skull-stripping and tissue segmentation, followed by spatial normalisation of T1-weighted images onto the MNI152NLin2009cAsym template. Subsequently, Freesurfer 7.4.1 [33] was employed to reconstruct surfaces from the T1-weighted structural images, aligning the individual surfaces with the standard fsaverage surface. The preprocessing of functional MRI data involved head-motion correction by mcflirt in FSL (a linear, rigid-body registration approach), and slice timing correction by 3dTshift in AFNI (referenced to the first slice). Following these procedures, functional data were aligned with the T1-weighted anatomical images and projected onto the aligned individual fsaverage surface maps. Glasser\u0026rsquo;s HCP-MMP1.0 atlas [31] was then used to divide the whole brain into 181 subregions for each hemisphere.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEstimation of noise regressors\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe used GLMdenoise, which is a method for reducing noise in task-based fMRI analyses [34, 35]. 
It involves identifying spatially correlated noise that may come from physiological, instrumental, motion-related, or neural sources directly from the data. Here, we used GLMdenoise on the original, unsmoothed surface data in order to determine the noise regressors to be used in the subsequent surface GLM (detailed below). These participant-specific regressors were then used as nuisance regressors in a general linear model (GLM).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eUnivariate analyses\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe first conducted a GLM for each participant using the surface data in AFNI, which included the regressors for the task separated by run, polynomials to remove low frequency drift, and the regressors estimated by GLMdenoise. The output of this GLM is, for each participant, a set of per-run betas for each condition (LA/LV/NA/NV). To correct for multiple comparisons in the univariate analyses, a surface-based Monte Carlo simulation approach was employed. Specifically, residuals from a GLM based on unsmoothed data were utilized to estimate the smoothness of each participant\u0026rsquo;s data. Subsequently, the surface-projected time series data for each participant were spatially smoothed to achieve a total smoothness of 3mm Full Width at Half Maximum (FWHM). This smoothed surface data was then reanalyzed through a second, identical GLM to estimate the per-run responses to auditory and visual conditions. For each hemisphere in fsaverage space, 2500 iterations of simulated noise, smoothed to 3mm, were conducted to establish the null distribution of cluster sizes. The critical value (cluster area) required to achieve pFWE\u0026lt;0.05 was selected from these distributions and applied to the results of the GLM derived from the smoothed data. 
For the multivariate analysis we used the unsmoothed data (detailed below).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReliability analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo assess the stability of voxel-level BOLD responses, we calculated the reliability of voxel-level beta estimates within all 181 subregions under auditory and visual conditions. First, the beta values corresponding to voxels within each subregion from both hemispheres (e.g., the beta values from left A1 + right A1) were extracted. Then for each condition (auditory: Number Auditory [NA] \u0026amp; Letter Auditory [LA], visual: Number Visual [NV] \u0026amp; Letter Visual [LV]), trials (each single stimulus has a value in each run, in total 36 values for each condition) were randomly divided into two subsets, and the correlation between these two subsets was examined. This process was repeated 1000 times, and the mean correlation coefficient was taken as the individual reliability for that condition within each subregion, and subsequently averaged across participants. Furthermore, the subregions were ranked by reliability within each condition, and the top five subregions with the most reliable voxel-level responses were selected.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGroup Univariate Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe single-participant betas from the final smoothed model for each modality and condition (Aud/Vis, Letter/Number) were analyzed at the group level using one-sample t-tests. The vertex-level significance threshold was set at p \u0026lt; 0.01, with any cluster area larger than 79mm\u003csup\u003e2\u003c/sup\u003e corresponding to a cluster-corrected pFWE \u0026lt; 0.05 threshold as determined by the Monte Carlo simulations detailed above. To examine category-specific regions, we conducted t-tests comparing responses to letters versus numbers within each modality. 
The activation maps for the two modalities were subsequently spatially overlaid to define the regions activated in both modalities. To precisely define the regions of interest (ROIs), the percentage of overlapping voxels was calculated for the 181 subregions of the right hemisphere, where the overlap was located. For each subregion, the voxels activated for both modalities were extracted, and their count was divided by the total number of voxels in that subregion. The subregions were then ranked, and the top five subregions exhibiting the greatest overlap percentage were selected. To minimize potential bias arising from modality-specific activation patterns within these ROIs, a vertex-based t-test was performed to assess whether any portion of the overlapping regions exhibited significantly stronger activation in one modality relative to the other.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMultivariate analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAfter defining the ROIs that were activated in both modalities from the univariate result, a further multivariate analysis was applied to test whether the auditory and visual co-activated (overlapping) voxels within these ROIs present the same pattern of activity across the different modalities. For each participant and each ROI, the beta values for auditory and visual stimuli were extracted from the co-activated voxels for further analysis. Using leave-one-run-out cross-validation, a model was trained to discriminate between the response patterns associated with auditory and visual conditions using five runs of data, and subsequently tested on the left-out run. The average accuracy across the six cross-validation folds was taken as the participant\u0026rsquo;s accuracy for this subregion. Subsequently, a t-test with accuracy values for all participants against the two-class chance level of 0.5 was conducted. 
A significance threshold was established at p \u0026lt; 0.0167 (calculated as 0.05/3 \u0026asymp; 0.0167), since we selected 3 ROIs from the right hemisphere. The mean accuracy across participants was regarded as the accuracy for each ROI. To confirm that the MVPA result was not driven by modality-related activation preference, we repeated the MVPA on the co-activated voxels, restricting it to the voxels that were indistinguishable (not significantly different) across modalities.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eReliability of BOLD responses across conditions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo assess overall data quality, we first conducted reliability analyses across trials to identify the regions showing the most consistent responses to the auditory and visual letters and numbers. As illustrated in \u003cstrong\u003eFigure 2a\u003c/strong\u003e, the temporal regions demonstrate a high split-half correlation of voxel-level beta estimates for the auditory stimuli. The reliabilities of each subregion were ranked, leading to the selection of the top five subregions, which were A5 (Auditory 5 Complex), A4 (Auditory 4 Complex), PBelt (ParaBelt Complex), LBelt (Lateral Belt Complex), and TPOJ1 (the nomenclature of these and the other subregions follows that of the Glasser atlas). For the visual stimuli, \u003cstrong\u003eFigure 2b\u003c/strong\u003e shows that the posterior occipitotemporal regions exhibit a significant split-half correlation. 
The top five subregions identified as having the highest voxel-level reliability in response to visual stimuli are V4t (Area V4t), LO2 (Lateral Occipital 2), PIT (Posterior InferoTemporal complex), PH (Area PH), and FST (Area FST).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eGLM results\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe GLM results for both auditory and visual modalities (\u003cstrong\u003eFigure 3\u003c/strong\u003e) are in alignment with the reliability findings. Specifically, the temporal region exhibits significant activation in response to auditory stimuli, while the ventral occipitotemporal region demonstrates significant activation in response to visual stimuli, with a statistical threshold set at vertex p \u0026lt; 0.01 and cluster-level family-wise error (pFWE) \u0026lt; 0.05. Notably, the auditory and visual activation maps intersect in the right posterior temporal region and the bilateral V1 periphery region. Here we focus on TPOJ1 as a critical region for audio-visual integration. Based on the percentage of activity overlap, the five subregions with the highest overlap identified are the right hemisphere TPOJ1, STV (Superior Temporal Visual Area), PSL (PeriSylvian Language Area), STSvp (Area STSv posterior), and STSdp (Area STSd posterior).\u003c/p\u003e\n\u003cp\u003eA vertex-based t-test between auditory versus visual responses found that a small portion of the overlapping voxels (9.44%) in the right TPOJ1 was significantly more active during auditory presentation (p \u0026lt; 0.01) and the remaining overlapping voxels showed no significant difference between modalities (\u003cstrong\u003eFigure 4\u003c/strong\u003e). 
The same analysis comparing responses to letters versus numbers did not reveal any regions showing significantly stronger activation for either category (p \u0026gt; 0.01) in either the auditory or visual modality.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMultivariate analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFollowing the identification of overlapping regions across both modalities, we performed a classification analysis to determine whether the auditory and visual co-activated voxels in TPOJ1, despite comparable activity in both modalities, nonetheless retained modality-specific information. The same analysis was applied to the STV and PSL, which also exhibited a relatively large portion (above 30%) of similarly co-activated voxels. This test is important because overlap in univariate activation does not necessarily imply shared neural representations; regions may respond to both modalities yet encode them in distinguishable activity patterns. Results showed that the classification accuracy of auditory versus visual conditions for the co-activated voxels in TPOJ1 (63.89%) was significantly above 50% (chance level; p \u0026lt; 0.0167, Bonferroni-corrected for the 3 selected overlapping regions in the right hemisphere). Given that some of the co-activated voxels exhibited stronger auditory responses, it was important to determine whether the decoding results reflected more than merely these univariate biases. Therefore, we removed auditory-dominant voxels within TPOJ1 and examined whether classification accuracy persisted. Even after removing these voxels, the classification accuracy (63.69%) remained significantly above chance level. 
Similarly, also in the STV (59.59%) and PSL (62.76%) the decoding accuracies were significantly above chance level both before and after removing the auditory-dominant voxels.\u003c/p\u003e"},{"header":" Discussion","content":"\u003cp\u003eIn this experiment we examined how the brain processes and integrates symbolic representations underlying literacy and numeracy across auditory and visual modalities. Based on previous work we expected that areas at the junction of the visual and auditory streams, the TPOJ, would serve as a key site for this cross-modal integration. In line with our expectations, we replicated overlapping univariate responses to both modalities in the right TPOJ1 and neighboring regions. However, using multivoxel pattern analysis we also found that the patterns of activity in the right TPOJ1 and neighboring regions clearly distinguished between auditory and visual inputs. We conclude that while the overall response may appear shared, the pattern of responses in the TPOJ is not.\u003c/p\u003e\n\u003cp\u003eIn our study, passive listening to and viewing of very short stimuli elicited highly reliable auditory and visual cortical activity, indicating stable voxel-level responses across trials. Modality specific activation of the STC and the ventral occipitotemporal cortex aligns with previous neuroimaging findings on auditory and visual letter [5, 20, 36, 37] and number [16, 38, 39] processing.\u003c/p\u003e\n\u003cp\u003eResults from the univariate analysis revealed that the visual and auditory symbols elicited overlapping activation in the right posterior STC, particularly within the right TPOJ1 which exhibited the highest proportion of co-activated voxels. The overlapping responses point to the TPOJ1 as a key site where the brain transforms separate sensory inputs into the shared codes that support reading and number processing. 
The t-test results indicated that a small proportion (9.44%) of TPOJ1 co-activated voxels showed significantly greater activation in the auditory condition. This suggests that a subset of the co-activated voxels might be more sensitive to the auditory stimuli. However, this finding should be interpreted with caution, as both the univariate analysis and the t-test were conducted at the group level. Group-level analyses may be influenced by inter-individual variability in brain anatomy; consequently, the anatomical boundaries of TPOJ1 and the co-activated voxels may vary across participants and may not precisely correspond to those defined by the atlas used in this study.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBecause univariate analysis averages responses across voxels to obtain a mean regional activation level, it cannot determine whether the apparent similarity in TPOJ1 activity to auditory and visual symbols reflects a shared activation pattern or different patterns of relative activity levels across voxels. We addressed this question using multivoxel pattern analysis. In this analysis, the co-activated voxels in TPOJ1 exhibited above-chance accuracy in distinguishing response patterns elicited by auditory versus visual symbols, even when considering only those voxels that were statistically indistinguishable across modalities in the univariate t-test. Despite variations in terminology across studies, these findings are consistent with previous evidence of audiovisual integration for symbols within the TPOJ1 region in adults [5, 24, 40] as well as in 6-7 year-old beginning readers [27]. 
Moreover, previous studies have shown that, compared with typical readers, individuals with dyslexia\u0026mdash;who often experience difficulties in linking auditory and visual stimuli\u0026mdash;exhibit a weaker activation in the TPOJ1 during audiovisual integration of letters [41], or show a lack of multisensory enhancement for congruent as compared to incongruent letter-speech sound pairs [22, 23]. Other studies have shown that the TPOJ may represent a more general multimodal region [42, 43], associated with various high-level functions, such as language, calculation, visuo-spatial recognition, working memory, and face and object recognition [29, 44-52].\u003c/p\u003e\n\u003cp\u003eWe observed that the overlap of activation across modalities was restricted to the right hemisphere. Others showed that audiovisual integration takes place in both hemispheres [5, 20, 36, 53] or only the left hemisphere [40, 41, 54]. Audiovisual letter processing is typically found to be relatively left lateralized in expert readers [36, 55, 56]. In 8-11 year-old typical readers, the activity of the left STG in response to letters paired with ambiguous speech sounds exhibited a non-linear developmental trajectory across longitudinal sessions [57]. As further discussed below, the right lateralization of the overlapping responses in our study may relate to the specific type of stimuli and paradigm employed. Another unexpected result was the absence of category specific activation for letters and numbers. Previous literature suggests that the ventral occipito-temporal cortex is particularly sensitive to strings of letters, especially for meaningful words [12-14], while the intraparietal sulcus (IPS) is a region that is closely associated with calculation or sequential processing of numbers [16, 28-30, 38, 39]. Our results did not reveal selective activation of these regions for letters or numbers in either modality. 
The right lateralization and the absence of category-specific regions may be attributed to the specific requirements of experimental tasks, which prompt participants to engage specific perceptual and/or cognitive processing strategies. For example, the lateralisation of the language network varies with the language sub-process being engaged [58]. In our paradigm, the stimuli were presented unimodally in separate blocks and participants performed a passive task without behavioral response requirement. As a result, the task likely did not elicit higher-level cognitive or (verbal) working memory processing but instead maintained participants at a primarily perceptual level of processing. In contrast, in previous studies, auditory and visual stimuli were often presented simultaneously [5, 40]. Moreover, participants were typically required to perform various linguistic or perceptual tasks, such as target symbol detection [36, 53], syllable identification [54], or audiovisual recalibration [41]. These tasks may engage participants in phonological identification or semantic processing of the presented letter/speech sound stimuli. The right lateralization we observed aligns with evidence for experience-dependent plasticity in the lateralization of brain activity during letter-sound integration. Beck et al. (2023) compared multisensory letter-sound integration in sighted and blind participants. They reported that blind participants exhibited a congruency effect predominantly in the right STC. This suggests that, in the absence of visual experience, blind participants may rely less on the typical left-hemisphere language network. The absence of category-specific activation may be due to the fact that both letters and numbers were processed primarily at the perceptual level. 
Thus, under brief presentation and in a non-linguistic and non-calculation task, both numbers and letters may be processed primarily as more general-level symbolic representations, directly linking sounds and visual forms. Under these conditions, participants may have processed the letters and numbers primarily as abstract symbols rather than as specific concepts directly associated with quantity, phonology or semantics.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we presented unimodal letters and numbers and applied both univariate and multivariate analyses to investigate the modality-dependent and cross-modal cortical processing of spoken and written alphanumeric symbols. Our results revealed overlapping right TPOJ1 responses for auditory and visual stimuli, with comparable response amplitudes for both modalities. A classification analysis demonstrated reliably distinguishable patterns of activation for auditory and visual stimuli across these overlapping voxels, also when excluding a subsample of auditory-dominant voxels. The results imply that the TPOJ1 may play a pivotal role in mediating visual\u0026ndash;auditory integration. Future research could employ diverse task paradigms to further delineate the neural substrates underlying the representation of abstract symbolic concepts for numeracy and literacy.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank Daniel Ansari for his advice during the conceptualization of the experiment and Agustin Lage-Castellanos for his involvement in the initial analysis steps. We thank all the participants for their involvement in this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contribution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eZhiwei Chen: Data collection, formal analysis and investigation, original draft preparation, review and editing; Jan W. 
Kurzawski: Methodology, data collection, formal analysis and investigation, original draft preparation, review and editing, supervision; Logan T. Dowdle: Methodology, data collection, formal analysis and investigation, original draft preparation, review and editing, supervision; Francesco Gentile: Conceptualization, data collection, supervision; Dora Gozukara: Data collection; Milene Bonte: Conceptualization, Methodology, original draft preparation, review and editing, supervision.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by Maastricht Brain Imaging Center (MBIC) funding, the Netherlands Organization for Scientific Research (NWO, Vidi 452-16-004 and Vici #VI.C.221.025 to MB), and the China Scholarship Council (CSC 202107720051 to ZC).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eDuncan, G.J., et al., \u003cem\u003eSchool readiness and later achievement.\u003c/em\u003e Developmental psychology, 2007. \u003cstrong\u003e43\u003c/strong\u003e(6): p. 1428.\u003c/li\u003e\n\u003cli\u003eCunha, F., et al., \u003cem\u003eInterpreting the evidence on life cycle skill formation.\u003c/em\u003e Handbook of the Economics of Education, 2006. \u003cstrong\u003e1\u003c/strong\u003e: p. 697-812.\u003c/li\u003e\n\u003cli\u003eEntwisle, D., et al., \u003cem\u003eFirst Grade and Educational Attainment by Age 22: A New Story.\u003c/em\u003e American Journal of Sociology, 2005. \u003cstrong\u003e110\u003c/strong\u003e(5): p. 
1458-1502.\u003c/li\u003e\n\u003cli\u003eLiberman, A.M., \u003cem\u003eThe relation of speech to reading and writing\u003c/em\u003e, in \u003cem\u003eSpeech and reading\u003c/em\u003e. 2017, Routledge. p. 17-31.\u003c/li\u003e\n\u003cli\u003evan Atteveldt, N., et al., \u003cem\u003eIntegration of letters and speech sounds in the human brain.\u003c/em\u003e Neuron, 2004. \u003cstrong\u003e43\u003c/strong\u003e(2): p. 271.\u003c/li\u003e\n\u003cli\u003eCaravolas, M., et al., \u003cem\u003eCommon patterns of prediction of literacy development in different alphabetic orthographies.\u003c/em\u003e Psychological science, 2012. \u003cstrong\u003e23\u003c/strong\u003e(6): p. 678-686.\u003c/li\u003e\n\u003cli\u003eBlomert, L., \u003cem\u003eThe neural signature of orthographic\u0026ndash;phonological binding in successful and failing reading development.\u003c/em\u003e Neuroimage, 2011. \u003cstrong\u003e57\u003c/strong\u003e(3): p. 695-703.\u003c/li\u003e\n\u003cli\u003eSasanguie, D. and B. Reynvoet, \u003cem\u003eAdults\u0026apos; arithmetic builds on fast and automatic processing of arabic digits: Evidence from an audiovisual matching paradigm.\u003c/em\u003e PloS one, 2014. \u003cstrong\u003e9\u003c/strong\u003e(2): p. e87739.\u003c/li\u003e\n\u003cli\u003eBinder, J.R., et al., \u003cem\u003eHuman temporal lobe activation by speech and nonspeech sounds.\u003c/em\u003e Cerebral cortex, 2000. \u003cstrong\u003e10\u003c/strong\u003e(5): p. 512-528.\u003c/li\u003e\n\u003cli\u003eDehaene, S., et al., \u003cem\u003eIlliterate to literate: behavioural and cerebral changes induced by reading acquisition.\u003c/em\u003e Nature Reviews Neuroscience, 2015. \u003cstrong\u003e16\u003c/strong\u003e(4): p. 234-244.\u003c/li\u003e\n\u003cli\u003eYi, H.G., M.K. Leonard, and E.F. Chang, \u003cem\u003eThe encoding of speech sounds in the superior temporal gyrus.\u003c/em\u003e Neuron, 2019. \u003cstrong\u003e102\u003c/strong\u003e(6): p. 
1096-1110.\u003c/li\u003e\n\u003cli\u003eCohen, L., et al., \u003cem\u003eLanguage-specific tuning of visual cortex? Functional properties of the Visual Word Form Area.\u003c/em\u003e Brain, 2002. \u003cstrong\u003e125\u003c/strong\u003e(Pt 5): p. 1054-69.\u003c/li\u003e\n\u003cli\u003eCohen, L., et al., \u003cem\u003eVisual word recognition in the left and right hemispheres: anatomical and functional correlates of peripheral alexias.\u003c/em\u003e Cerebral cortex, 2003. \u003cstrong\u003e13\u003c/strong\u003e(12): p. 1313-1333.\u003c/li\u003e\n\u003cli\u003eDehaene, S., et al., \u003cem\u003eThe visual word form area: a prelexical representation of visual words in the fusiform gyrus.\u003c/em\u003e Neuroreport, 2002. \u003cstrong\u003e13\u003c/strong\u003e(3): p. 321-325.\u003c/li\u003e\n\u003cli\u003eDehaene, S. and L. Cohen, \u003cem\u003eNumber Processing.\u003c/em\u003e Mathematical cognition, 1996. \u003cstrong\u003e1\u003c/strong\u003e(1): p. 83-120.\u003c/li\u003e\n\u003cli\u003eEger, E., et al., \u003cem\u003eA supramodal number representation in human intraparietal cortex.\u003c/em\u003e Neuron, 2003. \u003cstrong\u003e37\u003c/strong\u003e(4): p. 719-726.\u003c/li\u003e\n\u003cli\u003eShum, J., et al., \u003cem\u003eA brain area for visual numerals.\u003c/em\u003e Journal of Neuroscience, 2013. \u003cstrong\u003e33\u003c/strong\u003e(16): p. 6709-6715.\u003c/li\u003e\n\u003cli\u003eYeo, D.J., et al., \u003cem\u003eThe \u0026ldquo;Inferior Temporal Numeral Area\u0026rdquo; distinguishes numerals from other character categories during passive viewing: A representational similarity analysis.\u003c/em\u003e Neuroimage, 2020. \u003cstrong\u003e214\u003c/strong\u003e: p. 116716.\u003c/li\u003e\n\u003cli\u003eBeauchamp, M.S., et al., \u003cem\u003eUnraveling multisensory integration: patchy organization within human STS multisensory cortex.\u003c/em\u003e Nature neuroscience, 2004. \u003cstrong\u003e7\u003c/strong\u003e(11): p. 
1190-1192.\u003c/li\u003e\n\u003cli\u003eBeck, J., G. Dzięgiel-Fivet, and K. Jednor\u0026oacute;g, \u003cem\u003eSimilarities and differences in the neural correlates of letter and speech sound integration in blind and sighted readers.\u003c/em\u003e NeuroImage, 2023. \u003cstrong\u003e278\u003c/strong\u003e: p. 120296.\u003c/li\u003e\n\u003cli\u003eGao, C., et al., \u003cem\u003eAudiovisual integration in the human brain: a coordinate-based meta-analysis.\u003c/em\u003e Cerebral Cortex, 2023. \u003cstrong\u003e33\u003c/strong\u003e(9): p. 5574-5584.\u003c/li\u003e\n\u003cli\u003eBlau, V., et al., \u003cem\u003eDeviant processing of letters and speech sounds as proximate cause of reading failure: a functional magnetic resonance imaging study of dyslexic children.\u003c/em\u003e Brain, 2010. \u003cstrong\u003e133\u003c/strong\u003e(3): p. 868-879.\u003c/li\u003e\n\u003cli\u003eBlau, V., et al., \u003cem\u003eReduced neural integration of letters and speech sounds links phonological and reading deficits in adult dyslexia.\u003c/em\u003e Current biology, 2009. \u003cstrong\u003e19\u003c/strong\u003e(6): p. 503-508.\u003c/li\u003e\n\u003cli\u003eRaij, T., K. Uutela, and R. Hari, \u003cem\u003eAudiovisual integration of letters in the human brain.\u003c/em\u003e Neuron, 2000. \u003cstrong\u003e28\u003c/strong\u003e(2): p. 617-625.\u003c/li\u003e\n\u003cli\u003eKaripidis, I.I., et al., \u003cem\u003eDevelopmental trajectories of letter and speech sound integration during reading acquisition.\u003c/em\u003e Frontiers in psychology, 2021. \u003cstrong\u003e12\u003c/strong\u003e: p. 750491.\u003c/li\u003e\n\u003cli\u003eWang, F., et al., \u003cem\u003eDevelopment of print-speech integration in the brain of beginning readers with varying reading skills.\u003c/em\u003e Frontiers in human neuroscience, 2020. \u003cstrong\u003e14\u003c/strong\u003e: p. 289.\u003c/li\u003e\n\u003cli\u003e
Karipidis, I.I., et al., \u003cem\u003eNeural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia.\u003c/em\u003e Human Brain Mapping, 2017. \u003cstrong\u003e38\u003c/strong\u003e(2): p. 1038-1055.\u003c/li\u003e\n\u003cli\u003eHolloway, I.D., et al., \u003cem\u003eSemantic and perceptual processing of number symbols: evidence from a cross-linguistic fMRI adaptation study.\u003c/em\u003e Journal of cognitive neuroscience, 2013. \u003cstrong\u003e25\u003c/strong\u003e(3): p. 388-400.\u003c/li\u003e\n\u003cli\u003eHolloway, I.D., G.R. Price, and D. Ansari, \u003cem\u003eCommon and segregated neural pathways for the processing of symbolic and nonsymbolic numerical magnitude: An fMRI study.\u003c/em\u003e Neuroimage, 2010. \u003cstrong\u003e49\u003c/strong\u003e(1): p. 1006-1017.\u003c/li\u003e\n\u003cli\u003ePinel, P., et al., \u003cem\u003eModulation of parietal activation by semantic distance in a number comparison task.\u003c/em\u003e Neuroimage, 2001. \u003cstrong\u003e14\u003c/strong\u003e(5): p. 1013-1026.\u003c/li\u003e\n\u003cli\u003eGlasser, M.F., et al., \u003cem\u003eA multi-modal parcellation of human cerebral cortex.\u003c/em\u003e Nature, 2016. \u003cstrong\u003e536\u003c/strong\u003e(7615): p. 171-178.\u003c/li\u003e\n\u003cli\u003eEsteban, O., et al., \u003cem\u003efMRIPrep: a robust preprocessing pipeline for functional MRI.\u003c/em\u003e Nature methods, 2019. \u003cstrong\u003e16\u003c/strong\u003e(1): p. 111-116.\u003c/li\u003e\n\u003cli\u003eFischl, B., \u003cem\u003eFreeSurfer.\u003c/em\u003e Neuroimage, 2012. \u003cstrong\u003e62\u003c/strong\u003e(2): p. 774-781.\u003c/li\u003e\n\u003cli\u003eCharest, I., N. Kriegeskorte, and K.N. Kay, \u003cem\u003eGLMdenoise improves multivariate pattern analysis of fMRI data.\u003c/em\u003e NeuroImage, 2018. \u003cstrong\u003e183\u003c/strong\u003e: p. 
606-616.\u003c/li\u003e\n\u003cli\u003eKay, K.N., et al., \u003cem\u003eGLMdenoise: a fast, automated technique for denoising task-based fMRI data.\u003c/em\u003e Frontiers in neuroscience, 2013. \u003cstrong\u003e7\u003c/strong\u003e: p. 247.\u003c/li\u003e\n\u003cli\u003eRaij, T., K. Uutela, and R. Hari, \u003cem\u003eAudiovisual integration of letters in the human brain.\u003c/em\u003e Neuron, 2000. \u003cstrong\u003e28\u003c/strong\u003e(2): p. 617-25.\u003c/li\u003e\n\u003cli\u003eRothlein, D. and B. Rapp, \u003cem\u003eThe similarity structure of distributed neural responses reveals the multiple representations of letters.\u003c/em\u003e NeuroImage, 2014. \u003cstrong\u003e89\u003c/strong\u003e: p. 331-44.\u003c/li\u003e\n\u003cli\u003eDehaene, S., \u003cem\u003eVarieties of numerical abilities.\u003c/em\u003e Cognition, 1992. \u003cstrong\u003e44\u003c/strong\u003e(1-2): p. 1-42.\u003c/li\u003e\n\u003cli\u003eDehaene, S. and L. Cohen, \u003cem\u003eTowards an anatomical and functional model of number processing.\u003c/em\u003e Mathematical cognition, 1995. \u003cstrong\u003e1\u003c/strong\u003e(1): p. 83-120.\u003c/li\u003e\n\u003cli\u003eCalvert, G.A., R. Campbell, and M.J. Brammer, \u003cem\u003eEvidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex.\u003c/em\u003e Current biology : CB, 2000. \u003cstrong\u003e10\u003c/strong\u003e(11): p. 649-57.\u003c/li\u003e\n\u003cli\u003eRomanovska, L., R. Janssen, and M. Bonte, \u003cem\u003eCortical responses to letters and ambiguous speech vary with reading skills in dyslexic and typically reading children.\u003c/em\u003e NeuroImage: Clinical, 2021. \u003cstrong\u003e30\u003c/strong\u003e: p. 102588.\u003c/li\u003e\n\u003cli\u003eBiceroglu, H. and A. 
Karadag, \u003cem\u003eNeuroanatomical aspects of the temporo-parieto-occipital junction and new surgical strategy to preserve the associated tracts in junctional lesion surgery: fiber separation technique.\u003c/em\u003e Turk Neurosurg, 2019. \u003cstrong\u003e29\u003c/strong\u003e(6): p. 864-874.\u003c/li\u003e\n\u003cli\u003eDe Benedictis, A., et al., \u003cem\u003eAnatomo‐functional study of the temporo‐parieto‐occipital region: dissection, tractographic and brain mapping evidence from a neurosurgical perspective.\u003c/em\u003e Journal of anatomy, 2014. \u003cstrong\u003e225\u003c/strong\u003e(2): p. 132-151.\u003c/li\u003e\n\u003cli\u003eDeprez, S., et al., \u003cem\u003eThe functional neuroanatomy of multitasking: combining dual tasking with a short term memory task.\u003c/em\u003e Neuropsychologia, 2013. \u003cstrong\u003e51\u003c/strong\u003e(11): p. 2251-2260.\u003c/li\u003e\n\u003cli\u003eDuffau, H., M.T. De Schotten, and E. Mandonnet, \u003cem\u003eWhite matter functional connectivity as an additional landmark for dominant temporal lobectomy.\u003c/em\u003e Journal of Neurology, Neurosurgery \u0026amp; Psychiatry, 2008. \u003cstrong\u003e79\u003c/strong\u003e(5): p. 492-495.\u003c/li\u003e\n\u003cli\u003eDuffau, H., et al., \u003cem\u003eIntra-operative mapping of the subcortical visual pathways using direct electrical stimulations.\u003c/em\u003e Acta Neurochirurgica, 2004. \u003cstrong\u003e146\u003c/strong\u003e(3).\u003c/li\u003e\n\u003cli\u003eFehr, T., C. Code, and M. Herrmann, \u003cem\u003eCommon brain regions underlying different arithmetic operations as revealed by conjunct fMRI\u0026ndash;BOLD activation.\u003c/em\u003e Brain research, 2007. \u003cstrong\u003e1172\u003c/strong\u003e: p. 
93-102.\u003c/li\u003e\n\u003cli\u003eRosenberg-Lee, M., et al., \u003cem\u003eFunctional dissociations between four basic arithmetic operations in the human posterior parietal cortex: a cytoarchitectonic mapping study.\u003c/em\u003e Neuropsychologia, 2011. \u003cstrong\u003e49\u003c/strong\u003e(9): p. 2592-2608.\u003c/li\u003e\n\u003cli\u003eSakurai, Y., M. Asami, and T. Mannen, \u003cem\u003eAlexia and agraphia with lesions of the angular and supramarginal gyri: evidence for the disruption of sequential processing.\u003c/em\u003e Journal of the neurological sciences, 2010. \u003cstrong\u003e288\u003c/strong\u003e(1-2): p. 25-33.\u003c/li\u003e\n\u003cli\u003eTavor, I., et al., \u003cem\u003eSeparate parts of occipito-temporal white matter fibers are associated with recognition of faces and places.\u003c/em\u003e Neuroimage, 2014. \u003cstrong\u003e86\u003c/strong\u003e: p. 123-130.\u003c/li\u003e\n\u003cli\u003eZhen, Z., H. Fang, and J. Liu, \u003cem\u003eThe hierarchical brain network for face recognition.\u003c/em\u003e PloS one, 2013. \u003cstrong\u003e8\u003c/strong\u003e(3): p. e59886.\u003c/li\u003e\n\u003cli\u003eOjemann, G.A., \u003cem\u003eThe neurobiology of language and verbal memory: observations from awake neurosurgery.\u003c/em\u003e International Journal of Psychophysiology, 2003. \u003cstrong\u003e48\u003c/strong\u003e(2): p. 141-146.\u003c/li\u003e\n\u003cli\u003eBlau, V., et al., \u003cem\u003eDeviant processing of letters and speech sounds as proximate cause of reading failure: a functional magnetic resonance imaging study of dyslexic children.\u003c/em\u003e Brain, 2010. \u003cstrong\u003e133\u003c/strong\u003e(3): p. 868-879.\u003c/li\u003e\n\u003cli\u003eSekiyama, K., et al., \u003cem\u003eAuditory-visual speech perception examined by fMRI and PET.\u003c/em\u003e Neuroscience Research, 2003. \u003cstrong\u003e47\u003c/strong\u003e(3): p. 
277-287.\u003c/li\u003e\n\u003cli\u003eXu, W., et al., \u003cem\u003eRapid changes in brain activity during learning of grapheme-phoneme associations in adults.\u003c/em\u003e NeuroImage, 2020. \u003cstrong\u003e220\u003c/strong\u003e: p. 117058.\u003c/li\u003e\n\u003cli\u003eXu, W., et al., \u003cem\u003eAudiovisual processing of Chinese characters elicits suppression and congruency effects in MEG.\u003c/em\u003e Frontiers in Human Neuroscience, 2019. \u003cstrong\u003e13\u003c/strong\u003e: p. 18.\u003c/li\u003e\n\u003cli\u003eRomanovska, L., R. Janssen, and M. Bonte, \u003cem\u003eLongitudinal changes in cortical responses to letter-speech sound stimuli in 8\u0026ndash;11 year-old children.\u003c/em\u003e npj Science of Learning, 2022. \u003cstrong\u003e7\u003c/strong\u003e(1): p. 2.\u003c/li\u003e\n\u003cli\u003eBonandrini, R., E. Gornetti, and E. Paulesu, \u003cem\u003eA meta-analytical account of the functional lateralization of the reading network.\u003c/em\u003e Cortex, 2024. \u003cstrong\u003e177\u003c/strong\u003e: p. 
363-384.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"audio-visual integration, MVPA, STG","lastPublishedDoi":"10.21203/rs.3.rs-8699439/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8699439/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eNumeracy and literacy are fundamental cognitive skills that rely on associating visual symbols with their spoken representations. Prior research has identified the posterior temporal-parietal cortex as a key neural region for the cross-modal transformation of these audio-visual alphanumeric symbols. However, the modality-dependent and cross-modal cortical activation patterns underlying these transformations remain unclear. In this slow-event-related 3T fMRI experiment, twenty-one participants were presented with auditory or visual letters and numbers while performing a passive listening/viewing task. 
We found overlapping activation across auditory cortical regions for auditory letters/numbers and across ventral visual regions for visual letters/numbers. In particular, activity in superior temporal cortical regions such as A5/A4/Parabelt exhibited high reliability for auditory stimuli, whereas activity in occipital and ventral temporal cortical regions such as V3/V4/PH demonstrated high reliability for visual stimuli. The temporo-parieto-occipital junction (TPOJ) showed overlapping responses with similar amplitudes for both auditory and visual stimuli. Despite this global similarity in responses, multivariate analysis revealed that the right TPOJ successfully differentiated between visual and auditory stimuli. Our findings reinforce the TPOJ’s role in the cross-modal processing of symbolic representations and may have implications for developmental learning difficulties such as dyslexia, where cross-modal integration may form a challenge for acquiring reading fluency.\u003c/p\u003e","manuscriptTitle":"Cross-modal processing of auditory and visual symbol representations in the temporo-parietal cortex","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-03 16:14:18","doi":"10.21203/rs.3.rs-8699439/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-04-09T10:37:00+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"319328026659477373255733545869447839041","date":"2026-02-12T09:21:37+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-02-09T15:47:29+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-09T15:39:10+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-02-04T04:58:58+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-02-02T11:04:56+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific 
Reports","date":"2026-02-02T10:47:08+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"7a43f645-2a4e-493f-b6be-0668de9c5d7b","owner":[],"postedDate":"February 3rd, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":62154710,"name":"Biological sciences/Neuroscience"},{"id":62154711,"name":"Biological sciences/Psychology"},{"id":62154712,"name":"Social science/Psychology"}],"tags":[],"updatedAt":"2026-02-09T15:54:06+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-03 16:14:18","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8699439","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8699439","identity":"rs-8699439","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

