Investigating neural speech processing with functional near infrared spectroscopy: considerations for temporal response functions

doi:10.64898/2026.03.20.713212

Investigating neural speech processing with functional near infrared spectroscopy: considerations for temporal response functions

2026 · doi:10.64898/2026.03.20.713212

preprint OA: closed CC-BY-NC-4.0

📄 Open PDF Full text JSON View at publisher

Full text 77,219 characters · extracted from oa-pdf · 4 sections · click to expand

Abstract

17 Functional near infrared spectroscopy (fNIRS) is increasingly used in hearing and 18 communication research, with advantages such as robustness to movement 19 artifacts, improved spatial resolution, and flexibility of contexts in which it can be 20 applied. At the same time, the field is progressively moving towards more 21 continuous, naturalistic listening paradigms resulting in the widespread adoption of 22 speech tracking analyses such as temporal response functions (TRFs) in 23 electroencephalography (EEG) and magnetoencephalography (MEG) studies. 24 However, it remains unclear whether these analyses can be applied to slower 25 haemodynamic signals measured by fNIRS. In the present study, we investigated 26 whether a TRF framework can similarly be applied to fNIRS data recorded during 27 continuous speech perception. Eight participants listened to speech simultaneously 28 while fNIRS signals were acquired in a hyperscanning setup. Speech features were 29 regressed onto the haemodynamic responses to test the feasibility and 30 interpretability of fNIRS-based TRFs. Prediction correlations between observed and 31 modelled fNIRS signals across speech features were higher than those typically 32 reported for EEG- and comparable to those reported for MEG-TRF studies. 33 Moreover, these correlations did not overlap with a null distribution generated from 34 trial/i1mismatched fNIRS data, confirming statistical significance and were slightly 35 greater than those obtained from a conventional GLM approach. Our findings 36 support that TRF estimation method can yield meaningful and statistically significant 37 responses from fNIRS data. 38

Keywords

39 Functional Near-Infrared Spectroscopy; Temporal Response Function; Continuous 40 Speech; Neural Tracking. 41 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint Highlights 42 • TRF modelling can be meaningfully applied to fNIRS data acquired during 43 speech listening tasks. 44 • Prediction correlations between actual and modelled fNIRS signals were above 45 chance level, with values comparable to previous EEG/MEG studies. 46 • TRFs explained more fNIRS varianc e than a conventional GLM approach. 47 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint 1. Introduction 48 Neurophysiology research on speech perception has traditionally relied on highly controlled 49 experimental paradigms that use short, isolated uni ts of speech such as syllables, words, or 50 brief sentences. While these approaches a llow a high degree of control and precise 51 alignment with event-related designs, they fail to capture important properties of natural 52 speech, including continuity, te mporal dynamics, and contextual complexity (Shen et al., 53 2024). More recently, however, there has been a shift towards using continuous, naturalistic 54 stimuli that better approximate real-life listeni ng situations, motivated by the opportunity to 55 investigate how social factors interact with sound and language processing (Di Liberto & Ip, 56 2025; Hu et al., 2025; Rowland et al., 2018; Sonkusare et al., 2019). While traditional 57 electrophysiological paradigms primarily rely on stimuli on the order of milliseconds (Rossi et 58 al., 2012), several neuroimaging studies have de monstrated the feasibility of using longer 59 speech segments (e.g., audio-books; podcast s) to study speech comprehension and 60 listening effort (Bálint et al., 2025; Bertachini et al., 2021; Levin et al., 2022; Pollonini et al., 61 2014; van de Rijt et al., 2016). 62 This shift toward continuous speech has been accompanied by the introduction of analytic 63 frameworks for modelling the relationship between continuous stimulus features and neural 64 responses. A widely adopted framework for estimating such relationships is the Temporal 65 Response Function (TRF), where encoding mode ls are derived using regularised ridge 66 regression (Crosse et al., 2016; Crosse et al ., 2021). TRFs have been extensively applied to 67 electroencephalography (EEG) and magnetoencephalography (MEG) data to quantify neural 68 tracking of speech and music, shedding light on the brain represents linguistic and acoustic 69 units (Brodbeck et al., 2018; Di Liberto et al., 2023; Menn et al., 2023), implements auditory 70 attention mechanisms (Ding & Simon, 2012; O' Sullivan et al., 2015), integrates audiovisual 71 cues (Crosse et al., 2015), and implements predictive processing (Di Liberto et al., 2018), 72 among other cognitive operations (Di Liberto et al., 2020). 73 EEG and MEG offer excellent temporal resolution and remain indispensable for studying the 74 millisecond-level dynamics of speech percepti on and language processing. However, like all 75 neuroimaging techniques, they come with modality specific trade-offs that can make certain 76 naturalistic paradigms more suitable than others. For instance, the low spatial resolution of 77 EEG limits precise source localisation, especially for deeper or distributed cortical 78 generators. MEG offers improved spatial resolution to EEG, but both modalities are sensitive 79 to movement-related artifacts, which can constrain experimental designs involving 80 interactive, mobile, or highly naturalistic behavio urs. This is especially true with MEG, where 81 magnetic shielding is critical, and so participation occurs in small shielded in rooms, with 82 certain MEG systems also requiring participants to be head fixed. For these reasons, 83 considering other neuroimaging methods with bette r experimental flexibility is important to 84 broaden the range of paradigms that can be meaningfully investigated. 85 Functional near-infrared spectroscopy (fNIRS) is an optical neuroimaging technique that 86 indirectly measures brain activity (Czeszumski et al., 2020; Eulau & Hirsh-Pasek, 2024; 87 Quaresima et al., 2019). It relies on the comparison of haemoglobin oxygenation 88 concentration s in nervous tissues, offering blood flow measurements (Eulau & Hirsh-Pasek, 89 2024; Pinti et al., 2020). Due to its non-invasive nature and robustness to movement 90 artifacts, it has been used in substitution or combination with EEG, reaching cortical areas 91 up to 1.5 to 2 cm in depth (Czeszumski et al., 2020; Eulau & Hirsh-Pasek, 2024; Pinti et al., 92 2020). Additionally, due to its compatibility with electronic and magnetic devices, fNIRS has 93 also been widely used in research focused on hearing and speech perception, especially 94 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint with hearing aid and cochlear implant users (A nderson et al., 2017; Bell et al., 2020; Sevy et 95 al., 2010). 96 fNIRS has been successfully employed to study th e speech processing hierarchy in infants 97 and adults (Rossi et al., 2012), including phone mic contrasts to syllables and words 98 (Minagawa-Kawai et al., 2008), sentence-level processing (Peña et al., 2003), and 99 challenging listening conditions such as vocoded speech (Lawrence et al., 2018). However, 100 these studies typically rely on comparing the condition of interest with a baseline condition 101 using general linear models (GLM). This reliance stems from the slow and overlapping 102 nature of the haemodynamic response, which peak several seconds after stimulus onset, 103 making it difficult to isolate neural responses without clearly separated blocks. Such block 104 designs place constraints on experimental flexibility, limiting the study of natural speech 105 processing. 106 Here, we examine whether a TRF framework can be meaningfully applied to fNIRS data 107 acquired during a continuous speech listeni ng task. The fNIRS experiment was conducted 108 as part of the 1 st Cognition and Natural Sensory Processing (CNSP) hackathon, a satellite 109 event to the 8 th International Conference on Audito ry Cortex. In a single hyperscanning 110 session using a classroom-style design, fNIRS data were recorded simultaneously from 111 eight participants as they listened to podcast dialogues presented through a loudspeaker. 112 The stimuli were identical to those used in a previous EEG study on podcast dialogue 113 listening (Ip et al., 2025), providing clear expec tations for the temporal response patterns. 114 Encoding TRF and decoding analyses were used to assess the extent to which fNIRS can 115 capture speech-related neural proc essing across multiple levels of the processing hierarchy. 116 Although this study was conducted in a hyperscanning context, this was not the primary 117 focus of our analyses. Rather than targeting in tersubject synchronisation, we aim to bridge 118 TRF analyses with the increasing use of fNIRS in hearing and communication research, and 119 to evaluate if fNIRS can support continuous, naturalistic paradigms using analysis methods 120 designed for electrophysiology. 121 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint 2. Methods 122 2.1 Participants 123 Eight volunteer Hackathon attendees (four female, four male) between 25 and 43 years of 124 age ( M = 30.63) took part in the experiment. Five of the eight participants were native 125 English speakers and the remaining three were highly proficient in English. The study was 126 approved by the School of Computer Science and Statistics Research Ethics Committee at 127 the University of Dublin, Trinity College, and all participants provided written informed 128 consent. Data collection took place in a workroom with natural lighting in the office building 129 where Hackathon activities were carried out in Maastricht, the Netherlands. 130 2.2 Stimuli and Task 131 A previous experiment on non-simultaneous EEG recordings was adapted for the 132 hyperscanning fNIRS setting (Ip et al., 2025), foll owing a similar presentation of stimuli. 133 Participants sat together in front of a screen wher e written instructions and a visual fixation 134 cross were displayed. Audio stimuli were deliv ered by loudspeakers at the front of the room 135 and presented at a sampling rate of 44,100 Hz. Stimuli were presented and controlled by the 136 PsychoPy Python library version 2025.1.1 (Peirce et al., 2019). The audio samples consisted 137 of 28 dialogues taken from four sources: two American podcasts ( Brains On! and Forever 138 Ago), a YouTube series by WIRED (5 Levels of Difficulty), and interviews with the hosts from 139 the podcast Forever Ago . All podcasts and shows are publicly available online 1. The 140 dialogues discussed general interest topics related to science, arts, and childhood 141 memories. All audios were one-on-one dialogues fr om either adult-adult (10 trials) or adult-142 child (18 trials) interactions, totalling appr oximately one hour. The whole experiment was run 143 in a single session, as data from all participants were acquired simultaneously. 144 2.3 fNIRS Data Acquisition 145 fNIRS data was recorded with Brite Ultra System (Artinis Medical Systems B.V., Elst, The 146 Netherlands), a system specifically designed for hyperscanning recordings. The fNIRS 147 devices were wireless (model Brite MKII), each including 18 optodes (10 transmitters and 8 148 receivers) leading to 16 long-separation channels and two short-separation channels (see 149 Table 1). Optodes were mounted in caps, covering frontal and temporal regions. The 150 montage of the optodes and placement of the caps was carried out by the volunteers, with 151 the assistance of Artinis’ fNIRS experts as part of a hands-on fNIRS workshop, with step-by-152 step supervision. Data were recorded using Brite Connect Ultra, an ad-hoc software platform 153 designed for large-scale fNIRS hyperscanning, consisting of a principal recording hub and 154 one tablet interface. This allowed two researchers (one coordinator and one assistant) to 155 supervise the multiple simultaneous recordings. Data were recorded at a sampling frequency 156 of 100 Hz for all participants. Event triggers were sent manually via button presses using 157 PortaSync (a wireless remote device compatible with Artinis systems) to enable time-series 158 synchronisation with the stimulus presentation by PsychoPy, which were delivered to a 159 computer not integrated to the Brite system. 160 Table 1. Optode Information 161 1 https://laist.com/podcasts/servant-of-pod/kids-podcasts-a-true-alternative-to-screen-time; https://creators.spotify.com/pod/profile/martine-severin/episodes/31--How-to-Become-an-Epic- ImprovComic-with-Joy-Dolo-e1js9c2 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint Channel Transmitter Receiver Position Channel Length (mm) 1 1 1 Right frontal 25.76 2 1 2 Right frontal 31.62 3 2 1 Short separation 0 4 3 1 Right frontal 34.43 5 3 2 Right frontal 24.72 6 4 3 Right temporal 31.33 7 4 4 Right temporal 28.37 8 5 3 Right temporal 31.24 9 5 4 Right temporal 34.75 10 6 5 Left frontal 25.88 11 6 6 Left frontal 31.64 12 7 5 Short separation 0 13 8 5 Left frontal 34.46 14 8 6 Left frontal 24.93 15 9 7 Left temporal 31.39 16 9 8 Left temporal 28.39 17 10 7 Left temporal 31.15 18 10 8 Left temporal 34.73 162 2.4 Experimental procedure 163 The 28 audio trials were presented in a pr e-determined random order, each followed by a 164 brief break for participants, where they rated their interest in the dialogue on a scale from 1 165 to 5 (1 = not at all interesting; 3 = neutral; 5 = very interesting). The breaks had variable 166 length and were controlled by the researchers to ensure that all participants had time to 167 respond before proceeding to the next trial. On every second trial, a behavioural multiple-168 choice question was presented to maintain participants’ attention and engagement to the 169 task. The question was to select descriptive ke ywords that best described the speech in the 170 trial. These behavioural responses acted as measures of attention and engagement during 171 the task but were not used in further analyses. 172 2.5 fNIRS Preprocessing and TRF Analysis 173 The raw fNIRS data were stored in shared near infrared spectroscopy format (SNIRF) 174 (Tucker et al., 2022). Preprocessing was conduc ted in MATLAB using inspiration from the 175 open source fNIRS preprocessing pipelines for the NIRx system 2 and for the Artinis system 176 using SNIRF 3. Our pipeline requires three publicly available toolboxes: inpaint_nans 177 (D'Errico, 2026), Homer3 (Huppert et al., 2009) and NIRS (Santosa et al., 2018). The 178 following preprocessing steps were applied: (1) early pruning based on raw intensity levels 179 to identify missing values and signal dropouts, (2 ) conversion of light intensity to optical 180 density (OD), (3) band-pass filtering between 0.01 and 0.7 Hz was applied because 181 stimulus-related activity is not expected below 0.01 Hz, while haemodynamic responses 182 typically occur below 0.5 Hz. The upper cutoff also helps attenuate cardiac signals, which 183 2 https://github.com/smburns47/preprocessingfNIRS 3 https://github.com/Artinis-Medical-Systems-B-V/snirf_data_example .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint usually occur around 1 Hz, (4) inspection and removal of potential bad channels (none were 184 identified), and (5) exclusion of the two shor t-separation channels. Subsequently, each trial 185 was segmented and aligned with the corresponding speech stimulus, downsampled to 25 186 Hz, and saved in the Continuous-event Neural Data (CND) data structure (Di Liberto et al., 187 2024). 188 We extracted five stimulus features to use in our analyses including the acoustic envelope, 189 half-way rectified envelope derivative, word onset times, and surprisal and entropy from 190 GPT-2 (see (Ip et al., 2025) for detailed descr iptions of feature extraction). Temporal 191 response functions (TRFs) were used to es timate the linear mapping between a given 192 sensory stimulus feature and the corresponding neural responses (Ding et al., 2014; Lalor, 193 2009). TRFs were computed using the mTRF-Toolbox (Crosse et al., 2016) and custom 194 code built starting from scripts from the CNSP open science initiative 4. TRFs estimate the 195 relationship between the stimulus features and fNIRS signals as a linear time-invariant 196 system via a lagged Ridge regression procedur e. TRF models were fit for each participant 197 separately, using a leave-one-out cross-validat ion procedure across trials to control for 198 overfitting, and Ridge regularisati on to prevent overfitting. The regularisation parameter ( λ ) 199 was optimised through an exhaustive search over a logarithmic range from 0.00001 to 200 1000000 within each training fold. For forward models, the TRF estimates a temporal filter 201 for each fNIRS channel, capturing how the neural response at a given time point can be 202 predicted from the preceding stimulus. The optimal λ for forward models was defined as the 203 value yielding the highest prediction correlation (Pearson’s r) between the predicted and 204 observed fNIRS signals (Crosse et al., 2016). Backward models were fit to reconstruct 205 stimulus features from the fNIRS signals, where data from all channels is considered 206 simultaneously. The optimal value of λ for backward models was defined as the value 207 yielding the highest Pearson’s correlation between the reconstructed and actual stimulus 208 feature. 209 Since fNIRS measures haemodynamic responses evolve much slower than signals 210 measured by EEG or MEG, several adjustments were made. First, trials were segmented by 211 including additional 30 seconds of fNIRS data after the end of the sound stimulus, ensuring 212 that speech-related haemodynamic responses are fully included in the trials. A technical 213 requirement for doing so was to zero-pad the stimulus features to match the fNIRS 214 segments. Second, we selected TRF time-lags ranging from 0 to 30 seconds to capture the 215 slow haemodynamic response; much longer than those usually used in EEG and MEG 216 studies. To explore the possibility of improving computational efficiency for the TRF model fit, 217 we tested the TRF analysis for several downsampling frequencies, ranging from 1 Hz to 25 218 Hz, determining the minimum sampling frequency possible without information loss (Figure 219 1B). Finally, since fNIRS measures change s in oxygenated and deoxygenated haemoglobin 220 concentrations (HbO and HbR respectively), s eparate models for fit for each metric. Due to 221 our small sample size, all statistical comparis ons were done across trials rather than across 222 participants. 223 To test if our TRF models were explaining meaningful variance in the fNIRS response, we 224 compared our prediction correlation values to a null distribution. For each feature and for a 225 full multivariate model, null distributions were created by mismatching the stimulus features 226 and fNIRS data across trials, using pairs separat ed by at least five trials to minimise 227 temporal autocorrelation in the slow fNIRS signals. This was repeated 100 times per subject. 228 Prediction correlations were computed for each iteration and averaged across channels and 229 4 https://github.com/CNSP-Workshop/CNSP-resources .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint trials, resulting in 800 null correlation values (100 per subject). Our observed mean 230 prediction correlations were then compared to the null distribution for the corresponding 231 model. 232 2.6 General Linear Model analysis 233 To provide a benchmark for the TRF approac h, we implemented a standard general linear 234 model (GLM) analysis using a cross-validated framework. For each participant, a design 235 matrix was constructed by convolving the ex tracted stimulus features (i.e., acoustic 236 envelope, envelope derivative, word onsets, lexical surprisal, and entropy) with a canonical 237 haemodynamic response function (HRF). The HRF was generated using standard SPM12 238 parameters, including a peak delay of 6 s and an undershoot delay of 16 s, and was 239 normalised to a peak amplitude of one. Following convolution, all regressors were z-scored 240 to normalise their scales and ensure the stability of the subsequent regression. The model 241 also incorporated nuisance regressors to account for non-neural signal components and low-242 frequency fluctuations. These included trial-specific intercepts to capture baseline offsets for 243 each speech segment and linear drift terms to model slow signal trends within each trial 244 duration. Prior to the fitting procedure, fNIR S data were pre-processed by applying a linear 245 detrend operation ( detrend function in MATLAB) and z-score transformation to each 246 individual trial. 247 Predictive performance was assessed using a 10-fold cross-validation procedure. At each 248 iteration, nine segments were used to estimate the model weights via ordinary least squares 249 (OLS) regression. The resulting weights were then applied to the design matrix of the 250 remaining held-out segment to generate a predict ed fNIRS time-series. This process was 251 repeated for each fold, and model accuracy wa s quantified by calculating the Pearson’s 252 correlation coefficient (r) between the predicted and ground-truth signals, for each of the 16 253 channels. These correlations were calc ulated separately for oxygenated (HbO), 254 deoxygenated (HbR), and total haemoglobin (HbT) metrics to allow for a comprehensive 255 comparison of model predictive performance acro ss different physiological signals. By using 256 cross-validation and Pearson’s correlations, we ensured that the GLM results were directly 257 comparable to the TRF metrics. 258 3. Results 259 Participants rated the passages to be of neutral interest on average ( M = 3.04), with the 260 adult-child trials being rated as slightly more enjoyable ( M = 3.20) than the adult-adult trials 261 (M = 2.76). Accuracy on the comprehension questions was high (range: 76.9% - 100%, M = 262 89.4%), with accuracy being slightly higher for the adult-child trials ( M = 91.7%) than the 263 adult-adult trials (M = 87.5%). 264 To determine model parameters for subsequent analyses, we first explored the impact of the 265 downsampling frequency on the performance of forward and backward models (Figure 1). 266 We ran both backward and forward TRF models on the acoustic envelope for both HbO and 267 HbR (Figure 1A). While backward models are often reported with higher correlation scores 268 than forward models in EEG and MEG TRF analyses (Crosse et al., 2016; Haufe et al., 269 2014), we observed similar median correlations and no clear differences between modelling 270 direction in fNIRS. Therefore, we proceeded with the forward model for subsequent 271 analyses, primarily due to its more direct interpretability (Haufe et al., 2014). Forward models 272 provide TRF weights that characterise how different features contribute to the fNIRS signal. 273 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint Because fNIRS measures slow haemodynam ic responses, lower analysis sampling 274 frequencies than those used in EEG or MEG should be sufficient. While lower sampling 275 rates reduce computational cost, overly lo w frequencies may fail to capture relevant 276 response dynamics or, similarly, cause the loss of critical stimulus information. We tested six 277 different analysis sampling frequencies from 1Hz to 25Hz to determine the minimum 278 sampling frequency without information loss (Figure 1B). Performance was reduced at 1 and 279 5 Hz, whereas correlations appeared to stabilise at 10 Hz for both HbO and HbR. For HbO, 280 a repeated measures ANOVA with Greenhouse-Geisser correction revealed a significant 281 effect of sampling frequency ( F(1.24, 33.42) = 32.22, p < .001). We then ran follow up 282 pairwise t-tests (false discovery rate [FDR] corrected) on adjacent sampling frequencies 283 (e.g., 1 to 5, 5 to 10, etc.). We found that the mean prediction correlation across trials was 284 significant greater for 5 Hz than 1 Hz (t(27) = 4.77, p < .001) and for 10 Hz than 5 Hz (t(27) = 285 5.72, p < .001). There were no significant differences between 15 Hz and 10 Hz ( t(27) = 286 0.35, p = .782, 20 Hz and 15 Hz t (27) = 1.63, p = .132, or 25 Hz and 20 Hz t(27) = 0.25, p < 287 .808. Similarly for HbR, we found a significant effect of sampling frequency ( F(1.32, 35.58) = 288 21.18, p < .001; Greenhouse-Geisser corrected). We then ran follow up pairwise t-tests 289 (FDR corrected) on adjacent sampling frequencies (e.g., 1 to 5, 5 to 10, etc.). We found that 290 the mean prediction correlation across trials was significant greater for 5 Hz than 1 Hz ( t(27) 291 = 3.59, p = .002) and for 10 Hz than 5 Hz ( t(27) = 4.63, p < .001). There were no significant 292 differences between 15 Hz and 10 Hz ( t(27) = 0.60, p = .639, 20 Hz and 15 Hz t(27) = 0.40, 293 p = .689, or 25 Hz and 20 Hz t (27) = 0.53, p < .647). Based on this trade-off between model 294 performance and computational efficiency, subsequent analyses were conducted using a 295 sampling frequency of 10 Hz. All subsequent results are presented for the forward model 296 with a sampling frequency of 10 Hz, unless stated otherwise. 297 298 Figure 1. C omparison of TRF model parameters. (A) Backward and forward univariate 299 TRF model performance. Red colour denotes TRF results for HbO signals, and blue denotes 300 HbR. The y-axis refers to the Pearson’s correlation metric (envelope reconstruction and 301 fNIRS prediction for backward and forward models respectively). (B) Impact of sampling 302 frequency on the forward TRF model performance. 303 3.1 fNIRS Tracking of the Acoustic Envelope 304 Univariate forward TRFs relating sound env elope end fNIRS time-series were examined. 305 HbO values yielded prediction correlations that were 10.3% higher than HbR on average 306 (Figure 2A). Here, we also computed prediction correlations for the sum of HbO and HbR 307 (referred to as HbT) and found comparable values to HbO (2.1% greater for HbT than HbO). 308 As such, the analyses that follow are carried out on HbO and HbR separately. When the 309 prediction correlations were broken down by channel, we can see that frontal channels tend 310 to yield higher values than temporal channels for HbO (Figure 2C) but not for HbR (Figure 311 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint 2D). We averaged the prediction correlation across the four channels in each group of 312 optodes (left frontal, left temporal, right front al, right temporal) for both HbO and HbR. For 313 HbO, a repeated measures ANOVA with Greenhouse Geisser correction revealed a 314 significant effect of channel location ( F(1.76, 47.41) = 9.269, p < .001). Follow up paired t-315 tests (FDR corrected) showed that the left frontal group yielded significantly higher prediction 316 correlations than the left temporal ( t(27) = 3.97, p = .002), right frontal ( t(27) = 2.36, p = 317 .031), and right temporal groups ( t(27) = 3.77, p = .002). The right frontal group led to larger 318 prediction correlations than either temporal group (left: t(27) = 2.62, p = .029; right: t (27) = 319 2.49, p = .029), and there was no significant diff erence between the two temporal groups 320 (t(27) = -0.26, p = .799). For HbR, a repeated measures ANOVA once again revealed a 321 significant effect of channel location ( F(3, 81) = 3.585, p = .017). Here, the only significant 322 pairwise difference was between the le ft frontal and right frontal groups ( t(27) = 3.14, p = 323 .024; FDR corrected), with the prediction correlations for the left frontal channels being 324 higher on average. 325 We then examined the TRF model weights for both metrics (Figure 2B). Compatibly with the 326 properties of haemodynamic responses in the brain, which are known to peak between 3 327 and 6 seconds after local neural activity increases (Miezin et al., 2000), we expected TRFs 328 to peak at latencies in that interval. Visually, the most prominent TRF components emerged 329 at latencies of 3–5 seconds. Dominant components at those latencies emerged at seven out 330 of 16 channels for HbO, and at one left frontal channel for HbR. 331 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint 332 Figure 2. Results from forward models using the acoustic envelope speech feature at 333 a sampling frequency of 10 Hz. (A) Boxplots of mean fNIRS prediction correlation values 334 averaged across trials, channels and participants for HbO (red), HbR (blue) and HbT (grey). 335 Each marker represents individual performance. (B) Magnitude of TRF weights for HbO 336 (red) and HbR (blue). (C-D) Channel-wise fNIRS prediction correlation values, colour-coded 337 by cortical region, for HbO (C) and HbR (D). (E-F) Channel-wise fNIRS prediction correlation 338 values mapped onto MNI space, coloured by correlation magnitude for HbO (E) and HbR 339 (F). 340 3.2 Univariate versus Multivariate TRFs 341 We next investigated how different stimulus featur es contribute to the TRF model. We fit five 342 univariate models, one for each feature (envelope, envelope derivative, word onsets, lexical 343 surprisal, and entropy), as well as a multivariate model including all five features. All TRFs 344 led to average fNIRS prediction correlations values within 0.1 and 0.2 (Figure 3A). The 345 univariate envelope TRF and the multivariate TRF led to the highest average prediction 346 correlation values for both HbO (envelope: M = 0.188; multivariate: M = 0.181) and HbR 347 (envelope: M = 0.170; multivariate: M = 0.167), while the univariate envelope derivative 348 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint model resulted in the lowest prediction correlation values for both signals (HbO: M = 0.131; 349 HbR: M = 0.122). Pairwise t-tests (FDR corrected) across all the models revealed several 350 differences in prediction correlations. Specific ally, for HbO, the multivariate model and the 351 univariate envelope model yielded significantly higher prediction correlations than all 352 univariate models (all p < .05). There was no significant difference between these two 353 models ( t(27) = 1.42, p = .167). The univariate derivative model resulted in significantly 354 lower prediction correlations than all models ex cept the univariate surprisal model (all p < 355 .05; derivative versus surprisal: t(27) = -1.67, p = .115). For HbR, there were no significant 356 differences between the multivariate model, the univariate envelope model, and the 357 univariate word onset model (all p > .05). All three of these models yielded significantly 358 higher prediction correlations than the univariate derivative model and the univariate 359 surprisal model (all p < .05). The multivariate model and the univariate envelope model were 360 significantly higher than the entropy model (all p < .05), but there was no significant 361 difference between the univariate word onset model and the univariate entropy model ( t(27) 362 = 1.89, p = .102). 363 TRF weights from the multivariate model averaged across channels showed that all five 364 speech features exhibited similar temporal profiles (Figure 3B). Based on visual inspection, 365 for HbO the responses for all features appeared to display a broader peak between 3–5 s, 366 whereas HbR showed a more pronounced peak around 5 s, again consistent with the 367 haemodynamic response. 368 Further analyses were conducted to measur e the unique contribution of each stimulus 369 feature to the multivariate TRF (Figure 3C). That was quantified as the fNIRS prediction 370 correlation loss after removing one stimulus feature at a time at the model training stage 371 (i.e., feature importance). We found envelope was the largest contributor to the TRF models 372 for both HbO and HbR. We ran paired t-tests (FDR corrected) comparing the full multivariate 373 model to reduced models in which each feature was omitted individually. For both HbO and 374 HbR, the only feature that resulted in significantly lower prediction correlations when 375 dropped was the envelope (HbO: t(27) = 3.44, p = .01; HbR: t(27) = 2.94, p = .033).Together 376 with the results from the univariate models, this shows that the envelope appears to be the 377 most important feature in our TRF models. 378 The laterality of the univariate models was computed as the difference between prediction 379 correlations from left- and right-hemisphere c hannels for each of speech features for both 380 metrics (Figure 3D). We ran pairwise t-tests (FDR corrected) between the mean of the left 381 hemisphere channels and the mean of the right hemisphere channels for each model. 382 Although the mean difference between the left and right was positive for all features, none 383 showed significant laterality (all p > .05). 384 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint 385 386 Figure 3. Univariate and multivariate forward TRF models at a sampling frequency of 387 10Hz. (A) Boxplots of mean prediction correlation values averaged across trials, channels, 388 and participants for HbO (red) and HbR (blue), shown for five speech features (envelope, 389 derivative, entropy, surprisal, word onset) as well as for a multivariate model including all 390 features. (B) TRF weights averaged across trials, channels, and participants for each 391 speech feature, shown separately for HbO (top) and HbR (bottom). (C) Average correlation 392 gain of the full multivariate model (all five features) relative to multivariate models with one 393 feature omitted. (D) Laterality analysis computed as the difference between fNIRS prediction 394 correlations from left-hemisphere channels and right-hemisphere channels. 395 3.4 Is it worth using TRFs to analyse fNIRS Data? 396 To assess whether the TRF performance exceeded chance level, we compared our obtained 397 fNIRS prediction correlations to a null distribution of 800 datapoints created by using 398 mismatched trial data (Figure 4A, B). TRF model s resulted in significantly higher prediction 399 correlations for all features than the null distribution (including for the multivariate model 400 including all five features) for both HbO and HbR. For HbO, the prediction correlations we 401 obtained from the univariate envelope, entropy, surprisal and word onset models were 402 beyond the 99 th percentile of the null distribution ( p < .01) and for the univariate derivative 403 and multivariate model, our obtained values were beyond the 95 th percentile ( p < .05). For 404 HbR, all six models resulted in prediction correlations beyond the 99 th percentile of the null 405 distribution (p < .01). This means that the TRF is c apturing meaningful variance in response 406 to these stimulus features for all models and both haemodynamic metrics. 407 Next, we examined the performance of a GLM approach with cross-validation. When fitting 408 the GLM with all five stimulus features convolved with a canonical HRF, the model 409 successfully predicted the fNIRS signals above chance, yielding positive median correlation 410 (r) values ranging between 0.12 and 0.13 across HbO and HbR metrics. The GLM was fitted 411 with all five stimulus features included simultaneously. The resulting beta weights reflect the 412 magnitude of the canonical HRF response to each feature and are conceptually comparable 413 to the peak amplitudes of the corresponding TRFs, though unlike the TRF, the GLM does not 414 estimate the shape of the haemodynamic response from the data. While the GLM provided a 415 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint reliable linear fit, the overall predictive correla tions were visibly lower than those achieved by 416 the multivariate TRF models, which reached medi an correlations closer to 0.18 for both HbO 417 and HbR. It is noteworthy that model performance was evaluated using cross-validation 418 rather than a trial-by-trial approach; although the latter was explored, it yielded unstable 419 predictions that did not generalize across trials, while cross-validation provides a robust and 420 unbiased estimate of predictive performance. For this reason, statistical comparisons 421 between the two are done at the subject level (n = 8). We used pairwise t-tests to see if 422 there was a significant difference in prediction correlations between the multivariate TRF 423 model and the GLM. The TRF resulted in significantly higher prediction correlations for both 424 HbO (t(7) = 3.40, p = .011) and HbR (t(7) = 2.49, p = .041). 425 Analysis of the average GLM beta weights provided further insight into how individual 426 stimulus features drove the predicted haem odynamic response within this traditional 427 framework. Because all five regressors were included simultaneously in the model, they 428 inherently competed to explain the variance in the continuous fNIRS signal. Within this 429 competitive multivariate space, the acoustic envelope exhibited the most prominent positive 430 beta weights, indicating it was the strongest unique driver of the modelled response. 431 Conversely, the envelope derivative yielded primarily negative beta weights, and the discrete 432 linguistic features (word onset, surprisal, and entropy) displayed weights heavily centred 433 around zero. This suppression of non-envelope feat ures in the GLM aligns perfectly with our 434 TRF multivariate gain analysis. In the TRF models, acoustic envelope resulted in the highest 435 prediction correlation, whereas adding the deriv ative and linguistic features yielded near-436 zero multivariate gain. 437 438 Figure 4. Comparison of TRF Models to Null Distributions and GLM. (A) Distribution of 439 prediction correlations for the multivariate TRF model using mismatched trial data for the 440 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint HbO metric. The mean prediction correlation for the actual TRF model is shown in red. (B) 441 The same for the HbR metric. The mean predicti on correlation for the actual TRF model is 442 shown in blue. (C) Comparison of prediction correlations for the TRF (purple) and GLM 443 (green) analyses for HbO and HbR. (D) Beta weights for each feature from the GLM analysis 444 for HbO and HbR. 445 4. Discussion 446 The main finding of this study is that the TRF estimation method can yield meaningful and 447 statistically significant temporal responses from fNIRS data. The estimated TRF weights 448 present temporal dynamics that are compatible with the canonical haemodynamic response, 449 with responses peaking at latencies between 3 and 6 seconds. These temporal profiles were 450 consistent across stimulus features and evaluat ion metrics. fNIRS prediction correlations 451 between actual and modelled signals were significantly above chance, with average 452 Pearson’s r values between 0.1 and 0.2 across all speech features for both HbO and HbR, 453 higher than values typically reported in EEG and comparable to those reported in MEG TRF 454 studies. Prediction performance was also higher than that obtained using a comparable GLM 455 approach. 456 A sampling frequency of 10 Hz was identified as the optimal choice for maximising 457 computational efficiency. Instead, lower sampling frequencies led to information loss. 458 Additionally, HbO and HbR haemoglobin metrics resulted in fNIRS prediction correlations 459 with comparable performance, with the HbO consistently approximately 10% higher. When 460 considering fNIRS channels separately, predict ions are higher in frontal regions than 461 temporal regions for both metrics. 462 4.1 Forward vs Backward TRF modelling 463 Backward (decoding) TRF models typically yield higher correlation values than forward 464 (encoding) TRF models in neurophysiology. One key reason is that forward models evaluate 465 the model predictions on noisy neural reco rdings, as the ground-truth neural signal is 466 unknown. Backward models, instead, perform that evaluation (i.e., Pearson’s correlation) on 467 the (clean) stimulus space, where the ground-tr uth is known. Another difference relates to 468 the multivariate-to-univariate nature of the TRF mapping. Backward models consider all 469 neural channels simultaneously, increasing the information used for the speech 470 reconstruction compared to using a single channel. Conversely, forward models predict one 471 channel at a time, but can combine multiple features simultaneously, which can increase the 472 model explanatory power when features are relat ed to complementary neural variance. With 473 these considerations in mind, backward models are generally expected to produce prediction 474 correlations that are higher numerically, whereas forward models would generally yield lower 475 correlation values, but provide greater physiological interpretability (Crosse et al., 2016; 476 Haufe et al., 2014; Wong et al., 2018). 477 In contrast with that literature, our results with fNIRS indicate comparable backward and 478 forward TRF correlations (Figure 1A). One possibility is that fNIRS signals have a higher 479 signal-to-noise (SNR), as higher SNRs increase the prediction correlation of forward models. 480 Another possibility is that speech-related fNIR S responses are spatially localised, meaning 481 that combining multiple channels in backward models would not increase the correlation 482 scores. The slow characteristic of haemodynamic signals also likely plays a role in this 483 result. In fact, such slow dynamics may be predictable using the speech envelope, while the 484 faster envelope dynamics may be too rapid for being reconstructed with such a slow brain 485 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint signal. It should also be noted that the baseline models in Figure 4A, B have mean above 486 zero, reflecting potential similarities across trials regardless of the specific speech segment 487 (e.g., auditory onset response). This further highlights the importance of considering a 488 baseline model (e.g., via trial mismatching) when assessing statistical significance. 489 While the slow dynamics of fNIRS may be seen as a disadvantage for the reasons above, 490 one positive implication is computational, as the time-series can be downsampled. Here, we 491 found that the TRF analysis remains unaltered for downsampling rates down to 10 Hz, while 492 information is lost at lower frequencies. 493 4.2 Stimulus Feature Analysis 494 The acoustic envelope yielded the highest fN IRS prediction correlations and contributed 495 most strongly to the multivariate models for both HbO and HbR (Figure 3). This was 496 expected as it is typically the case for EE G and MEG as well. An additional contributor to 497 that strong relationship may be that the envelope is the stimulus property in our feature set 498 with the closest rates to the fNIRS signal, possibly leading to a stronger alignment. In fact, 499 features with fast dynamics, such as the half-way rectified envelope derivative, which have 500 shown to be important in EEG TRF studies (B rodbeck et al., 2018; Chalas et al., 2022) had 501 little to no contribution to the multivariate TRF. 502 The TRF weights did not exhibit particular variation across features (Figure 3B). The 503 temporal dynamics for all five features resemble the canonical haemodynamic response 504 function, with peaks between 3 and 6 seconds for both HbO and HbR. This similarity across 505 features may explain why the multivariate gain was quite small. The implication is that, as 506 expected from fNIRS, the spatial maps for the TRFs may be more informative than their 507 temporal dynamics. This was seen in the univariate envelope model, with significant 508 differences in the prediction correlations acro ss different channels and visually different 509 behaviour in the weights. 510 4.3 Comparison of the TRF to null distributions and GLM approaches 511 Adoption of an appropriate baseline model or null distribution is critical for evaluating 512 encoding models, such as the TRF. Here, we used trial-mismatch to ensure disruption of the 513 temporal information in the fNIRS signal. We have also explored other baselines that are 514 common for EEG and MEG. For example, time-based shuffling methods implementing 515 within-trial permutation or circ ular shifts failed to disrupt the predictive power of the TRF 516 model, yielding surprisingly similar correlation va lues as the true fNIRS-stimulus trials, most 517 likely due to the slow nature of the neural haemodynamic response measured by fNIRS. 518 Both within-trial permutation and circular shift pr eserve aspects of the original time series 519 that are statistically meaningful for the haemodynamic response, preserving the overall 520 distribution and amplitude of the stimulus feature, with circular shift additionally preserving 521 the autocorrelation structure. Since fN IRS signals are dominated by slow, highly 522 autocorrelated haemodynamics, these shuffling methods can yield null model prediction 523 correlations that are very close to those obtained with the actual stimulus features 524 (Lancaster et al., 2018; Liégeois et al., 2021; Santosa et al., 2018). As such, we recommend 525 considering building null distribution by combining trial-mismatch with other operations, such 526 as time-reversal and circular shift, disrupting the stimulus-response coupling. Interestingly, 527 the null distribution does not centre perfectly on zero. fNIRS signals at the low frequencies 528 we included (i.e. 0.01 Hz) remain temporally smooth and strongly autocorrelated due to the 529 sluggish haemodynamic response. Since consecutive samples are not independent and low-530 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint frequency components dominate the variance, correlation estimates from fNIRS signals can 531 exhibit a non-zero baseline correlations (A fyouni et al., 2019; Arbabshirani et al., 2014; 532 Olszowy et al., 2019). 533 Prediction correlations were higher for TRFs than the GLM analysis. This difference in 534 predictive power reflects the increased degrees of freedom inherent in the TRF method. By 535 modelling temporal dynamics across a range of lags, TRFs can capture complex variance in 536 the continuous haemodynamic response that a static, canonical HRF-based GLM may 537 underfit. Just as with our TRF analysis, the acoustic envelope was the strongest contributor 538 to the overall model. The beta weights for a ll other features were near zero, with the 539 envelope derivative resulting in negative beta weights. Together, the near-zero GLM weights 540 for higher-level features (word onset, surprisal, and entropy) and their lack of added 541 predictive value in the TRF approach demonstrate that the acoustic envelope dominates the 542 trackable haemodynamic signal in this paradigm, leaving little unique variance for the other 543 variables to explain when modelled concurrently. 544 4.4 Limitations and recommendations for fNIRS TRFs 545 Since we recorded data recording during the hackathon, there were several constrains. 546 Some steps, such as mounting the optodes and placing the fNIRS cap were performed by 547 the participants themselves, who had no previous experience with fNIRS (Yücel et al., 2025). 548 These steps were supervised by two experts from Artinis, who gave step-by-step 549 demonstrations and made the necessary adjustments to ensure high quality recording, 550 which was time consuming. The environment was not as controlled as in laboratory settings. 551 For example, external noise was minimised but could not be fully avoided. The key limitation 552 was our sample size (N=8), which was too small for a thorough statistical testing across 553 participants. Nonetheless, we carried out statistical tests across trials, testing for the 554 robustness of numerical results across the different speech segments in the experiment. The 555 limited number of participants posed limitations on certain hyperscanning-oriented analyses, 556 such as wavelet transform coherence (WTC). That is because WTC evaluates coherence 557 across both frequency and time dimensions, requiring long, continuous data streams to 558 resolve slow haemodynamic fluctuations. Becaus e our experimental trials were relatively 559 short for targeting haemodynamic fluctuations (1-2 minutes) and featured distinct sentences, 560 they could not be artificially concatenated or aggregated; doing so would introduce severe 561 temporal discontinuities and edge artifacts, rendering time-resolved synchrony analyses 562 invalid. Future hyperscanning studies aiming to use WTC should ensure continuous, long-563 duration stimulus presentations, larger and better selected sample providing sufficient data 564 to improve the strength of the statistical analyses. 565 In our experiment, stimulus presentation and recording systems could not be directly 566 integrated, meaning we did not have highly prec ise triggers. The PortaSync synchroniser is 567 set up for triggers manually sent via a button press at the start of an event (i.e., a trial) which 568 can creates delays. Since the fNIRS response is much slower than EEG and MEG, these 569 delays between the actual stimulus onset and the triggers are not detrimental. Our 570 recommendation is that the insertion of triggers for events in the recording should be 571 automated (e.g. applying transistor-transistor logi c [TTL] pulses) to ensure more consistent 572 and better synchronisation. 573 We present several recommendations for future studies employing a TRF approach to fNIRS 574 data. First, we recommend padding the stimul us and fNIRS time-series by additional 30 575 seconds (raw fNIRS data for the neural data and zero-padded features for the stimulus data) 576 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint to allow for an entire haemodynamic cycle to take place after the trial ends. Longer time-lags 577 than typical EEG and MEG TRFs should be considered to properly capture this 578 haemodynamic response – up to approximately 10 seconds after feature onset. To improve 579 computation efficiency, we recommend downsampling the neural and stimulus data down to 580 10 Hz. When selecting features, we recommend using those with slow fluctuations, such as 581 the acoustic envelope over those with rapid fluctuations and sharp edges, such as the 582 envelope derivative. Finally, when identifying a null distribution, we recommend using trial 583 mismatch, with trials spaced far apart in time, to try to break the autocorrelation structure 584 within the fNIRS data and minimise inflated null correlation values. 585 4.5 Conclusions 586 Our findings show that TRF estimation can be meaningfully applied to fNIRS data, going 587 beyond typical application in electrophysiological recordings. Despite the slower 588 haemodynamic signals, we were able to show that fNIRS TRFs outperform more 589 conventional GLM based approaches. These result s support the integration of continuous 590 speech-tracking frameworks into fNIRS re search and broaden the methodological tools 591 available for naturalistic hearing and communication studies. 592 593 CRediT Author Contributions 594 Johanna Wilroth – Data curation, formal analysis, investigation, methodology, visualisation, 595 and writing – original draft, writing – review & editing. Nancy Sotero Silva – Investigation, 596 writing – original draft, writing – review & editing. Ali Tafakkor – Data curation, formal 597 analysis, investigation, methodology, visualisation, writing – original draft, writing – review & 598 editing. Bruno de Avo Mesquita – Data curation, investigation, visualisation, writing – 599 review & editing. Emily Y. J. Ip – Data curation, conceptualisation, methodology, writing – 600 review & editing. Bonnie Lau – Investigation, methodology. Jaimy Hannah – Data curation, 601 formal analysis, investigation, methodology, project administration, supervision, visualisation, 602 writing – original draft, writing – review & editing. Giovanni M. Di Liberto – 603 conceptualisation, funding acquisition, investi gation, methodology, resources, supervision, 604 writing – review & editing. 605

Acknowledgements

606 The authors would like to thank the Cognition and Natural Sensory Processing (CNSP) 607 initiative, for providing the blueprint for the analysis code and data standardisation guidelines 608 used in this work. We would like to thank the organisers of the 1 st CNSP Hackathon and all 609 the participants who were directly involved in the fNIRS data collection, including John 610 O’Doherty, Cindy Zhang, and Mike Thornton. We would also like to thank Artinis Medical 611 Systems B.V. for making this study possible, by sharing equipment, guidance on data 612 recording in preparation for and during the CNSP-hackathon 2025. 613 614 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint

References

615 Afyouni, S., Smith, S. M., & Nichols, T. E. (2019). Effective degrees of freedom of the 616 Pearson's correlation coefficient under autocorrelation. NeuroImage, 199. 617 https://doi.org/10.1016/j.neuroimage.2019.05.011 618 Anderson, C. A., Wiggins, I. M., Kitterick, P . T., Hartley, D. E. H., Anderson, C. A., Wiggins, I. 619 M., Kitterick, P . T., & Hartley, D. E. H. (2017). Adaptive benefit of cross-modal 620 plasticity following cochlear implantation in deaf adults. Proceedings of the National 621 Academy of Sciences, 114(38). https://doi.org/10.1073/pnas.1704785114 622 Arbabshirani, M. R., Damaraju, E., Phlypo, R., Plis, S., Allen, E., Ma, S., Mathalon, D., 623 Preda, A., Vaidya, J. G., Adali, T., & Calhoun, V. D. (2014). Impact of autocorrelation 624 on functional connectivity. NeuroImage, 102. 625 https://doi.org/10.1016/j.neuroimage.2014.07.045 626 Bálint, A., Wimmer, W., Caversaccio, M., Rummel, C., & Weder, S. (2025). Brain activation 627 patterns in normal hearing adults: An fNIRS Study using an adapted clinical speech 628 comprehension task. Hearing Research , 455. 629 https://doi.org/10.1016/j.heares.2024.109155 630 Bell, L., Peng, Z. E., Pausch, F., Reindl, V., Neuschaefer-Rube, C., Fels, J., Konrad, K., Bell, 631 L., Peng, Z. E., Pausch, F., Reindl, V., Neuschaefer-Rube, C., Fels, J., & Konrad, K. 632 (2020). fNIRS Assessment of Speech Comprehension in Children with Normal 633 Hearing and Children with Hearing Aids in Virtual Acoustic Environments: Pilot Data 634 and Practical Recommendations. Children 2020, Vol. 7, , 7 (11). 635 https://doi.org/10.3390/children7110219 636 Bertachini, A. L. L., Januario, G. C., Novi, S. L., Mesquita, R. C., Silva, M. A. R., Andrade, G. 637 M. Q., de Resende, L. M., de Miranda, D. M., Bertachini, A. L. L., Januario, G. C., 638 Novi, S. L., Mesquita, R. C., Silva, M. A. R., Andrade, G. M. Q., de Resende, L. M., & 639 de Miranda, D. M. (2021). Hearing brain ev aluated using near-infrared spectroscopy 640 in congenital toxoplasmosis. Scientific Reports 2021 11:1 , 11 (1). 641 https://doi.org/10.1038/s41598-021-89481-0 642 Brodbeck, C., Hong, L. E., & Simon, J. Z. (2018). Rapid Transformation from Auditory to 643 Linguistic Representations of Continuous Speech. Current Biology , 28(24), 3976-644 3983.e3975. https://doi.org/10.1016/j.cub.2018.10.042 645 Carollo, A., Stella, M., Lim, M., Bizzego, A., & Esposito, G. (2025). Emotional content and 646 semantic structure of dialogues are associ ated with Interpersonal Neural Synchrony 647 in the Prefrontal Cortex. NeuroImage, 309. 648 https://doi.org/10.1016/j.neuroimage.2025.121087 649 Chalas, N., Daube, C., Kluger, D. S., Abbasi, O., Nitsch, R., & Gross, J. (2022). Multivariate 650 analysis of speech envelope tracking reveals coupling beyond auditory cortex. 651 NeuroImage, 258. https://doi.org/10.1016/j.neuroimage.2022.119395 652 Crosse, M. J., Butler, J. S., & Lalor, E. C. (2015). Congruent Visual Speech Enhances 653 Cortical Entrainment to Continuous A uditory Speech in Noise-Free Conditions. 654 Journal of Neuroscience, 35(42). https://doi.org/10.1523/JNEUROSCI.1829-15.2015 655 Crosse, M. J., Di Liberto, G. M., Bednar, A., & Lalor, E. C. (2016). The multivariate temporal 656 response function (mTRF) toolbox: A MATL AB toolbox for relating neural signals to 657 continuous stimuli. Frontiers in Human Neuroscience, 10(NOV2016), 219245-658 219245. https://doi.org/10.3389/FNHUM.2016.00604/BIBTEX 659 Crosse, M. J., Zuk, N. J., Di Liberto, G. M., Ni differ, A. R., Molholm, S., & Lalor, E. C. (2021). 660 Linear Modeling of Neurophysiological Responses to Speech and Other Continuous 661 Stimuli: Methodological Considerations for Applied Research. Frontiers in 662 Neuroscience, 15. https://doi.org/10.3389/fnins.2021.705621 663 Czeszumski, A., Eustergerling, S., Lang, A., Menrath, D., Gerstenberger, M., Schuberth, S., 664 Schreiber, F., Rendon, Z. Z., & König, P . (2020). Hyperscanning: A Valid Method to 665 Study Neural Inter-brain Underpinnings of Social Interaction. Frontiers in Human 666 Neuroscience, 14. https://doi.org/10.3389/fnhum.2020.00039 667 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint D'Errico, J. (2026). inpaint_nans . In MATLAB Central File Exchange. 668 https://uk.mathworks.com/matlabcentral/fileexchange/4551-inpaint_nans 669 Di Liberto, G. M., Attaheri, A., Cantisani, G., Reilly, R. B., Ní Choisdealbha, Á., Rocha, S., 670 Brusini, P ., & Goswami, U. (2023). Emergence of the cortical encoding of phonetic 671 features in the first year of life. Nature communications, 14(1), 7789. 672 Di Liberto, G. M., & Ip, E. Y . J. (2025). Speech Neurophysiology in Realistic Contexts: Big 673 Hype or Big Leap? https://doi.org/10.48550/arXiv.2506.05494 674 Di Liberto, G. M., Lalor, E. C., & Millman, R. E. (2018). Causal cortical dynamics of a 675 predictive enhancement of speech intelligibility. NeuroImage, 166, 247-258. 676 Di Liberto, G. M., Nidiffer, A., Crosse, M. J., Zuk, N., Haro, S., Cantisani, G., Winchester, M. 677 M., Igoe, A., McCrann, R., & Chandra, S. (2024). A standardised open science 678 framework for sharing and re-analysing neural data acquired to continuous stimuli. 679 Neurons, Behavior, Data analysis, and Theory, 1-25. 680 Di Liberto, G. M., Pelofi, C., Bianco, R., Patel, P ., Mehta, A. D., Herrero, J. L., De Cheveigné, 681 A., Shamma, S., & Mesgarani, N. (2020). Cortical encoding of melodic expectations 682 in human temporal cortex. eLife, 9, e51784. 683 Ding, N., Chatterjee, M., & Simon, J. Z. (2014). Robust cortical entrainment to the speech 684 envelope relies on the spectro-temporal fine structure. NeuroImage, 88. 685 https://doi.org/10.1016/j.neuroimage.2013.10.054 686 Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while 687 listening to competing speakers. Proceedings of the National Academy of Sciences , 688 109(29), 11854-11859. https://doi.org/10.1073/pnas.1205381109 689 Eulau, K., & Hirsh-Pasek, K. (2024). From behavioral synchrony to language and beyond. 690 Frontiers in Integrative Neuroscience, 18. https://doi.org/10.3389/fnint.2024.1488977 691 Haresign, I. M., Phillips, E. A. M., Whitehorn, M., Goupil, L., Noreika, V., Leong, V., & Wass, 692 S. V. (2022). Measuring the temporal dyna mics of inter-personal neural entrainment 693 in continuous child-adult EEG hyperscanning data. Developmental Cognitive 694 Neuroscience, 54. https://doi.org/10.1016/j.dcn.2022.101093 695 Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S., & Keysers, C. (2012). Brain-to-696 brain coupling: a mechanism for creating and sharing a social world. Trends in 697 Cognitive Sciences, 16(2), 114-121. https://doi.org/10.1016/j.tics.2011.12.007 698 Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., & Bießmann, F. 699 (2014). On the interpretation of weight vectors of linear models in multivariate 700 neuroimaging. NeuroImage, 87. https://doi.org/10.1016/j.neuroimage.2013.10.067 701 Hu, Y ., Liu, Y ., Hou, Y ., Pan, Y ., Gao, X., Hu, Y ., Liu, Y ., Hou, Y ., Pan, Y ., & Gao, X. (2025). 702 Dataset of natural conversations about appearance using fNIRS. Scientific Data 2025 703 12:1, 12(1). https://doi.org/10.1038/s41597-025-05574-9 704 Huppert, T. J., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2009). HomER: a review 705 of time-series analysis methods for near-i nfrared spectroscopy of the brain. Applied 706 Optics, Vol. 48, Issue 10, pp. D280-D298 , 48(10). 707 https://doi.org/10.1364/AO.48.00D280 708 Ip, E. Y . J., Akkaya, A., Winchester, M. M., Bishop, S. J., Cowan, B. R., & Di Liberto, G. M. 709 (2025). Exploring the impact of social relevance on the cortical tracking of speech: 710 viability and temporal response characterisation. bioRxiv, 2025.2009.2023.674728. 711 https://doi.org/10.1101/2025.09.23.674728 712 Lalor, E. C. P ., Alan J.; Reilly, Richard B.; Foxe, John J. . (2009). Resolving Precise Temporal 713 Processing Properties of the Auditory System Using Continuous Stimuli. Journal of 714 Neurophysiology, 102(1). https://doi.org/10.1152/jn.90896.2008 715 Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018). Surrogate data 716 for hypothesis testing of physical systems. Physics Reports , 748. 717 https://doi.org/10.1016/j.physrep.2018.06.001 718 Lawrence, R. J., Wiggins, I. M., Anderson, C. A., Davies-Thompson, J., & Hartley, D. E. H. 719 (2018). Cortical correlates of speech intelligibility measured using functional near-720 infrared spectroscopy (fNIRS). Hearing Research , 370. 721 https://doi.org/10.1016/j.heares.2018.09.005 722 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint Levin, M., Balberg, M., Zaltz, Y ., Levin, M., Balberg, M., & Zaltz, Y . (2022). Cortical Activation 723 in Response to Speech Differs between Prelingually Deafened Cochlear Implant 724 Users with Good or Poor Speech-in-Noise Understanding: An fNIRS Study. Applied 725 Sciences 2022, Vol. 12,, 12(23). https://doi.org/10.3390/app122312063 726 Liégeois, R., Yeo, B. T. T., & Ville, D. V. D. (2021). Interpreting null models of resting-state 727 functional MRI dynamics: not throwing the model out with the hypothesis. 728 NeuroImage, 243. https://doi.org/10.1016/j.neuroimage.2021.118518 729 Menn, K. H., Männel, C., & Meyer, L. (2023). Phonological acquisition depends on the timing 730 of speech sounds: Deconvolution EEG m odeling across the first five years. Science 731 Advances, 9(44), eadh2560. 732 Miezin, F. M., Maccotta, L., Ollinger, J. M., Petersen, S. E., & Buckner, R. L. (2000). 733 Characterizing the Hemodynamic Response: Effects of Presentation Rate, Sampling 734 Procedure, and the Possibility of Ordering Brain Activity Based on Relative Timing. 735 NeuroImage, 11(6). https://doi.org/10.1006/nimg.2000.0568 736 Minagawa-Kawai, Y ., Mori, K., Hebden, J. C., & Dupoux, E. (2008). Optical imaging of 737 infants' neurocognitive development: Recent advances and perspectives. 738 Developmental Neurobiology, 68(6). https://doi.org/10.1002/dneu.20618 739 O'Sullivan, J. A., Power, A. J., Mesgarani, N., Rajaram, S., Foxe, J. J., Shinn-Cunningham, 740 B. G., Slaney, M., Shamma, S. A., & Lalor, E. C. (2015). Attentional Selection in a 741 Cocktail Party Environment Can Be Decoded from Single-Trial EEG. Cerebral Cortex 742 (New York, N.Y.: 1991), 25(7), 1697-1706. https://doi.org/10.1093/cercor/bht355 743 Olszowy, W., Aston, J., Rua, C., Williams, G. B., Olszowy, W., Aston, J., Rua, C., & Williams, 744 G. B. (2019). Accurate autocorrelation modeling substantially improves fMRI 745 reliability. Nature Communications 2019 10:1 , 10(1). https://doi.org/10.1038/s41467-746 019-09230-w 747 Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., 748 & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior 749 research methods, 51, 195-203. 750 Peñ a , M., Maki, A., K ovac /i1i ć, D., De haene -Lam be r tz , G., Koizumi, H., Bo uq uet, F., Mehler, 751 J., P eñ a , M., Ma k i, A., K ovac /i1ić , D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, 752 F., & Mehler, J. (2003). Sounds and silence: An optical topography study of language 753 recognition at birth. Proceedings of the National Academy of Sciences , 100(20). 754 https://doi.org/10.1073/pnas.1934290100 755 Pinti, P ., Tachtsidis, I., Hamilton, A., Hirsch, J., Aichelburg, C., Gilbert, S., & Burgess, P . W. 756 (2020). The present and future use of func tional near-infrared spectroscopy (fNIRS) 757 for cognitive neuroscience. Annals of the new York Academy of Sciences, 1464(1). 758 https://doi.org/10.1111/nyas.13948 759 Pollonini, L., Olds, C., Abaya, H., Bortfeld, H., Beauchamp, M. S., & Oghalai, J. S. (2014). 760 Auditory cortex activation to natural speech and simulated cochlear implant speech 761 measured with functional near-infrared spectroscopy. Hearing Research , 309. 762 https://doi.org/10.1016/j.heares.2013.11.007 763 Quaresima, V., Ferrari, M., & Valentina Quares ima, M. F. (2019). Functional Near-Infrared 764 Spectroscopy (fNIRS) for Assessing Cerebral Cortex Function During Human 765 Behavior in Natural/Social Situations: A Concise Review. Organizational Research 766 Methods, 22(1). https://doi.org/10.1177/1094428116658959 767 Rossi, S., Telkemeyer, S., Wartenburger, I., & Obrig, H. (2012). Shedding light on words and 768 sentences: Near-infrared spectr oscopy in language research. Brain and Language , 769 121(2). https://doi.org/10.1016/j.bandl.2011.03.008 770 Rowland, S. C., Hartley, D. E. H., Wiggins, I. M., & Stephen C. Rowland, D. E. H. H., Ian M. 771 Wiggins. (2018). Listening in Naturalistic Scenes: What Can Functional Near-Infrared 772 Spectroscopy and Intersubjec t Correlation Analysis Tell Us About the Underlying 773 Brain Activity? Trends in Hearing, 22. https://doi.org/10.1177/2331216518804116 774 Santosa, H., Zhai, X., Fishburn, F., Huppert, T., Santosa, H., Zhai, X., Fishburn, F., & 775 Huppert, T. (2018). The NIRS Brain AnalyzIR Toolbox. Algorithms 2018, Vol. 11, , 776 11(5). https://doi.org/10.3390/a11050073 777 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint Sevy, A. B. G., Bortfeld, H., Huppert, T. J., Beauchamp, M. S., Tonini, R. E., & Oghalai, J. S. 778 (2010). Neuroimaging with near-infrared s pectroscopy demonstrates speech-evoked 779 activity in the auditory cortex of deaf children following cochlear implantation. Hearing 780 Research, 270(1-2). https://doi.org/10.1016/j.heares.2010.09.010 781 Shen, X., Tao, L., Chen, X., Song, S., Liu, Q., & Zhang, D. (2024). Contrastive learning of 782 shared spatiotemporal EEG representati ons across individuals for naturalistic 783 neuroscience. NeuroImage, 301. https://doi.org/10.1016/j.neuroimage.2024.120890 784 Sonkusare, S., Breakspear, M., & Guo, C. (2019). Naturalistic Stimuli in Neuroscience: 785 Critically Acclaimed. Trends in Cognitive Sciences , 23(8). 786 https://doi.org/10.1016/j.tics.2019.05.004 787 Tucker, S., Dubb, J., Kura, S., Lühmann, A. v., Franke, R., Horschig, J. M., Powell, S., 788 Oostenveld, R., Lührs, M., Delaire, É., Aghajan, Z. M., Yun, H., Yücel, M. A., Fang, 789 Q., Huppert, T. J., Frederick, B. d., Pollonini, L., Boas, D. A., Luke, R.,…Luke, R. 790 (2022). Introduction to the shared near infrared spectroscopy format. 791 Neurophotonics, 10(1). https://doi.org/10.1117/1.NPh.10.1.013507 792 van de Rijt, L. P . H., van Opstal, A. J., Mylanus, E. A. M., Straatman, L. V., Hu, H. Y ., Snik, A. 793 F. M., & van Wanrooij, M. M. (2016). Temporal Cortex Activation to Audiovisual 794 Speech in Normal-Hearing and Cochlear Implant Users Measured with Functional 795 Near-Infrared Spectroscopy. Frontiers in Human Neuroscience , 10. 796 https://doi.org/10.3389/fnhum.2016.00048 797 Wong, D. D. E., Fuglsang, S. A., Hjortkjær, J., Ceolini, E., Slaney, M., & de Cheveigné, A. 798 (2018). A Comparison of Regularization Methods in Forward and Backward Models 799 for Auditory Attention Decoding. Frontiers in Neuroscience , 12. 800 https://doi.org/10.3389/fnins.2018.00531 801 Yücel, M. A., Luke, R., Mesquita, R. C., von Lühmann, A., Mehler, D. M. A., Lührs, M., 802 Gemignani, J., Abdalmalak, A., Albrecht, F., de Almeida Ivo, I., Artemenko, C., 803 Ashton, K., Augustynowicz, P ., Bajracharya, A., Bannier, E., Barth, B., Bayet, L., 804 Behrendt, J., Khani, H. B.,…Zemanek, V. (2025). fNIRS reproducibility varies with 805 data quality, analysis pipelines, and researcher experience. Communications Biology 806 2025 8:1, 8(1). https://doi.org/10.1038/s42003-025-08412-1 807 808 .CC-BY-NC 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: oa-pdf ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2026) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-24T02:00:01.246996+00:00

License: CC-BY-NC-4.0