Keywords
39
Functional Near-Infrared Spectroscopy; Temporal Response Function; Continuous 40
Speech; Neural Tracking. 41
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
Highlights 42
• TRF modelling can be meaningfully applied to fNIRS data acquired during 43
speech listening tasks. 44
• Prediction correlations between actual and modelled fNIRS signals were above 45
chance level, with values comparable to previous EEG/MEG studies. 46
• TRFs explained more fNIRS varianc e than a conventional GLM approach. 47
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
1. Introduction 48
Neurophysiology research on speech perception has traditionally relied on highly controlled 49
experimental paradigms that use short, isolated uni ts of speech such as syllables, words, or 50
brief sentences. While these approaches a llow a high degree of control and precise 51
alignment with event-related designs, they fail to capture important properties of natural 52
speech, including continuity, te mporal dynamics, and contextual complexity (Shen et al., 53
2024). More recently, however, there has been a shift towards using continuous, naturalistic 54
stimuli that better approximate real-life listeni ng situations, motivated by the opportunity to 55
investigate how social factors interact with sound and language processing (Di Liberto & Ip, 56
2025; Hu et al., 2025; Rowland et al., 2018; Sonkusare et al., 2019). While traditional 57
electrophysiological paradigms primarily rely on stimuli on the order of milliseconds (Rossi et 58
al., 2012), several neuroimaging studies have de monstrated the feasibility of using longer 59
speech segments (e.g., audio-books; podcast s) to study speech comprehension and 60
listening effort (Bálint et al., 2025; Bertachini et al., 2021; Levin et al., 2022; Pollonini et al., 61
2014; van de Rijt et al., 2016). 62
This shift toward continuous speech has been accompanied by the introduction of analytic 63
frameworks for modelling the relationship between continuous stimulus features and neural 64
responses. A widely adopted framework for estimating such relationships is the Temporal 65
Response Function (TRF), where encoding mode ls are derived using regularised ridge 66
regression (Crosse et al., 2016; Crosse et al ., 2021). TRFs have been extensively applied to 67
electroencephalography (EEG) and magnetoencephalography (MEG) data to quantify neural 68
tracking of speech and music, shedding light on the brain represents linguistic and acoustic 69
units (Brodbeck et al., 2018; Di Liberto et al., 2023; Menn et al., 2023), implements auditory 70
attention mechanisms (Ding & Simon, 2012; O' Sullivan et al., 2015), integrates audiovisual 71
cues (Crosse et al., 2015), and implements predictive processing (Di Liberto et al., 2018), 72
among other cognitive operations (Di Liberto et al., 2020). 73
EEG and MEG offer excellent temporal resolution and remain indispensable for studying the 74
millisecond-level dynamics of speech percepti on and language processing. However, like all 75
neuroimaging techniques, they come with modality specific trade-offs that can make certain 76
naturalistic paradigms more suitable than others. For instance, the low spatial resolution of 77
EEG limits precise source localisation, especially for deeper or distributed cortical 78
generators. MEG offers improved spatial resolution to EEG, but both modalities are sensitive 79
to movement-related artifacts, which can constrain experimental designs involving 80
interactive, mobile, or highly naturalistic behavio urs. This is especially true with MEG, where 81
magnetic shielding is critical, and so participation occurs in small shielded in rooms, with 82
certain MEG systems also requiring participants to be head fixed. For these reasons, 83
considering other neuroimaging methods with bette r experimental flexibility is important to 84
broaden the range of paradigms that can be meaningfully investigated. 85
Functional near-infrared spectroscopy (fNIRS) is an optical neuroimaging technique that 86
indirectly measures brain activity (Czeszumski et al., 2020; Eulau & Hirsh-Pasek, 2024; 87
Quaresima et al., 2019). It relies on the comparison of haemoglobin oxygenation 88
concentration
s in nervous tissues, offering blood flow measurements (Eulau & Hirsh-Pasek, 89
2024; Pinti et al., 2020). Due to its non-invasive nature and robustness to movement 90
artifacts, it has been used in substitution or combination with EEG, reaching cortical areas 91
up to 1.5 to 2 cm in depth (Czeszumski et al., 2020; Eulau & Hirsh-Pasek, 2024; Pinti et al., 92
2020). Additionally, due to its compatibility with electronic and magnetic devices, fNIRS has 93
also been widely used in research focused on hearing and speech perception, especially 94
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
with hearing aid and cochlear implant users (A nderson et al., 2017; Bell et al., 2020; Sevy et 95
al., 2010). 96
fNIRS has been successfully employed to study th e speech processing hierarchy in infants 97
and adults (Rossi et al., 2012), including phone mic contrasts to syllables and words 98
(Minagawa-Kawai et al., 2008), sentence-level processing (Peña et al., 2003), and 99
challenging listening conditions such as vocoded speech (Lawrence et al., 2018). However, 100
these studies typically rely on comparing the condition of interest with a baseline condition 101
using general linear models (GLM). This reliance stems from the slow and overlapping 102
nature of the haemodynamic response, which peak several seconds after stimulus onset, 103
making it difficult to isolate neural responses without clearly separated blocks. Such block 104
designs place constraints on experimental flexibility, limiting the study of natural speech 105
processing. 106
Here, we examine whether a TRF framework can be meaningfully applied to fNIRS data 107
acquired during a continuous speech listeni ng task. The fNIRS experiment was conducted 108
as part of the 1 st Cognition and Natural Sensory Processing (CNSP) hackathon, a satellite 109
event to the 8 th International Conference on Audito ry Cortex. In a single hyperscanning 110
session using a classroom-style design, fNIRS data were recorded simultaneously from 111
eight participants as they listened to podcast dialogues presented through a loudspeaker. 112
The stimuli were identical to those used in a previous EEG study on podcast dialogue 113
listening (Ip et al., 2025), providing clear expec tations for the temporal response patterns. 114
Encoding TRF and decoding analyses were used to assess the extent to which fNIRS can 115
capture speech-related neural proc essing across multiple levels of the processing hierarchy. 116
Although this study was conducted in a hyperscanning context, this was not the primary 117
focus of our analyses. Rather than targeting in tersubject synchronisation, we aim to bridge 118
TRF analyses with the increasing use of fNIRS in hearing and communication research, and 119
to evaluate if fNIRS can support continuous, naturalistic paradigms using analysis methods 120
designed for electrophysiology. 121
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
2. Methods 122
2.1 Participants 123
Eight volunteer Hackathon attendees (four female, four male) between 25 and 43 years of 124
age ( M = 30.63) took part in the experiment. Five of the eight participants were native 125
English speakers and the remaining three were highly proficient in English. The study was 126
approved by the School of Computer Science and Statistics Research Ethics Committee at 127
the University of Dublin, Trinity College, and all participants provided written informed 128
consent. Data collection took place in a workroom with natural lighting in the office building 129
where Hackathon activities were carried out in Maastricht, the Netherlands. 130
2.2 Stimuli and Task 131
A previous experiment on non-simultaneous EEG recordings was adapted for the 132
hyperscanning fNIRS setting (Ip et al., 2025), foll owing a similar presentation of stimuli. 133
Participants sat together in front of a screen wher e written instructions and a visual fixation 134
cross were displayed. Audio stimuli were deliv ered by loudspeakers at the front of the room 135
and presented at a sampling rate of 44,100 Hz. Stimuli were presented and controlled by the 136
PsychoPy Python library version 2025.1.1 (Peirce et al., 2019). The audio samples consisted 137
of 28 dialogues taken from four sources: two American podcasts ( Brains On! and Forever 138
Ago), a YouTube series by WIRED (5 Levels of Difficulty), and interviews with the hosts from 139
the podcast Forever Ago . All podcasts and shows are publicly available online 1. The 140
dialogues discussed general interest topics related to science, arts, and childhood 141
memories. All audios were one-on-one dialogues fr om either adult-adult (10 trials) or adult-142
child (18 trials) interactions, totalling appr oximately one hour. The whole experiment was run 143
in a single session, as data from all participants were acquired simultaneously. 144
2.3 fNIRS Data Acquisition 145
fNIRS data was recorded with Brite Ultra System (Artinis Medical Systems B.V., Elst, The 146
Netherlands), a system specifically designed for hyperscanning recordings. The fNIRS 147
devices were wireless (model Brite MKII), each including 18 optodes (10 transmitters and 8 148
receivers) leading to 16 long-separation channels and two short-separation channels (see 149
Table 1). Optodes were mounted in caps, covering frontal and temporal regions. The 150
montage of the optodes and placement of the caps was carried out by the volunteers, with 151
the assistance of Artinis’ fNIRS experts as part of a hands-on fNIRS workshop, with step-by-152
step supervision. Data were recorded using Brite Connect Ultra, an ad-hoc software platform 153
designed for large-scale fNIRS hyperscanning, consisting of a principal recording hub and 154
one tablet interface. This allowed two researchers (one coordinator and one assistant) to 155
supervise the multiple simultaneous recordings. Data were recorded at a sampling frequency 156
of 100 Hz for all participants. Event triggers were sent manually via button presses using 157
PortaSync (a wireless remote device compatible with Artinis systems) to enable time-series 158
synchronisation with the stimulus presentation by PsychoPy, which were delivered to a 159
computer not integrated to the Brite system. 160
Table 1. Optode Information 161
1 https://laist.com/podcasts/servant-of-pod/kids-podcasts-a-true-alternative-to-screen-time;
https://creators.spotify.com/pod/profile/martine-severin/episodes/31--How-to-Become-an-Epic-
ImprovComic-with-Joy-Dolo-e1js9c2
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
Channel Transmitter Receiver Position Channel Length (mm)
1 1 1 Right frontal 25.76
2 1 2 Right frontal 31.62
3 2 1 Short separation 0
4 3 1 Right frontal 34.43
5 3 2 Right frontal 24.72
6 4 3 Right temporal 31.33
7 4 4 Right temporal 28.37
8 5 3 Right temporal 31.24
9 5 4 Right temporal 34.75
10 6 5 Left frontal 25.88
11 6 6 Left frontal 31.64
12 7 5 Short separation 0
13 8 5 Left frontal 34.46
14 8 6 Left frontal 24.93
15 9 7 Left temporal 31.39
16 9 8 Left temporal 28.39
17 10 7 Left temporal 31.15
18 10 8 Left temporal 34.73
162
2.4 Experimental procedure 163
The 28 audio trials were presented in a pr e-determined random order, each followed by a 164
brief break for participants, where they rated their interest in the dialogue on a scale from 1 165
to 5 (1 = not at all interesting; 3 = neutral; 5 = very interesting). The breaks had variable 166
length and were controlled by the researchers to ensure that all participants had time to 167
respond before proceeding to the next trial. On every second trial, a behavioural multiple-168
choice question was presented to maintain participants’ attention and engagement to the 169
task. The question was to select descriptive ke ywords that best described the speech in the 170
trial. These behavioural responses acted as measures of attention and engagement during 171
the task but were not used in further analyses. 172
2.5 fNIRS Preprocessing and TRF Analysis 173
The raw fNIRS data were stored in shared near infrared spectroscopy format (SNIRF) 174
(Tucker et al., 2022). Preprocessing was conduc ted in MATLAB using inspiration from the 175
open source fNIRS preprocessing pipelines for the NIRx system 2 and for the Artinis system 176
using SNIRF 3. Our pipeline requires three publicly available toolboxes: inpaint_nans 177
(D'Errico, 2026), Homer3 (Huppert et al., 2009) and NIRS (Santosa et al., 2018). The 178
following preprocessing steps were applied: (1) early pruning based on raw intensity levels 179
to identify missing values and signal dropouts, (2 ) conversion of light intensity to optical 180
density (OD), (3) band-pass filtering between 0.01 and 0.7 Hz was applied because 181
stimulus-related activity is not expected below 0.01 Hz, while haemodynamic responses 182
typically occur below 0.5 Hz. The upper cutoff also helps attenuate cardiac signals, which 183
2 https://github.com/smburns47/preprocessingfNIRS
3 https://github.com/Artinis-Medical-Systems-B-V/snirf_data_example
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
usually occur around 1 Hz, (4) inspection and removal of potential bad channels (none were 184
identified), and (5) exclusion of the two shor t-separation channels. Subsequently, each trial 185
was segmented and aligned with the corresponding speech stimulus, downsampled to 25 186
Hz, and saved in the Continuous-event Neural Data (CND) data structure (Di Liberto et al., 187
2024). 188
We extracted five stimulus features to use in our analyses including the acoustic envelope, 189
half-way rectified envelope derivative, word onset times, and surprisal and entropy from 190
GPT-2 (see (Ip et al., 2025) for detailed descr iptions of feature extraction). Temporal 191
response functions (TRFs) were used to es timate the linear mapping between a given 192
sensory stimulus feature and the corresponding neural responses (Ding et al., 2014; Lalor, 193
2009). TRFs were computed using the mTRF-Toolbox (Crosse et al., 2016) and custom 194
code built starting from scripts from the CNSP open science initiative 4. TRFs estimate the 195
relationship between the stimulus features and fNIRS signals as a linear time-invariant 196
system via a lagged Ridge regression procedur e. TRF models were fit for each participant 197
separately, using a leave-one-out cross-validat ion procedure across trials to control for 198
overfitting, and Ridge regularisati on to prevent overfitting. The regularisation parameter ( λ ) 199
was optimised through an exhaustive search over a logarithmic range from 0.00001 to 200
1000000 within each training fold. For forward models, the TRF estimates a temporal filter 201
for each fNIRS channel, capturing how the neural response at a given time point can be 202
predicted from the preceding stimulus. The optimal λ for forward models was defined as the 203
value yielding the highest prediction correlation (Pearson’s r) between the predicted and 204
observed fNIRS signals (Crosse et al., 2016). Backward models were fit to reconstruct 205
stimulus features from the fNIRS signals, where data from all channels is considered 206
simultaneously. The optimal value of λ for backward models was defined as the value 207
yielding the highest Pearson’s correlation between the reconstructed and actual stimulus 208
feature. 209
Since fNIRS measures haemodynamic responses evolve much slower than signals 210
measured by EEG or MEG, several adjustments were made. First, trials were segmented by 211
including additional 30 seconds of fNIRS data after the end of the sound stimulus, ensuring 212
that speech-related haemodynamic responses are fully included in the trials. A technical 213
requirement for doing so was to zero-pad the stimulus features to match the fNIRS 214
segments. Second, we selected TRF time-lags ranging from 0 to 30 seconds to capture the 215
slow haemodynamic response; much longer than those usually used in EEG and MEG 216
studies. To explore the possibility of improving computational efficiency for the TRF model fit, 217
we tested the TRF analysis for several downsampling frequencies, ranging from 1 Hz to 25 218
Hz, determining the minimum sampling frequency possible without information loss (Figure 219
1B). Finally, since fNIRS measures change s in oxygenated and deoxygenated haemoglobin 220
concentrations (HbO and HbR respectively), s eparate models for fit for each metric. Due to 221
our small sample size, all statistical comparis ons were done across trials rather than across 222
participants. 223
To test if our TRF models were explaining meaningful variance in the fNIRS response, we 224
compared our prediction correlation values to a null distribution. For each feature and for a 225
full multivariate model, null distributions were created by mismatching the stimulus features 226
and fNIRS data across trials, using pairs separat ed by at least five trials to minimise 227
temporal autocorrelation in the slow fNIRS signals. This was repeated 100 times per subject. 228
Prediction correlations were computed for each iteration and averaged across channels and 229
4 https://github.com/CNSP-Workshop/CNSP-resources
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
trials, resulting in 800 null correlation values (100 per subject). Our observed mean 230
prediction correlations were then compared to the null distribution for the corresponding 231
model. 232
2.6 General Linear Model analysis 233
To provide a benchmark for the TRF approac h, we implemented a standard general linear 234
model (GLM) analysis using a cross-validated framework. For each participant, a design 235
matrix was constructed by convolving the ex tracted stimulus features (i.e., acoustic 236
envelope, envelope derivative, word onsets, lexical surprisal, and entropy) with a canonical 237
haemodynamic response function (HRF). The HRF was generated using standard SPM12 238
parameters, including a peak delay of 6 s and an undershoot delay of 16 s, and was 239
normalised to a peak amplitude of one. Following convolution, all regressors were z-scored 240
to normalise their scales and ensure the stability of the subsequent regression. The model 241
also incorporated nuisance regressors to account for non-neural signal components and low-242
frequency fluctuations. These included trial-specific intercepts to capture baseline offsets for 243
each speech segment and linear drift terms to model slow signal trends within each trial 244
duration. Prior to the fitting procedure, fNIR S data were pre-processed by applying a linear 245
detrend operation ( detrend function in MATLAB) and z-score transformation to each 246
individual trial. 247
Predictive performance was assessed using a 10-fold cross-validation procedure. At each 248
iteration, nine segments were used to estimate the model weights via ordinary least squares 249
(OLS) regression. The resulting weights were then applied to the design matrix of the 250
remaining held-out segment to generate a predict ed fNIRS time-series. This process was 251
repeated for each fold, and model accuracy wa s quantified by calculating the Pearson’s 252
correlation coefficient (r) between the predicted and ground-truth signals, for each of the 16 253
channels. These correlations were calc ulated separately for oxygenated (HbO), 254
deoxygenated (HbR), and total haemoglobin (HbT) metrics to allow for a comprehensive 255
comparison of model predictive performance acro ss different physiological signals. By using 256
cross-validation and Pearson’s correlations, we ensured that the GLM results were directly 257
comparable to the TRF metrics. 258
3. Results 259
Participants rated the passages to be of neutral interest on average ( M = 3.04), with the 260
adult-child trials being rated as slightly more enjoyable ( M = 3.20) than the adult-adult trials 261
(M = 2.76). Accuracy on the comprehension questions was high (range: 76.9% - 100%, M = 262
89.4%), with accuracy being slightly higher for the adult-child trials ( M = 91.7%) than the 263
adult-adult trials (M = 87.5%). 264
To determine model parameters for subsequent analyses, we first explored the impact of the 265
downsampling frequency on the performance of forward and backward models (Figure 1). 266
We ran both backward and forward TRF models on the acoustic envelope for both HbO and 267
HbR (Figure 1A). While backward models are often reported with higher correlation scores 268
than forward models in EEG and MEG TRF analyses (Crosse et al., 2016; Haufe et al., 269
2014), we observed similar median correlations and no clear differences between modelling 270
direction in fNIRS. Therefore, we proceeded with the forward model for subsequent 271
analyses, primarily due to its more direct interpretability (Haufe et al., 2014). Forward models 272
provide TRF weights that characterise how different features contribute to the fNIRS signal. 273
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
Because fNIRS measures slow haemodynam ic responses, lower analysis sampling 274
frequencies than those used in EEG or MEG should be sufficient. While lower sampling 275
rates reduce computational cost, overly lo w frequencies may fail to capture relevant 276
response dynamics or, similarly, cause the loss of critical stimulus information. We tested six 277
different analysis sampling frequencies from 1Hz to 25Hz to determine the minimum 278
sampling frequency without information loss (Figure 1B). Performance was reduced at 1 and 279
5 Hz, whereas correlations appeared to stabilise at 10 Hz for both HbO and HbR. For HbO, 280
a repeated measures ANOVA with Greenhouse-Geisser correction revealed a significant 281
effect of sampling frequency ( F(1.24, 33.42) = 32.22, p < .001). We then ran follow up 282
pairwise t-tests (false discovery rate [FDR] corrected) on adjacent sampling frequencies 283
(e.g., 1 to 5, 5 to 10, etc.). We found that the mean prediction correlation across trials was 284
significant greater for 5 Hz than 1 Hz (t(27) = 4.77, p < .001) and for 10 Hz than 5 Hz (t(27) = 285
5.72, p < .001). There were no significant differences between 15 Hz and 10 Hz ( t(27) = 286
0.35, p = .782, 20 Hz and 15 Hz t (27) = 1.63, p = .132, or 25 Hz and 20 Hz t(27) = 0.25, p < 287
.808. Similarly for HbR, we found a significant effect of sampling frequency ( F(1.32, 35.58) = 288
21.18, p < .001; Greenhouse-Geisser corrected). We then ran follow up pairwise t-tests 289
(FDR corrected) on adjacent sampling frequencies (e.g., 1 to 5, 5 to 10, etc.). We found that 290
the mean prediction correlation across trials was significant greater for 5 Hz than 1 Hz ( t(27) 291
= 3.59, p = .002) and for 10 Hz than 5 Hz ( t(27) = 4.63, p < .001). There were no significant 292
differences between 15 Hz and 10 Hz ( t(27) = 0.60, p = .639, 20 Hz and 15 Hz t(27) = 0.40, 293
p = .689, or 25 Hz and 20 Hz t (27) = 0.53, p < .647). Based on this trade-off between model 294
performance and computational efficiency, subsequent analyses were conducted using a 295
sampling frequency of 10 Hz. All subsequent results are presented for the forward model 296
with a sampling frequency of 10 Hz, unless stated otherwise. 297
298
Figure 1. C
omparison of TRF model parameters. (A) Backward and forward univariate 299
TRF model performance. Red colour denotes TRF results for HbO signals, and blue denotes 300
HbR. The y-axis refers to the Pearson’s correlation metric (envelope reconstruction and 301
fNIRS prediction for backward and forward models respectively). (B) Impact of sampling 302
frequency on the forward TRF model performance. 303
3.1 fNIRS Tracking of the Acoustic Envelope 304
Univariate forward TRFs relating sound env elope end fNIRS time-series were examined. 305
HbO values yielded prediction correlations that were 10.3% higher than HbR on average 306
(Figure 2A). Here, we also computed prediction correlations for the sum of HbO and HbR 307
(referred to as HbT) and found comparable values to HbO (2.1% greater for HbT than HbO). 308
As such, the analyses that follow are carried out on HbO and HbR separately. When the 309
prediction correlations were broken down by channel, we can see that frontal channels tend 310
to yield higher values than temporal channels for HbO (Figure 2C) but not for HbR (Figure 311
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
2D). We averaged the prediction correlation across the four channels in each group of 312
optodes (left frontal, left temporal, right front al, right temporal) for both HbO and HbR. For 313
HbO, a repeated measures ANOVA with Greenhouse Geisser correction revealed a 314
significant effect of channel location ( F(1.76, 47.41) = 9.269, p < .001). Follow up paired t-315
tests (FDR corrected) showed that the left frontal group yielded significantly higher prediction 316
correlations than the left temporal ( t(27) = 3.97, p = .002), right frontal ( t(27) = 2.36, p = 317
.031), and right temporal groups ( t(27) = 3.77, p = .002). The right frontal group led to larger 318
prediction correlations than either temporal group (left: t(27) = 2.62, p = .029; right: t (27) = 319
2.49, p = .029), and there was no significant diff erence between the two temporal groups 320
(t(27) = -0.26, p = .799). For HbR, a repeated measures ANOVA once again revealed a 321
significant effect of channel location ( F(3, 81) = 3.585, p = .017). Here, the only significant 322
pairwise difference was between the le ft frontal and right frontal groups ( t(27) = 3.14, p = 323
.024; FDR corrected), with the prediction correlations for the left frontal channels being 324
higher on average. 325
We then examined the TRF model weights for both metrics (Figure 2B). Compatibly with the 326
properties of haemodynamic responses in the brain, which are known to peak between 3 327
and 6 seconds after local neural activity increases (Miezin et al., 2000), we expected TRFs 328
to peak at latencies in that interval. Visually, the most prominent TRF components emerged 329
at latencies of 3–5 seconds. Dominant components at those latencies emerged at seven out 330
of 16 channels for HbO, and at one left frontal channel for HbR. 331
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
332
Figure 2. Results from forward models using the acoustic envelope speech feature at 333
a sampling frequency of 10 Hz. (A) Boxplots of mean fNIRS prediction correlation values 334
averaged across trials, channels and participants for HbO (red), HbR (blue) and HbT (grey). 335
Each marker represents individual performance. (B) Magnitude of TRF weights for HbO 336
(red) and HbR (blue). (C-D) Channel-wise fNIRS prediction correlation values, colour-coded 337
by cortical region, for HbO (C) and HbR (D). (E-F) Channel-wise fNIRS prediction correlation 338
values mapped onto MNI space, coloured by correlation magnitude for HbO (E) and HbR 339
(F). 340
3.2 Univariate versus Multivariate TRFs 341
We next investigated how different stimulus featur es contribute to the TRF model. We fit five 342
univariate models, one for each feature (envelope, envelope derivative, word onsets, lexical 343
surprisal, and entropy), as well as a multivariate model including all five features. All TRFs 344
led to average fNIRS prediction correlations values within 0.1 and 0.2 (Figure 3A). The 345
univariate envelope TRF and the multivariate TRF led to the highest average prediction 346
correlation values for both HbO (envelope: M = 0.188; multivariate: M = 0.181) and HbR 347
(envelope: M = 0.170; multivariate: M = 0.167), while the univariate envelope derivative 348
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
model resulted in the lowest prediction correlation values for both signals (HbO: M = 0.131; 349
HbR: M = 0.122). Pairwise t-tests (FDR corrected) across all the models revealed several 350
differences in prediction correlations. Specific ally, for HbO, the multivariate model and the 351
univariate envelope model yielded significantly higher prediction correlations than all 352
univariate models (all p < .05). There was no significant difference between these two 353
models ( t(27) = 1.42, p = .167). The univariate derivative model resulted in significantly 354
lower prediction correlations than all models ex cept the univariate surprisal model (all p < 355
.05; derivative versus surprisal: t(27) = -1.67, p = .115). For HbR, there were no significant 356
differences between the multivariate model, the univariate envelope model, and the 357
univariate word onset model (all p > .05). All three of these models yielded significantly 358
higher prediction correlations than the univariate derivative model and the univariate 359
surprisal model (all p < .05). The multivariate model and the univariate envelope model were 360
significantly higher than the entropy model (all p < .05), but there was no significant 361
difference between the univariate word onset model and the univariate entropy model ( t(27) 362
= 1.89, p = .102). 363
TRF weights from the multivariate model averaged across channels showed that all five 364
speech features exhibited similar temporal profiles (Figure 3B). Based on visual inspection, 365
for HbO the responses for all features appeared to display a broader peak between 3–5 s, 366
whereas HbR showed a more pronounced peak around 5 s, again consistent with the 367
haemodynamic response. 368
Further analyses were conducted to measur e the unique contribution of each stimulus 369
feature to the multivariate TRF (Figure 3C). That was quantified as the fNIRS prediction 370
correlation loss after removing one stimulus feature at a time at the model training stage 371
(i.e., feature importance). We found envelope was the largest contributor to the TRF models 372
for both HbO and HbR. We ran paired t-tests (FDR corrected) comparing the full multivariate 373
model to reduced models in which each feature was omitted individually. For both HbO and 374
HbR, the only feature that resulted in significantly lower prediction correlations when 375
dropped was the envelope (HbO: t(27) = 3.44, p = .01; HbR: t(27) = 2.94, p = .033).Together 376
with the results from the univariate models, this shows that the envelope appears to be the 377
most important feature in our TRF models. 378
The laterality of the univariate models was computed as the difference between prediction 379
correlations from left- and right-hemisphere c hannels for each of speech features for both 380
metrics (Figure 3D). We ran pairwise t-tests (FDR corrected) between the mean of the left 381
hemisphere channels and the mean of the right hemisphere channels for each model. 382
Although the mean difference between the left and right was positive for all features, none 383
showed significant laterality (all p > .05). 384
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
385
386
Figure 3. Univariate and multivariate forward TRF models at a sampling frequency of 387
10Hz. (A) Boxplots of mean prediction correlation values averaged across trials, channels, 388
and participants for HbO (red) and HbR (blue), shown for five speech features (envelope, 389
derivative, entropy, surprisal, word onset) as well as for a multivariate model including all 390
features. (B) TRF weights averaged across trials, channels, and participants for each 391
speech feature, shown separately for HbO (top) and HbR (bottom). (C) Average correlation 392
gain of the full multivariate model (all five features) relative to multivariate models with one 393
feature omitted. (D) Laterality analysis computed as the difference between fNIRS prediction 394
correlations from left-hemisphere channels and right-hemisphere channels. 395
3.4 Is it worth using TRFs to analyse fNIRS Data? 396
To assess whether the TRF performance exceeded chance level, we compared our obtained 397
fNIRS prediction correlations to a null distribution of 800 datapoints created by using 398
mismatched trial data (Figure 4A, B). TRF model s resulted in significantly higher prediction 399
correlations for all features than the null distribution (including for the multivariate model 400
including all five features) for both HbO and HbR. For HbO, the prediction correlations we 401
obtained from the univariate envelope, entropy, surprisal and word onset models were 402
beyond the 99 th percentile of the null distribution ( p < .01) and for the univariate derivative 403
and multivariate model, our obtained values were beyond the 95 th percentile ( p < .05). For 404
HbR, all six models resulted in prediction correlations beyond the 99 th percentile of the null 405
distribution (p < .01). This means that the TRF is c apturing meaningful variance in response 406
to these stimulus features for all models and both haemodynamic metrics. 407
Next, we examined the performance of a GLM approach with cross-validation. When fitting 408
the GLM with all five stimulus features convolved with a canonical HRF, the model 409
successfully predicted the fNIRS signals above chance, yielding positive median correlation 410
(r) values ranging between 0.12 and 0.13 across HbO and HbR metrics. The GLM was fitted 411
with all five stimulus features included simultaneously. The resulting beta weights reflect the 412
magnitude of the canonical HRF response to each feature and are conceptually comparable 413
to the peak amplitudes of the corresponding TRFs, though unlike the TRF, the GLM does not 414
estimate the shape of the haemodynamic response from the data. While the GLM provided a 415
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
reliable linear fit, the overall predictive correla tions were visibly lower than those achieved by 416
the multivariate TRF models, which reached medi an correlations closer to 0.18 for both HbO 417
and HbR. It is noteworthy that model performance was evaluated using cross-validation 418
rather than a trial-by-trial approach; although the latter was explored, it yielded unstable 419
predictions that did not generalize across trials, while cross-validation provides a robust and 420
unbiased estimate of predictive performance. For this reason, statistical comparisons 421
between the two are done at the subject level (n = 8). We used pairwise t-tests to see if 422
there was a significant difference in prediction correlations between the multivariate TRF 423
model and the GLM. The TRF resulted in significantly higher prediction correlations for both 424
HbO (t(7) = 3.40, p = .011) and HbR (t(7) = 2.49, p = .041). 425
Analysis of the average GLM beta weights provided further insight into how individual 426
stimulus features drove the predicted haem odynamic response within this traditional 427
framework. Because all five regressors were included simultaneously in the model, they 428
inherently competed to explain the variance in the continuous fNIRS signal. Within this 429
competitive multivariate space, the acoustic envelope exhibited the most prominent positive 430
beta weights, indicating it was the strongest unique driver of the modelled response. 431
Conversely, the envelope derivative yielded primarily negative beta weights, and the discrete 432
linguistic features (word onset, surprisal, and entropy) displayed weights heavily centred 433
around zero. This suppression of non-envelope feat ures in the GLM aligns perfectly with our 434
TRF multivariate gain analysis. In the TRF models, acoustic envelope resulted in the highest 435
prediction correlation, whereas adding the deriv ative and linguistic features yielded near-436
zero multivariate gain. 437
438
Figure 4. Comparison of TRF Models to Null Distributions and GLM. (A) Distribution of 439
prediction correlations for the multivariate TRF model using mismatched trial data for the 440
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
HbO metric. The mean prediction correlation for the actual TRF model is shown in red. (B) 441
The same for the HbR metric. The mean predicti on correlation for the actual TRF model is 442
shown in blue. (C) Comparison of prediction correlations for the TRF (purple) and GLM 443
(green) analyses for HbO and HbR. (D) Beta weights for each feature from the GLM analysis 444
for HbO and HbR. 445
4. Discussion 446
The main finding of this study is that the TRF estimation method can yield meaningful and 447
statistically significant temporal responses from fNIRS data. The estimated TRF weights 448
present temporal dynamics that are compatible with the canonical haemodynamic response, 449
with responses peaking at latencies between 3 and 6 seconds. These temporal profiles were 450
consistent across stimulus features and evaluat ion metrics. fNIRS prediction correlations 451
between actual and modelled signals were significantly above chance, with average 452
Pearson’s r values between 0.1 and 0.2 across all speech features for both HbO and HbR, 453
higher than values typically reported in EEG and comparable to those reported in MEG TRF 454
studies. Prediction performance was also higher than that obtained using a comparable GLM 455
approach. 456
A sampling frequency of 10 Hz was identified as the optimal choice for maximising 457
computational efficiency. Instead, lower sampling frequencies led to information loss. 458
Additionally, HbO and HbR haemoglobin metrics resulted in fNIRS prediction correlations 459
with comparable performance, with the HbO consistently approximately 10% higher. When 460
considering fNIRS channels separately, predict ions are higher in frontal regions than 461
temporal regions for both metrics. 462
4.1 Forward vs Backward TRF modelling 463
Backward (decoding) TRF models typically yield higher correlation values than forward 464
(encoding) TRF models in neurophysiology. One key reason is that forward models evaluate 465
the model predictions on noisy neural reco rdings, as the ground-truth neural signal is 466
unknown. Backward models, instead, perform that evaluation (i.e., Pearson’s correlation) on 467
the (clean) stimulus space, where the ground-tr uth is known. Another difference relates to 468
the multivariate-to-univariate nature of the TRF mapping. Backward models consider all 469
neural channels simultaneously, increasing the information used for the speech 470
reconstruction compared to using a single channel. Conversely, forward models predict one 471
channel at a time, but can combine multiple features simultaneously, which can increase the 472
model explanatory power when features are relat ed to complementary neural variance. With 473
these considerations in mind, backward models are generally expected to produce prediction 474
correlations that are higher numerically, whereas forward models would generally yield lower 475
correlation values, but provide greater physiological interpretability (Crosse et al., 2016; 476
Haufe et al., 2014; Wong et al., 2018). 477
In contrast with that literature, our results with fNIRS indicate comparable backward and 478
forward TRF correlations (Figure 1A). One possibility is that fNIRS signals have a higher 479
signal-to-noise (SNR), as higher SNRs increase the prediction correlation of forward models. 480
Another possibility is that speech-related fNIR S responses are spatially localised, meaning 481
that combining multiple channels in backward models would not increase the correlation 482
scores. The slow characteristic of haemodynamic signals also likely plays a role in this 483
result. In fact, such slow dynamics may be predictable using the speech envelope, while the 484
faster envelope dynamics may be too rapid for being reconstructed with such a slow brain 485
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
signal. It should also be noted that the baseline models in Figure 4A, B have mean above 486
zero, reflecting potential similarities across trials regardless of the specific speech segment 487
(e.g., auditory onset response). This further highlights the importance of considering a 488
baseline model (e.g., via trial mismatching) when assessing statistical significance. 489
While the slow dynamics of fNIRS may be seen as a disadvantage for the reasons above, 490
one positive implication is computational, as the time-series can be downsampled. Here, we 491
found that the TRF analysis remains unaltered for downsampling rates down to 10 Hz, while 492
information is lost at lower frequencies. 493
4.2 Stimulus Feature Analysis 494
The acoustic envelope yielded the highest fN IRS prediction correlations and contributed 495
most strongly to the multivariate models for both HbO and HbR (Figure 3). This was 496
expected as it is typically the case for EE G and MEG as well. An additional contributor to 497
that strong relationship may be that the envelope is the stimulus property in our feature set 498
with the closest rates to the fNIRS signal, possibly leading to a stronger alignment. In fact, 499
features with fast dynamics, such as the half-way rectified envelope derivative, which have 500
shown to be important in EEG TRF studies (B rodbeck et al., 2018; Chalas et al., 2022) had 501
little to no contribution to the multivariate TRF. 502
The TRF weights did not exhibit particular variation across features (Figure 3B). The 503
temporal dynamics for all five features resemble the canonical haemodynamic response 504
function, with peaks between 3 and 6 seconds for both HbO and HbR. This similarity across 505
features may explain why the multivariate gain was quite small. The implication is that, as 506
expected from fNIRS, the spatial maps for the TRFs may be more informative than their 507
temporal dynamics. This was seen in the univariate envelope model, with significant 508
differences in the prediction correlations acro ss different channels and visually different 509
behaviour in the weights. 510
4.3 Comparison of the TRF to null distributions and GLM approaches 511
Adoption of an appropriate baseline model or null distribution is critical for evaluating 512
encoding models, such as the TRF. Here, we used trial-mismatch to ensure disruption of the 513
temporal information in the fNIRS signal. We have also explored other baselines that are 514
common for EEG and MEG. For example, time-based shuffling methods implementing 515
within-trial permutation or circ ular shifts failed to disrupt the predictive power of the TRF 516
model, yielding surprisingly similar correlation va lues as the true fNIRS-stimulus trials, most 517
likely due to the slow nature of the neural haemodynamic response measured by fNIRS. 518
Both within-trial permutation and circular shift pr eserve aspects of the original time series 519
that are statistically meaningful for the haemodynamic response, preserving the overall 520
distribution and amplitude of the stimulus feature, with circular shift additionally preserving 521
the autocorrelation structure. Since fN IRS signals are dominated by slow, highly 522
autocorrelated haemodynamics, these shuffling methods can yield null model prediction 523
correlations that are very close to those obtained with the actual stimulus features 524
(Lancaster et al., 2018; Liégeois et al., 2021; Santosa et al., 2018). As such, we recommend 525
considering building null distribution by combining trial-mismatch with other operations, such 526
as time-reversal and circular shift, disrupting the stimulus-response coupling. Interestingly, 527
the null distribution does not centre perfectly on zero. fNIRS signals at the low frequencies 528
we included (i.e. 0.01 Hz) remain temporally smooth and strongly autocorrelated due to the 529
sluggish haemodynamic response. Since consecutive samples are not independent and low-530
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
frequency components dominate the variance, correlation estimates from fNIRS signals can 531
exhibit a non-zero baseline correlations (A fyouni et al., 2019; Arbabshirani et al., 2014; 532
Olszowy et al., 2019). 533
Prediction correlations were higher for TRFs than the GLM analysis. This difference in 534
predictive power reflects the increased degrees of freedom inherent in the TRF method. By 535
modelling temporal dynamics across a range of lags, TRFs can capture complex variance in 536
the continuous haemodynamic response that a static, canonical HRF-based GLM may 537
underfit. Just as with our TRF analysis, the acoustic envelope was the strongest contributor 538
to the overall model. The beta weights for a ll other features were near zero, with the 539
envelope derivative resulting in negative beta weights. Together, the near-zero GLM weights 540
for higher-level features (word onset, surprisal, and entropy) and their lack of added 541
predictive value in the TRF approach demonstrate that the acoustic envelope dominates the 542
trackable haemodynamic signal in this paradigm, leaving little unique variance for the other 543
variables to explain when modelled concurrently. 544
4.4 Limitations and recommendations for fNIRS TRFs 545
Since we recorded data recording during the hackathon, there were several constrains. 546
Some steps, such as mounting the optodes and placing the fNIRS cap were performed by 547
the participants themselves, who had no previous experience with fNIRS (Yücel et al., 2025). 548
These steps were supervised by two experts from Artinis, who gave step-by-step 549
demonstrations and made the necessary adjustments to ensure high quality recording, 550
which was time consuming. The environment was not as controlled as in laboratory settings. 551
For example, external noise was minimised but could not be fully avoided. The key limitation 552
was our sample size (N=8), which was too small for a thorough statistical testing across 553
participants. Nonetheless, we carried out statistical tests across trials, testing for the 554
robustness of numerical results across the different speech segments in the experiment. The 555
limited number of participants posed limitations on certain hyperscanning-oriented analyses, 556
such as wavelet transform coherence (WTC). That is because WTC evaluates coherence 557
across both frequency and time dimensions, requiring long, continuous data streams to 558
resolve slow haemodynamic fluctuations. Becaus e our experimental trials were relatively 559
short for targeting haemodynamic fluctuations (1-2 minutes) and featured distinct sentences, 560
they could not be artificially concatenated or aggregated; doing so would introduce severe 561
temporal discontinuities and edge artifacts, rendering time-resolved synchrony analyses 562
invalid. Future hyperscanning studies aiming to use WTC should ensure continuous, long-563
duration stimulus presentations, larger and better selected sample providing sufficient data 564
to improve the strength of the statistical analyses. 565
In our experiment, stimulus presentation and recording systems could not be directly 566
integrated, meaning we did not have highly prec ise triggers. The PortaSync synchroniser is 567
set up for triggers manually sent via a button press at the start of an event (i.e., a trial) which 568
can creates delays. Since the fNIRS response is much slower than EEG and MEG, these 569
delays between the actual stimulus onset and the triggers are not detrimental. Our 570
recommendation is that the insertion of triggers for events in the recording should be 571
automated (e.g. applying transistor-transistor logi c [TTL] pulses) to ensure more consistent 572
and better synchronisation. 573
We present several recommendations for future studies employing a TRF approach to fNIRS 574
data. First, we recommend padding the stimul us and fNIRS time-series by additional 30 575
seconds (raw fNIRS data for the neural data and zero-padded features for the stimulus data) 576
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
to allow for an entire haemodynamic cycle to take place after the trial ends. Longer time-lags 577
than typical EEG and MEG TRFs should be considered to properly capture this 578
haemodynamic response – up to approximately 10 seconds after feature onset. To improve 579
computation efficiency, we recommend downsampling the neural and stimulus data down to 580
10 Hz. When selecting features, we recommend using those with slow fluctuations, such as 581
the acoustic envelope over those with rapid fluctuations and sharp edges, such as the 582
envelope derivative. Finally, when identifying a null distribution, we recommend using trial 583
mismatch, with trials spaced far apart in time, to try to break the autocorrelation structure 584
within the fNIRS data and minimise inflated null correlation values. 585
4.5 Conclusions 586
Our findings show that TRF estimation can be meaningfully applied to fNIRS data, going 587
beyond typical application in electrophysiological recordings. Despite the slower 588
haemodynamic signals, we were able to show that fNIRS TRFs outperform more 589
conventional GLM based approaches. These result s support the integration of continuous 590
speech-tracking frameworks into fNIRS re search and broaden the methodological tools 591
available for naturalistic hearing and communication studies. 592
593
CRediT Author Contributions 594
Johanna Wilroth – Data curation, formal analysis, investigation, methodology, visualisation, 595
and writing – original draft, writing – review & editing. Nancy Sotero Silva – Investigation, 596
writing – original draft, writing – review & editing. Ali Tafakkor – Data curation, formal 597
analysis, investigation, methodology, visualisation, writing – original draft, writing – review & 598
editing. Bruno de Avo Mesquita – Data curation, investigation, visualisation, writing – 599
review & editing. Emily Y. J. Ip – Data curation, conceptualisation, methodology, writing – 600
review & editing. Bonnie Lau – Investigation, methodology. Jaimy Hannah – Data curation, 601
formal analysis, investigation, methodology, project administration, supervision, visualisation, 602
writing – original draft, writing – review & editing. Giovanni M. Di Liberto – 603
conceptualisation, funding acquisition, investi gation, methodology, resources, supervision, 604
writing – review & editing. 605
References
615
Afyouni, S., Smith, S. M., & Nichols, T. E. (2019). Effective degrees of freedom of the 616
Pearson's correlation coefficient under autocorrelation. NeuroImage, 199. 617
https://doi.org/10.1016/j.neuroimage.2019.05.011 618
Anderson, C. A., Wiggins, I. M., Kitterick, P . T., Hartley, D. E. H., Anderson, C. A., Wiggins, I. 619
M., Kitterick, P . T., & Hartley, D. E. H. (2017). Adaptive benefit of cross-modal 620
plasticity following cochlear implantation in deaf adults. Proceedings of the National 621
Academy of Sciences, 114(38). https://doi.org/10.1073/pnas.1704785114 622
Arbabshirani, M. R., Damaraju, E., Phlypo, R., Plis, S., Allen, E., Ma, S., Mathalon, D., 623
Preda, A., Vaidya, J. G., Adali, T., & Calhoun, V. D. (2014). Impact of autocorrelation 624
on functional connectivity. NeuroImage, 102. 625
https://doi.org/10.1016/j.neuroimage.2014.07.045 626
Bálint, A., Wimmer, W., Caversaccio, M., Rummel, C., & Weder, S. (2025). Brain activation 627
patterns in normal hearing adults: An fNIRS Study using an adapted clinical speech 628
comprehension task. Hearing Research , 455. 629
https://doi.org/10.1016/j.heares.2024.109155 630
Bell, L., Peng, Z. E., Pausch, F., Reindl, V., Neuschaefer-Rube, C., Fels, J., Konrad, K., Bell, 631
L., Peng, Z. E., Pausch, F., Reindl, V., Neuschaefer-Rube, C., Fels, J., & Konrad, K. 632
(2020). fNIRS Assessment of Speech Comprehension in Children with Normal 633
Hearing and Children with Hearing Aids in Virtual Acoustic Environments: Pilot Data 634
and Practical Recommendations. Children 2020, Vol. 7, , 7 (11). 635
https://doi.org/10.3390/children7110219 636
Bertachini, A. L. L., Januario, G. C., Novi, S. L., Mesquita, R. C., Silva, M. A. R., Andrade, G. 637
M. Q., de Resende, L. M., de Miranda, D. M., Bertachini, A. L. L., Januario, G. C., 638
Novi, S. L., Mesquita, R. C., Silva, M. A. R., Andrade, G. M. Q., de Resende, L. M., & 639
de Miranda, D. M. (2021). Hearing brain ev aluated using near-infrared spectroscopy 640
in congenital toxoplasmosis. Scientific Reports 2021 11:1 , 11 (1). 641
https://doi.org/10.1038/s41598-021-89481-0 642
Brodbeck, C., Hong, L. E., & Simon, J. Z. (2018). Rapid Transformation from Auditory to 643
Linguistic Representations of Continuous Speech. Current Biology , 28(24), 3976-644
3983.e3975. https://doi.org/10.1016/j.cub.2018.10.042 645
Carollo, A., Stella, M., Lim, M., Bizzego, A., & Esposito, G. (2025). Emotional content and 646
semantic structure of dialogues are associ ated with Interpersonal Neural Synchrony 647
in the Prefrontal Cortex. NeuroImage, 309. 648
https://doi.org/10.1016/j.neuroimage.2025.121087 649
Chalas, N., Daube, C., Kluger, D. S., Abbasi, O., Nitsch, R., & Gross, J. (2022). Multivariate 650
analysis of speech envelope tracking reveals coupling beyond auditory cortex. 651
NeuroImage, 258. https://doi.org/10.1016/j.neuroimage.2022.119395 652
Crosse, M. J., Butler, J. S., & Lalor, E. C. (2015). Congruent Visual Speech Enhances 653
Cortical Entrainment to Continuous A uditory Speech in Noise-Free Conditions. 654
Journal of Neuroscience, 35(42). https://doi.org/10.1523/JNEUROSCI.1829-15.2015 655
Crosse, M. J., Di Liberto, G. M., Bednar, A., & Lalor, E. C. (2016). The multivariate temporal 656
response function (mTRF) toolbox: A MATL AB toolbox for relating neural signals to 657
continuous stimuli. Frontiers in Human Neuroscience, 10(NOV2016), 219245-658
219245. https://doi.org/10.3389/FNHUM.2016.00604/BIBTEX 659
Crosse, M. J., Zuk, N. J., Di Liberto, G. M., Ni differ, A. R., Molholm, S., & Lalor, E. C. (2021). 660
Linear Modeling of Neurophysiological Responses to Speech and Other Continuous 661
Stimuli: Methodological Considerations for Applied Research. Frontiers in 662
Neuroscience, 15. https://doi.org/10.3389/fnins.2021.705621 663
Czeszumski, A., Eustergerling, S., Lang, A., Menrath, D., Gerstenberger, M., Schuberth, S., 664
Schreiber, F., Rendon, Z. Z., & König, P . (2020). Hyperscanning: A Valid Method to 665
Study Neural Inter-brain Underpinnings of Social Interaction. Frontiers in Human 666
Neuroscience, 14. https://doi.org/10.3389/fnhum.2020.00039 667
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
D'Errico, J. (2026). inpaint_nans . In MATLAB Central File Exchange. 668
https://uk.mathworks.com/matlabcentral/fileexchange/4551-inpaint_nans 669
Di Liberto, G. M., Attaheri, A., Cantisani, G., Reilly, R. B., Ní Choisdealbha, Á., Rocha, S., 670
Brusini, P ., & Goswami, U. (2023). Emergence of the cortical encoding of phonetic 671
features in the first year of life. Nature communications, 14(1), 7789. 672
Di Liberto, G. M., & Ip, E. Y . J. (2025). Speech Neurophysiology in Realistic Contexts: Big 673
Hype or Big Leap? https://doi.org/10.48550/arXiv.2506.05494 674
Di Liberto, G. M., Lalor, E. C., & Millman, R. E. (2018). Causal cortical dynamics of a 675
predictive enhancement of speech intelligibility. NeuroImage, 166, 247-258. 676
Di Liberto, G. M., Nidiffer, A., Crosse, M. J., Zuk, N., Haro, S., Cantisani, G., Winchester, M. 677
M., Igoe, A., McCrann, R., & Chandra, S. (2024). A standardised open science 678
framework for sharing and re-analysing neural data acquired to continuous stimuli. 679
Neurons, Behavior, Data analysis, and Theory, 1-25. 680
Di Liberto, G. M., Pelofi, C., Bianco, R., Patel, P ., Mehta, A. D., Herrero, J. L., De Cheveigné, 681
A., Shamma, S., & Mesgarani, N. (2020). Cortical encoding of melodic expectations 682
in human temporal cortex. eLife, 9, e51784. 683
Ding, N., Chatterjee, M., & Simon, J. Z. (2014). Robust cortical entrainment to the speech 684
envelope relies on the spectro-temporal fine structure. NeuroImage, 88. 685
https://doi.org/10.1016/j.neuroimage.2013.10.054 686
Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while 687
listening to competing speakers. Proceedings of the National Academy of Sciences , 688
109(29), 11854-11859. https://doi.org/10.1073/pnas.1205381109 689
Eulau, K., & Hirsh-Pasek, K. (2024). From behavioral synchrony to language and beyond. 690
Frontiers in Integrative Neuroscience, 18. https://doi.org/10.3389/fnint.2024.1488977 691
Haresign, I. M., Phillips, E. A. M., Whitehorn, M., Goupil, L., Noreika, V., Leong, V., & Wass, 692
S. V. (2022). Measuring the temporal dyna mics of inter-personal neural entrainment 693
in continuous child-adult EEG hyperscanning data. Developmental Cognitive 694
Neuroscience, 54. https://doi.org/10.1016/j.dcn.2022.101093 695
Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S., & Keysers, C. (2012). Brain-to-696
brain coupling: a mechanism for creating and sharing a social world. Trends in 697
Cognitive Sciences, 16(2), 114-121. https://doi.org/10.1016/j.tics.2011.12.007 698
Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., & Bießmann, F. 699
(2014). On the interpretation of weight vectors of linear models in multivariate 700
neuroimaging. NeuroImage, 87. https://doi.org/10.1016/j.neuroimage.2013.10.067 701
Hu, Y ., Liu, Y ., Hou, Y ., Pan, Y ., Gao, X., Hu, Y ., Liu, Y ., Hou, Y ., Pan, Y ., & Gao, X. (2025). 702
Dataset of natural conversations about appearance using fNIRS. Scientific Data 2025 703
12:1, 12(1). https://doi.org/10.1038/s41597-025-05574-9 704
Huppert, T. J., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2009). HomER: a review 705
of time-series analysis methods for near-i nfrared spectroscopy of the brain. Applied 706
Optics, Vol. 48, Issue 10, pp. D280-D298 , 48(10). 707
https://doi.org/10.1364/AO.48.00D280 708
Ip, E. Y . J., Akkaya, A., Winchester, M. M., Bishop, S. J., Cowan, B. R., & Di Liberto, G. M. 709
(2025). Exploring the impact of social relevance on the cortical tracking of speech: 710
viability and temporal response characterisation. bioRxiv, 2025.2009.2023.674728. 711
https://doi.org/10.1101/2025.09.23.674728 712
Lalor, E. C. P ., Alan J.; Reilly, Richard B.; Foxe, John J. . (2009). Resolving Precise Temporal 713
Processing Properties of the Auditory System Using Continuous Stimuli. Journal of 714
Neurophysiology, 102(1). https://doi.org/10.1152/jn.90896.2008 715
Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018). Surrogate data 716
for hypothesis testing of physical systems. Physics Reports , 748. 717
https://doi.org/10.1016/j.physrep.2018.06.001 718
Lawrence, R. J., Wiggins, I. M., Anderson, C. A., Davies-Thompson, J., & Hartley, D. E. H. 719
(2018). Cortical correlates of speech intelligibility measured using functional near-720
infrared spectroscopy (fNIRS). Hearing Research , 370. 721
https://doi.org/10.1016/j.heares.2018.09.005 722
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
Levin, M., Balberg, M., Zaltz, Y ., Levin, M., Balberg, M., & Zaltz, Y . (2022). Cortical Activation 723
in Response to Speech Differs between Prelingually Deafened Cochlear Implant 724
Users with Good or Poor Speech-in-Noise Understanding: An fNIRS Study. Applied 725
Sciences 2022, Vol. 12,, 12(23). https://doi.org/10.3390/app122312063 726
Liégeois, R., Yeo, B. T. T., & Ville, D. V. D. (2021). Interpreting null models of resting-state 727
functional MRI dynamics: not throwing the model out with the hypothesis. 728
NeuroImage, 243. https://doi.org/10.1016/j.neuroimage.2021.118518 729
Menn, K. H., Männel, C., & Meyer, L. (2023). Phonological acquisition depends on the timing 730
of speech sounds: Deconvolution EEG m odeling across the first five years. Science 731
Advances, 9(44), eadh2560. 732
Miezin, F. M., Maccotta, L., Ollinger, J. M., Petersen, S. E., & Buckner, R. L. (2000). 733
Characterizing the Hemodynamic Response: Effects of Presentation Rate, Sampling 734
Procedure, and the Possibility of Ordering Brain Activity Based on Relative Timing. 735
NeuroImage, 11(6). https://doi.org/10.1006/nimg.2000.0568 736
Minagawa-Kawai, Y ., Mori, K., Hebden, J. C., & Dupoux, E. (2008). Optical imaging of 737
infants' neurocognitive development: Recent advances and perspectives. 738
Developmental Neurobiology, 68(6). https://doi.org/10.1002/dneu.20618 739
O'Sullivan, J. A., Power, A. J., Mesgarani, N., Rajaram, S., Foxe, J. J., Shinn-Cunningham, 740
B. G., Slaney, M., Shamma, S. A., & Lalor, E. C. (2015). Attentional Selection in a 741
Cocktail Party Environment Can Be Decoded from Single-Trial EEG. Cerebral Cortex 742
(New York, N.Y.: 1991), 25(7), 1697-1706. https://doi.org/10.1093/cercor/bht355 743
Olszowy, W., Aston, J., Rua, C., Williams, G. B., Olszowy, W., Aston, J., Rua, C., & Williams, 744
G. B. (2019). Accurate autocorrelation modeling substantially improves fMRI 745
reliability. Nature Communications 2019 10:1 , 10(1). https://doi.org/10.1038/s41467-746
019-09230-w 747
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., 748
& Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior 749
research methods, 51, 195-203. 750
Peñ a , M., Maki, A., K ovac /i1i ć, D., De haene -Lam be r tz , G., Koizumi, H., Bo uq uet, F., Mehler, 751
J., P eñ a , M., Ma k i, A., K ovac /i1ić , D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, 752
F., & Mehler, J. (2003). Sounds and silence: An optical topography study of language 753
recognition at birth. Proceedings of the National Academy of Sciences , 100(20). 754
https://doi.org/10.1073/pnas.1934290100 755
Pinti, P ., Tachtsidis, I., Hamilton, A., Hirsch, J., Aichelburg, C., Gilbert, S., & Burgess, P . W. 756
(2020). The present and future use of func tional near-infrared spectroscopy (fNIRS) 757
for cognitive neuroscience. Annals of the new York Academy of Sciences, 1464(1). 758
https://doi.org/10.1111/nyas.13948 759
Pollonini, L., Olds, C., Abaya, H., Bortfeld, H., Beauchamp, M. S., & Oghalai, J. S. (2014). 760
Auditory cortex activation to natural speech and simulated cochlear implant speech 761
measured with functional near-infrared spectroscopy. Hearing Research , 309. 762
https://doi.org/10.1016/j.heares.2013.11.007 763
Quaresima, V., Ferrari, M., & Valentina Quares ima, M. F. (2019). Functional Near-Infrared 764
Spectroscopy (fNIRS) for Assessing Cerebral Cortex Function During Human 765
Behavior in Natural/Social Situations: A Concise Review. Organizational Research 766
Methods, 22(1). https://doi.org/10.1177/1094428116658959 767
Rossi, S., Telkemeyer, S., Wartenburger, I., & Obrig, H. (2012). Shedding light on words and 768
sentences: Near-infrared spectr oscopy in language research. Brain and Language , 769
121(2). https://doi.org/10.1016/j.bandl.2011.03.008 770
Rowland, S. C., Hartley, D. E. H., Wiggins, I. M., & Stephen C. Rowland, D. E. H. H., Ian M. 771
Wiggins. (2018). Listening in Naturalistic Scenes: What Can Functional Near-Infrared 772
Spectroscopy and Intersubjec t Correlation Analysis Tell Us About the Underlying 773
Brain Activity? Trends in Hearing, 22. https://doi.org/10.1177/2331216518804116 774
Santosa, H., Zhai, X., Fishburn, F., Huppert, T., Santosa, H., Zhai, X., Fishburn, F., & 775
Huppert, T. (2018). The NIRS Brain AnalyzIR Toolbox. Algorithms 2018, Vol. 11, , 776
11(5). https://doi.org/10.3390/a11050073 777
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint
Sevy, A. B. G., Bortfeld, H., Huppert, T. J., Beauchamp, M. S., Tonini, R. E., & Oghalai, J. S. 778
(2010). Neuroimaging with near-infrared s pectroscopy demonstrates speech-evoked 779
activity in the auditory cortex of deaf children following cochlear implantation. Hearing 780
Research, 270(1-2). https://doi.org/10.1016/j.heares.2010.09.010 781
Shen, X., Tao, L., Chen, X., Song, S., Liu, Q., & Zhang, D. (2024). Contrastive learning of 782
shared spatiotemporal EEG representati ons across individuals for naturalistic 783
neuroscience. NeuroImage, 301. https://doi.org/10.1016/j.neuroimage.2024.120890 784
Sonkusare, S., Breakspear, M., & Guo, C. (2019). Naturalistic Stimuli in Neuroscience: 785
Critically Acclaimed. Trends in Cognitive Sciences , 23(8). 786
https://doi.org/10.1016/j.tics.2019.05.004 787
Tucker, S., Dubb, J., Kura, S., Lühmann, A. v., Franke, R., Horschig, J. M., Powell, S., 788
Oostenveld, R., Lührs, M., Delaire, É., Aghajan, Z. M., Yun, H., Yücel, M. A., Fang, 789
Q., Huppert, T. J., Frederick, B. d., Pollonini, L., Boas, D. A., Luke, R.,…Luke, R. 790
(2022). Introduction to the shared near infrared spectroscopy format. 791
Neurophotonics, 10(1). https://doi.org/10.1117/1.NPh.10.1.013507 792
van de Rijt, L. P . H., van Opstal, A. J., Mylanus, E. A. M., Straatman, L. V., Hu, H. Y ., Snik, A. 793
F. M., & van Wanrooij, M. M. (2016). Temporal Cortex Activation to Audiovisual 794
Speech in Normal-Hearing and Cochlear Implant Users Measured with Functional 795
Near-Infrared Spectroscopy. Frontiers in Human Neuroscience , 10. 796
https://doi.org/10.3389/fnhum.2016.00048 797
Wong, D. D. E., Fuglsang, S. A., Hjortkjær, J., Ceolini, E., Slaney, M., & de Cheveigné, A. 798
(2018). A Comparison of Regularization Methods in Forward and Backward Models 799
for Auditory Attention Decoding. Frontiers in Neuroscience , 12. 800
https://doi.org/10.3389/fnins.2018.00531 801
Yücel, M. A., Luke, R., Mesquita, R. C., von Lühmann, A., Mehler, D. M. A., Lührs, M., 802
Gemignani, J., Abdalmalak, A., Albrecht, F., de Almeida Ivo, I., Artemenko, C., 803
Ashton, K., Augustynowicz, P ., Bajracharya, A., Bannier, E., Barth, B., Bayet, L., 804
Behrendt, J., Khani, H. B.,…Zemanek, V. (2025). fNIRS reproducibility varies with 805
data quality, analysis pipelines, and researcher experience. Communications Biology 806
2025 8:1, 8(1). https://doi.org/10.1038/s42003-025-08412-1 807
808
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 23, 2026. ; https://doi.org/10.64898/2026.03.20.713212doi: bioRxiv preprint