Use of Computer Vision Analysis for Labeling Inattention Periods in Eeg Recordings With Visual Stimuli

doi:10.21203/rs.3.rs-4637470/v1

Preprint, 2024

Dmitry Isaev, Samantha Major, Kimberly L.H. Carpenter, Jordan Grapel, and 5 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4637470/v1
This work is licensed under a CC BY 4.0 License.
Status: published in Scientific Reports (journal publication 22 Aug 2025).

Abstract

Electroencephalography (EEG) recordings with visual stimuli require detailed coding to determine the periods of a participant’s attention. Here we propose an approach that requires only a supervised machine learning model and off-the-shelf video cameras.
We extract computer vision-based features, such as head pose, gaze, and face landmarks, from video of the participant, train a machine learning model (a multi-layer perceptron) on an initial dataset, and then adapt it using a small subset of data from a new participant. In a sample of 23 autistic children, after training on an additional 2560 labeled frames (equivalent to 85.3 seconds of video) from a new participant, the median area under the receiver operating characteristic curve for inattention detection was 0.989 (IQR 0.984–0.993) and the median inter-rater reliability (Cohen’s kappa) with a trained human annotator was 0.888. Agreement with a consensus annotation of four participants, labeled independently by two human annotators, was in the 0.827–0.960 range. Our results demonstrate the feasibility of automatic tools for detecting inattention during EEG recordings and their potential to reduce the subjectivity and time burden of human attention coding. The tool for model adaptation and visualization of the computer vision features is publicly available to the research community.

Subject areas: Biological sciences/Neuroscience/Cognitive neuroscience; Biological sciences/Neuroscience/Cognitive neuroscience/Attention

Keywords: EEG, visual attention, computer vision, machine learning, data processing automation

Introduction

Electroencephalography (EEG) is a widely used method for studying brain-behavior relations. A typical EEG recording session includes visual and/or auditory tasks, which can be presented in an event-related potential (ERP) paradigm or during spontaneous EEG recording. Collecting data with visual tasks is considerably more challenging in children because of their reduced ability to sustain attention to visual stimuli. [1-2] Sustaining attention during EEG tasks can be especially difficult for children with neurodevelopmental disorders, such as autism. [3-4] A meta-analysis by Stets et al.
(2012) [5] reports that studies involving visual tasks in infants have significantly higher attrition rates than studies using auditory or combined visual and auditory tasks. Although reported attrition rates vary across studies, [1,5-6] a general recommendation is to design tasks that engage children and thereby help them maintain visual attention. [6] To facilitate visual attention, children may be asked to provide a behavioral response (e.g., pressing a button), [7-8] or an experimenter may gently redirect a child to the screen upon noticing signs of disengagement. [3,8-9]

Removing segments of the data during which a participant did not look at the screen is often the first stage of data processing in recordings with visual stimuli. Typically, researchers either code the participant’s attention online, pressing a button that sends a marker to the EEG recording whenever the participant is not attending to the stimulus, [7,10-11] or record video of the participant’s behavior synchronously with the EEG and mark periods of inattention post hoc by reviewing the video. [3,12] This is a burdensome manual process that requires considerable time and effort. It is also highly subjective; for example, the annotator may see only the participant’s face and must guess whether the participant’s gaze is directed inside or outside the screen area. Subjectivity at this first stage of data processing poses an obstacle for EEG studies, particularly multi-center studies, in which reproducibility and consistency of EEG data quality are critical. [13-14]

In addition to its value for data curation, information about inattention periods can be useful for developing clinical biomarkers. There is evidence of alterations in orienting to, disengaging from, and sustaining attention to relevant stimuli in autistic children, [15-18] which can be expected to influence the amount of inattention during an EEG study.
Although a typical EEG study excludes from analysis the periods when the participant is not engaged with the visual stimulus, [11,19-20] inattentiveness to social and nonsocial stimuli during EEG can itself be a measure that distinguishes autistic from neurotypical children, used either alone or in conjunction with EEG power features. [3]

Conventional eye-tracking technologies can address the problem of detecting inattention. Presenting the stimulus on the eye-tracker screen while simultaneously recording eye-tracking and EEG signals makes it possible to detect whether a participant's visual attention is directed toward the screen (see Ahtola et al. (2017) [21] for an example setup). For example, Maguire et al. (2014) [22] proposed using an eye-tracker synchronized with EEG to present an “attention-getter” animation in an experiment with 6–8-year-old children. They reported increased retention of EEG data compared with a condition in which children were asked to provide a behavioral response (button pressing) to facilitate attention. However, eye-tracking equipment can be expensive and requires calibration.

Here we propose a solution for monitoring attention during EEG acquisition based on computer vision analysis (CVA), which is scalable and less expensive than eye-tracking equipment, requiring only off-the-shelf cameras to objectively measure children’s behavior. This is made possible largely by progress in face detection and in the estimation of facial landmarks, head pose, and gaze. [23-26] In non-EEG settings, these tools have been used to detect head turns in response to name [27] and to capture gaze patterns in low-cost settings without additional calibration. [28-29] For example, iCatcher [29] is a publicly available supervised deep learning model trained to classify infants’ gaze into three categories (‘left’, ‘right’, and ‘away’) based on facial appearance. In the work of Qian et al.
(2022), [30] supervised machine learning combined with CVA was applied to detect blink and head movement artifacts in a minimally constrained portable EEG setting.

In this work, we combine CVA with a supervised machine learning model to detect periods of inattention during EEG recordings, using videos of the child’s head and upper body captured synchronously with the EEG by simple off-the-shelf cameras. We hypothesized that a supervised machine learning model using automatically extracted CVA features (eye gaze coordinates; head pose descriptors, namely pitch, yaw, and roll; and nose landmarks) could reliably detect periods of visual distraction from the screen. We also propose minimal involvement of human annotators to fine-tune the model to a new participant: a human labels a small number of frames from the new participant’s video, and the model then undergoes an additional round of training. This small amount of human involvement is critical because children’s head poses and facial expressions vary substantially in clinical populations, which justifies tuning the pre-trained model to each new participant. Recent work based on iCatcher provides evidence that the lowest agreement between human annotators and automatic models occurs for the label ‘looking away from the screen’. [29]

We developed a graphical user interface (GUI) that allows users to label data for fine-tuning, visualize the video together with the corresponding time series of CVA features, and post-process the model results. The post-processing stage provides an opportunity for additional quality control of the inattention periods proposed by the model. The proposed approach reduces subjectivity by providing the CVA features as a reference during labeling, thus standardizing the information each annotator uses. It also substantially reduces coding time by decreasing the number of frames that must be labeled manually.
We therefore train the model on an annotated dataset of 23 children and then adapt it to a new child by labeling a limited number of randomly selected additional frames from the new video. We openly share online the GUI for inspecting video and CVA features, retraining the model, and post-processing predictions.

Methods

Participants

Participants were 23 children (16 males), aged 49–95 months, who were part of a study funded by the National Institutes of Health (NICHD 2P50HD093074, Dawson, PI). The ethnic and racial composition of the sample was as follows: White, 17; Black, 0; Asian, 2; other and mixed race, 4; Hispanic, 4. All 23 children met DSM-5 criteria for autism spectrum disorder (ASD), as determined by an experienced, research-reliable psychologist using the Autism Diagnostic Observation Schedule, Second Edition. [31] Eleven of the 23 children were diagnosed with co-occurring attention deficit/hyperactivity disorder (ADHD) based on a comprehensive clinical evaluation by a clinical psychologist with expertise in ADHD. Children had a mean Full-Scale IQ of 78.5 (SD = 25.5), based on the General Conceptual Ability (GCA) standard score from the Differential Ability Scales, Second Edition. [32]

All caregivers/legal guardians of participants gave written informed consent, and the study protocol was approved by the Duke University Health System Institutional Review Board (protocol numbers Pro00085435 and Pro00085156). Informed consent was obtained from the subjects and/or their legal guardian(s) for publication of identifying information/images in an online open-access publication. Methods were carried out in accordance with institutional, State, and Federal guidelines and regulations. The procedures in these studies adhere to the tenets of the Declaration of Helsinki. Additionally, the caregiver of the participant whose video is used in the Supplementary Materials, and shown blurred in the Figures, provided consent for the use of these materials in publication.
All other data in the paper are anonymized.

Recording synchronized video and EEG

Continuous EEG was recorded while participants watched three videos of dynamic audio-visual stimuli with social (a person gesturing and smiling), nonsocial (toys activating), and neutral (bubbles floating) content. This was followed by three ERP protocols: (1) presentation of faces and houses, (2) an auditory oddball task, and (3) visual evoked potentials (VEP). One or two clinical research assistants were present in the room during the EEG recording to ensure the quality of the session and to gently redirect the participant’s attention back to the screen if the participant became distracted. EEG data were recorded from 124 channels, referenced to Cz, using a Hydrocel Geodesic Sensor Net and a Net Amps 400 amplifier (Electrical Geodesics, Eugene, Oregon). Data were collected using Netstation 4.5.6 at a sampling rate of 1000 Hz. The child’s face was recorded with a Basler ACE acA1300-30uc camera placed below the screen and synchronized with the EEG. The camera resolution was 1296x966 pixels and the frame rate was 30 fps. To synchronize the camera and EEG, we used in-house software based on the Basler pylon library, together with a Cedrus StimTracker hardware device that set markers in the EEG recording. A diagram of the recording setup is shown in Fig. 1.

[Insert Fig. 1]

Extracting CVA features

To extract the CVA features, we used in-house code involving three steps: (a) face detection and disambiguation, (b) extraction of landmarks and head pose angles, and (c) gaze estimation. The raw set of features extracted per frame included the nose x (horizontal) and y (vertical) coordinates in the frame, the gaze x and y coordinates in the presentation screen plane, and the head pose angles (pitch, yaw, and roll).

Face detection and disambiguation. The face detection and disambiguation code used the face_recognition Python library, which is based on the dlib C++ library.
[33] Whenever the algorithm detected more than one face in a video frame (either because of ambiguous detection, when one face was detected twice, or because another person, e.g., a clinical assistant, entered the frame), it displayed the frame with bounding boxes and prompted the user to select the participant’s face.

Extraction of landmarks and head pose angles. After face detection, an algorithm for facial landmark extraction based on the intraface software library [26] was applied to the detected faces, yielding facial landmark pixel coordinates as well as pitch, yaw, and roll head pose angles.

Gaze estimation. The iTracker software [24] was used for gaze estimation, providing gaze x and gaze y coordinates in the screen plane. Although iTracker was trained to predict gaze coordinates on a mobile device screen from frames captured by the device’s front-facing camera, we used its output as a proxy for gaze coordinates in the presentation screen plane. The software package is modular, and this component can easily be replaced by alternatives preferred by the user.

Since the intraface library is not currently available to the general public, for the convenience of potential users we make publicly available an alternative processing pipeline, which consists of our original face detection and disambiguation code and code for landmark, head pose, and gaze extraction using the popular OpenFace software package. [23]

Data attrition

Because participants behaved very differently during the pauses between EEG/ERP recordings, inattention detection was restricted to the periods of the actual recordings, and the training set for the machine learning (ML) model included only frames inside those periods. Frames in which the face could not be detected (and hence no landmark or head pose information was available) were also excluded from the analysis.
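The attrition rule just described reduces to a per-frame filter: a frame is kept only if it lies inside an EEG/ERP recording block and a face was detected in it. The sketch below illustrates this under stated assumptions and is not the released code: the `Frame` record, its field names, and the representation of recording blocks as half-open (start, end) intervals in seconds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Frame:
    """Hypothetical per-frame record; field names are illustrative only."""
    time_s: float                                  # timestamp in seconds from video start
    face_box: Optional[Tuple[int, int, int, int]]  # None when no face was detected


def in_any_period(t: float, periods: Sequence[Tuple[float, float]]) -> bool:
    """True if time t falls inside any half-open [start, end) recording block."""
    return any(start <= t < end for start, end in periods)


def retain_frames(frames: List[Frame],
                  periods: Sequence[Tuple[float, float]]) -> List[Frame]:
    """Keep frames recorded during an EEG/ERP block that also have a detected face."""
    return [f for f in frames
            if f.face_box is not None and in_any_period(f.time_s, periods)]


if __name__ == "__main__":
    fps = 30.0
    # Toy data: ten frames, no face detected in frame 2, one block covering frames 0-5.
    frames = [Frame(i / fps, None if i == 2 else (0, 0, 10, 10)) for i in range(10)]
    kept = retain_frames(frames, [(0.0, 0.19)])
    print([round(f.time_s * fps) for f in kept])
```

Checking the face first is a small design choice: it skips the interval search for frames that would be discarded anyway.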
Data pre-processing

Since inattention can occur in any direction (participants may look to the right or left, turn the head up or down, etc.), each feature was transformed, separately for each participant, into a positive (‘plus’; Eq. (1)) and a negative (‘minus’; Eq. (2)) version, where the median is taken over that participant’s frames:

$$\text{feature}_{\text{plus}} = \max\left(0,\ \text{feature} - \operatorname{median}(\text{feature})\right) \tag{1}$$

$$\text{feature}_{\text{minus}} = \left|\min\left(0,\ \text{feature} - \operatorname{median}(\text{feature})\right)\right| \tag{2}$$

The final set of features used in the analysis is reported in Table 1.

Table 1. List of input features per frame for the machine learning model (feature description: feature names).
Nose coordinates: noseX plus, noseX minus, noseY plus, noseY minus
Gaze coordinates: gazeX plus, gazeX minus, gazeY plus, gazeY minus
Head pose angles: yaw plus, yaw minus, pitch plus, pitch minus, roll plus, roll minus

[Insert Table 1]

After pre-processing, the participant identifier was one-hot encoded and added to the feature list. This allowed a separate bias term to be learned for each participant in the first layer of the neural network, resembling the design of mixed models. The number of one-hot categories was one more than the number of participants, with the identifier of the participant whose data are used for model fine-tuning and prediction encoded in the last category.

Data labeling

Data for all 23 participants were labeled by one of the co-authors using the Elan v. 6.3 software. Four participants were randomly selected for independent annotation by another co-author. Neither annotator participated in data analysis. Using the recorded video, annotators labeled frames as ‘gaze off screen’ if the participant looked away from the screen and/or as ‘head turn’ if the participant turned their head. For the purpose of inattention detection, a frame was labeled ‘inattention’ if it was labeled as either a head turn or gaze off screen.
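The plus/minus split of Eqs. (1) and (2), the one-hot participant code with a reserved last slot, and the labeling rule for ‘inattention’ can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the median is assumed to be computed over all of one participant's frames. Applying `plus_minus` to each of the seven raw features (nose x/y, gaze x/y, yaw, pitch, roll) yields the 14 inputs of Table 1, to which the one-hot participant code is appended.

```python
from statistics import median
from typing import Dict, List


def plus_minus(values: List[float]) -> Dict[str, List[float]]:
    """Split one feature into 'plus' and 'minus' parts around its median (Eqs. 1 and 2)."""
    m = median(values)
    return {
        "plus": [max(0.0, v - m) for v in values],        # Eq. (1)
        "minus": [abs(min(0.0, v - m)) for v in values],  # Eq. (2)
    }


def one_hot_participant(index: int, n_train: int, is_new: bool = False) -> List[float]:
    """One-hot participant code with n_train + 1 slots; the last slot is the new participant."""
    n_categories = n_train + 1
    if not is_new and not 0 <= index < n_train:
        raise ValueError("training participant index out of range")
    code = [0.0] * n_categories
    code[n_categories - 1 if is_new else index] = 1.0
    return code


def inattention_label(head_turn: bool, gaze_off_screen: bool) -> int:
    """1 ('inattention') if the frame was coded as a head turn or as gaze off screen."""
    return int(head_turn or gaze_off_screen)


if __name__ == "__main__":
    yaw = [-4.0, 1.0, 2.0, 3.0, 15.0]  # toy yaw angles (degrees) for one participant
    parts = plus_minus(yaw)
    row = [parts["plus"][0], parts["minus"][0]] + one_hot_participant(0, 3, is_new=True)
    print(row, inattention_label(False, True))
```

Using the median rather than the mean as the reference point keeps the split robust to the long excursions that occur during inattention episodes.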
Agreement on inattention labels between independent annotators was assessed with Cohen’s kappa. [34]

Training and evaluating machine learning model

Given the frame-by-frame pre-processed data as input, we trained a multi-layer perceptron (MLP) with two hidden layers (dimensions 512 and 14, selected empirically) and a temperature scaling layer for model calibration. [35–36] The target variable was the per-frame inattention label, with cross-entropy as the cost function, and the Adam optimizer was used for training. [37] We used weighted sampling so that each batch contained approximately equal numbers of positive (inattention) and negative (attention) samples. Models were trained in the PyTorch framework. [38]

Evaluation was done using leave-one-subject-out cross-validation (LOSO CV). To evaluate model performance, we assessed, per participant, the average precision (AP, also known as the area under the precision-recall curve), the area under the ROC curve (AUC), and the maximal Cohen’s kappa (MK) between the human annotator and the machine learning predictions across decision thresholds. Additionally, we evaluated the median Cohen’s kappa across participants for thresholds ranging from 0 to 1. This allowed us to identify a single threshold that yields the best agreement between the model and the human coder across the whole sample, without adjusting the threshold for each individual participant.

Transfer learning: adjusting ML model to a new participant

At each iteration of additional training, our adaptation approach selected a batch of 128 frames (corresponding to about 4.27 s of video) for labeling and then trained the model for 20 epochs (full passes over the entire labeled dataset).
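Each adaptation iteration therefore labels 128 new frames, so 5, 10, and 20 iterations correspond to 640, 1280, and 2560 labeled frames. The sketch below illustrates only the batch-selection bookkeeping behind the two strategies compared in this study (random draws without replacement versus the next frames in temporal order). Function names are hypothetical, and model retraining is left as a commented hook rather than implemented.

```python
import random
from typing import List, Optional


def select_batch(pool: List[int], n: int, approach: str,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Choose the next n frame indices to label and remove them from `pool` in place.

    "random": n indices drawn uniformly without replacement from the remaining pool.
    "sequential": the next n indices from the start of the remaining (time-ordered) pool.
    """
    if approach == "random":
        rng = rng or random.Random()
        batch = rng.sample(pool, min(n, len(pool)))
    elif approach == "sequential":
        batch = pool[:n]
    else:
        raise ValueError(f"unknown approach: {approach!r}")
    chosen = set(batch)
    pool[:] = [i for i in pool if i not in chosen]
    return batch


def adaptation_schedule(n_frames: int, n: int = 128, iterations: int = 20,
                        approach: str = "random", seed: int = 0) -> List[int]:
    """Return the cumulative number of labeled frames after each iteration."""
    rng = random.Random(seed)
    pool = list(range(n_frames))  # frame indices of the new participant, in time order
    labeled: List[int] = []
    counts: List[int] = []
    for _ in range(iterations):
        labeled.extend(select_batch(pool, n, approach, rng))
        # Hypothetical hook: retrain the model for 20 epochs on the frames in `labeled`.
        counts.append(len(labeled))
    return counts


if __name__ == "__main__":
    fps = 30
    counts = adaptation_schedule(23_284, approach="random")
    for it in (5, 10, 20):
        print(it, counts[it - 1], round(counts[it - 1] / fps, 1), "s of video")
```

Because consecutive video frames are strongly correlated, the sequential variant tends to fill early batches with near-duplicate frames, whereas random draws spread the labeled frames over the whole recording.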
To evaluate the performance of this approach, we assessed the three metrics defined in the previous section under two frame sampling approaches: sequential sampling, in which frame features and labels are drawn into the batch sequentially from the beginning of the video (resembling how a human would look through the dataset and label it), and random sampling. We additionally assessed the maximum of the median Cohen’s kappa across participants, and computed the corresponding prediction threshold, at iterations 5, 10, and 20, which correspond to 21.3, 42.7, and 85.3 additionally labeled seconds of data per participant. The exact algorithm was as follows:

1. Set N = 128 (the batch size).
2. Create an empty dataset for labeled data.
3. Set Iteration = 0.
4. Predict the probability of each frame being positive (inattention).
5. If the approach is random sampling, randomly sample N frames from the participant’s data into the batch.
6. If the approach is sequential sampling, sample the next N frames from the beginning of the participant’s data into the batch.
7. Remove the frames included in the batch from the participant’s data.
8. Add the batch to the labeled dataset (in the LOSO CV framework, the labels were taken from the existing annotations of the participant to whom the model was being adapted).
9. Train for 20 epochs on the labeled dataset.
10. Compute AP, AUC, and MK.
11. Set Iteration += 1.
12. If Iteration == 50, stop.
13. Go to step 4.

Agreement measurements between model and human and between two humans

We used Cohen’s kappa as a metric of quality assessment for the human annotations. To compare the maximal median kappa between the model and the human annotator with the level of agreement between humans, we randomly selected four participants whose videos were independently labeled by a second annotator, and computed Cohen’s kappa between the two human annotators.
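Cohen's kappa, the per-participant maximal kappa (MK) over thresholds, and the single threshold that maximizes the median kappa across participants can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: it assumes binary 0/1 labels, assumes a frame is predicted as inattention when its probability is greater than or equal to the threshold, and uses hypothetical function names. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from statistics import median
from typing import List, Sequence, Tuple


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n            # proportion of 1s for each rater
    expected = pa * pb + (1 - pa) * (1 - pb)   # agreement expected by chance
    if expected == 1.0:                        # both raters constant and identical
        return float("nan")                    # kappa is undefined in this case
    return (observed - expected) / (1 - expected)


def kappa_at(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Kappa between human labels and model predictions binarized at `threshold`."""
    return cohens_kappa(labels, [int(p >= threshold) for p in probs])


def max_kappa(probs: Sequence[float], labels: Sequence[int],
              thresholds: Sequence[float]) -> float:
    """Per-participant maximal kappa (MK) over a grid of thresholds."""
    return max(kappa_at(probs, labels, t) for t in thresholds)


def best_shared_threshold(participants: List[Tuple[Sequence[float], Sequence[int]]],
                          thresholds: Sequence[float]) -> Tuple[float, float]:
    """Single threshold maximizing the median kappa across participants."""
    medians = {t: median([kappa_at(p, y, t) for p, y in participants])
               for t in thresholds}
    best = max(medians, key=medians.get)
    return best, medians[best]
```

MK tunes the threshold per participant and is therefore an optimistic upper bound, while `best_shared_threshold` reflects the practical setting of one threshold for all participants.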
We additionally computed Cohen’s kappa between the model predictions, binarized at the threshold corresponding to the maximal median kappa at iteration 20, and a consensus annotation of the two human raters (in the consensus annotation, a frame is labeled ‘inattention’ only if both annotators labeled it as such; otherwise it is labeled ‘attention’).

Graphical User Interface for visualizing and retraining the model

We created a web-based GUI that allows users to visualize the data, label it frame by frame, retrain the model in the random sampling framework, and post-process the results (see Fig. 2 for a screenshot, and Supplementary Video S1 in the Supplementary Materials online for a demonstration of how the tool works). The tool is built with the open-source libraries ‘plotly’ (https://plotly.com/python/) and ‘dash’ (https://dash.plotly.com/).

[Insert Fig. 2]

Results

Dataset statistics

The full dataset consisted of 566,043 frames. After excluding frames in which the face or gaze was not detected, 535,539 frames were retained (5.38% of frames were invalid), with a mean of 23,284 (SD 6,193) frames per participant. Of the retained frames, 79,629 were labeled as inattention (14.86% of the dataset).

Transfer learning results

The results of transfer learning are shown in Table 2 and Fig. 3. The sequential sampling approach performed substantially worse than the random sampling approach. At the start of training (before any adaptation to the participant), median AP, AUC, and MK were 0.855, 0.965, and 0.742, respectively. By iteration 20, the random sampling approach reached a median AP of 0.962, AUC of 0.989, and MK of 0.888, compared with a median AP of 0.640, AUC of 0.862, and MK of 0.548 for the sequential sampling approach.

Table 2. Average precision, AUC, and maximal Cohen’s kappa percentiles at different iterations for the two sampling/adaptation approaches. The random sampling approach outperforms the sequential sampling approach on all three metrics at every listed iteration.
Values are median (25th–75th percentile) across participants; AP, average precision; MK, maximal Cohen’s kappa.
No fine-tuning, iteration 0: AP 0.855 (0.715–0.913); AUC 0.965 (0.948–0.971); MK 0.742 (0.646–0.796)
Random sampling, iteration 5: AP 0.906 (0.820–0.948); AUC 0.973 (0.960–0.981); MK 0.798 (0.753–0.873)
Random sampling, iteration 10: AP 0.930 (0.875–0.969); AUC 0.984 (0.975–0.991); MK 0.838 (0.798–0.898)
Random sampling, iteration 20: AP 0.962 (0.931–0.981); AUC 0.989 (0.984–0.993); MK 0.888 (0.865–0.925)
Sequential sampling, iteration 5: AP 0.400 (0.280–0.720); AUC 0.788 (0.638–0.890); MK 0.380 (0.236–0.561)
Sequential sampling, iteration 10: AP 0.575 (0.408–0.782); AUC 0.835 (0.731–0.908); MK 0.482 (0.251–0.637)
Sequential sampling, iteration 20: AP 0.640 (0.408–0.801); AUC 0.862 (0.771–0.930); MK 0.548 (0.354–0.678)

[Insert Fig. 3]
[Insert Table 2]

Cohen’s kappa analysis

Cohen’s kappa at different prediction thresholds for both sampling approaches (random and sequential) at iterations 5, 10, and 20 is shown in Fig. 4. Thresholds at the highest median kappa and the corresponding median kappa values are shown in Tables 2 and 3. The highest median kappa ranges from 0.792 to 0.888 in the random sampling approach and from 0.223 to 0.426 in the sequential one. Figure 4 shows that the median Cohen’s kappa remains relatively stable and high for thresholds between 0.2 and 0.8, so a general threshold for the model predictions can be set within this range.

Table 3. Thresholds and Cohen’s kappa at the highest median kappa for the two sampling approaches at iterations 5, 10, and 20.
Random sampling, iteration 5: threshold 0.310; median Cohen’s kappa 0.792
Random sampling, iteration 10: threshold 0.484; median Cohen’s kappa 0.838
Random sampling, iteration 20: threshold 0.424; median Cohen’s kappa 0.888
Sequential sampling, iteration 5: threshold 0.004; median Cohen’s kappa 0.223
Sequential sampling, iteration 10: threshold 0.008; median Cohen’s kappa 0.296
Sequential sampling, iteration 20: threshold 0.020; median Cohen’s kappa 0.426

[Insert Table 3]
[Insert Fig. 4]

Agreement between model and human coder and between two human coders

A second independent annotator labeled videos from four participants, totaling 74,543 frames or 13.9% of the data. It took the second annotator approximately 22 hours to label the data, an average of 1.06 seconds per frame.
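The per-frame labeling rate, and what it implies for the cost of adaptation, follow from simple arithmetic on figures reported in this paper: 22 hours for 74,543 frames, 30 fps video, 2560 frames for adaptation (20 iterations of 128), and an average of 23,284 retained frames per participant. Because the 22-hour figure is approximate, the results are rough estimates; the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600
FPS = 30                       # camera frame rate reported in Methods

hours_second_annotator = 22    # approximate, as reported
frames_second_annotator = 74_543
adaptation_frames = 20 * 128   # 20 iterations of 128 frames
mean_frames_per_participant = 23_284

sec_per_frame = hours_second_annotator * SECONDS_PER_HOUR / frames_second_annotator
adaptation_minutes = adaptation_frames * sec_per_frame / 60
full_recording_hours = mean_frames_per_participant * sec_per_frame / SECONDS_PER_HOUR
adaptation_video_seconds = adaptation_frames / FPS

print(f"{sec_per_frame:.2f} s/frame; {adaptation_minutes:.0f} min to label "
      f"{adaptation_frames} frames ({adaptation_video_seconds:.1f} s of video); "
      f"{full_recording_hours:.1f} h for a full recording")
```

With these inputs, labeling the 2560 adaptation frames costs roughly 45 minutes of annotator time, whereas hand-labeling an average full recording would take close to 7 hours.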
Cohen’s kappa between the two human annotators ranged from 0.548 to 0.844 (see Table 4). Agreement between the model adapted by random sampling and the consensus annotation increased with each iteration of additional training, ranging from 0.662 to 0.942 at iteration 5, from 0.737 to 0.948 at iteration 10, and from 0.827 to 0.960 at iteration 20.

Table 4. Agreement (Cohen's kappa) between human annotators, and between the models adapted by random sampling and the consensus annotation at iterations 5, 10, and 20.
Participant: between annotators; model vs consensus at iteration 5; iteration 10; iteration 20
PT1: 0.584; 0.662; 0.737; 0.827
PT9: 0.727; 0.860; 0.902; 0.939
PT10: 0.548; 0.751; 0.834; 0.849
PT16: 0.844; 0.942; 0.948; 0.960

[Insert Table 4]

GUI for visualizing and preprocessing pipeline

We developed a web-based GUI that can be used to review the CVA features of the video, label additional frames and retrain the model, and post-process the data, including setting the model decision threshold and rejecting falsely detected inattention events. We also make publicly available a data pre-processing pipeline based on in-house code for face detection and the OpenFace framework for head pose and gaze estimation. [23]

Discussion

In this work, we proposed a method for detecting periods of inattention to visual stimuli during EEG recordings. The tool is based on CVA of videos of participants’ behavior recorded synchronously with the EEG. We outlined a data processing pipeline, including face and facial landmark detection, head pose computation, and gaze estimation. We proposed an MLP model for predicting inattention from these CVA features, and random sampling as a means of fine-tuning the model for each participant.
We made publicly available a GUI that allows visualization of the CVA features, model fine-tuning, adjustment of prediction thresholds, and post-processing of results. The proposed random frame sampling approach for adapting the model to a participant outperforms the sequential sampling approach. For the non-fine-tuned model, the maximal Cohen’s kappa was 0.742, placing the best potential agreement with the human rater in the ‘substantial’ range. [39] Compared with the initial non-fine-tuned model, the model trained on an additional 2560 labeled frames (equivalent to labeling only about 85 seconds of video) performed substantially better on all quality metrics. In contrast, the performance of sequential frame sampling decreases during the first five iterations (see Fig. 3), then gradually improves, but does not reach that of the random sampling approach. Likely reasons include the strong temporal correlation of the features, and hence low variability in the new input data, and the rare occurrence of inattention (prevalence 14.86%), which leaves many batches without any positive labels.

In line with a previous study, [29] we found that agreement between human coders on inattention labeling was in the ‘moderate’ to ‘substantial’ range for three of the four participants, and in the ‘almost perfect’ range for only one participant. [39] However, when the model was compared with the consensus annotation of the two humans, the minimal agreement was already in the ‘substantial’ range after labeling 640 additional frames, and in the ‘almost perfect’ range for all four participants after labeling 2560 frames. Thus, the model tends to agree with human annotators where the annotators agree among themselves, pointing to a more objective assessment of inattention.
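The agreement categories used in this discussion can be checked mechanically against Table 4. The sketch assumes that reference [39] is the Landis and Koch (1977) benchmark scale (below 0 poor, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect); the function name is hypothetical, and the bands are applied with inclusive upper bounds.

```python
from typing import Dict, Tuple

# Landis & Koch (1977) bands with inclusive upper bounds (assumed to be the scale cited as [39]).
BANDS: Tuple[Tuple[float, str], ...] = (
    (0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
    (0.80, "substantial"), (1.00, "almost perfect"),
)


def landis_koch(kappa: float) -> str:
    """Map a Cohen's kappa value to its Landis & Koch agreement label."""
    if kappa < 0.0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Table 4: (between annotators, model vs consensus at iterations 5, 10, 20).
TABLE_4: Dict[str, Tuple[float, float, float, float]] = {
    "PT1": (0.584, 0.662, 0.737, 0.827),
    "PT9": (0.727, 0.860, 0.902, 0.939),
    "PT10": (0.548, 0.751, 0.834, 0.849),
    "PT16": (0.844, 0.942, 0.948, 0.960),
}

if __name__ == "__main__":
    for participant, values in TABLE_4.items():
        print(participant, [landis_koch(k) for k in values])
```

Under this mapping, the between-annotator column reads moderate, substantial, moderate, and almost perfect, and every iteration-20 value falls in the almost perfect band.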
Labeling inattention is a challenging task for humans, likely because annotators must make a subjective judgement about the boundaries of the stimulus presentation screen. The provided GUI allows visualization of the raw CVA features together with the participant’s video and also enables coders to label frames for the fine-tuning or post-processing stage. When the annotator needs to decide on an ambiguous frame, they can play the video to compare the frame in question with neighboring frames, which may help them better evaluate whether the participant was attending to the screen. Our results show that the proposed approach can make data labeling more efficient. Given that manual labeling takes about 1.06 seconds per frame, needing to label only about 2560 frames per participant to obtain high-quality labels can substantially reduce time and effort.

The modularity of the tool allows users to plug in any CVA pipeline and machine learning model with compatible inputs and outputs while keeping the same GUI. The initial model can be retrained as the amount of labeled data increases. Using the same prediction model and tool for discarding inattention periods may facilitate multi-center studies by unifying the data pre-processing pipeline. Another way to facilitate multi-center studies is to pre-process and label the data in each center separately, and then share only the CVA features and annotations to train the model on a larger pooled dataset. Such an approach helps preserve data privacy in each center, since centers share only specific de-identified CVA features.

A limitation of the study is that neither the trained model nor our original full pre-processing pipeline is published, because the intraface library [26] has been removed from public access.
We provide code for an alternative pre-processing pipeline that predicts the same features using the publicly available OpenFace library, as well as the model structure and the interface that a model needs to implement to be fully integrated into the GUI. A potential future direction is to handle the missing data caused by failures to detect a face in the video. CVA could not detect the face in 5.38% of the frames in our dataset, likely because of extreme head angles relative to the camera or face occlusions. Future studies may attempt to associate these periods with attention or inattention to the screen by using imputation or interpolation methods.

We presented a low-cost, scalable approach to inattention detection during EEG recordings using computer vision analysis, and made publicly available a tool for visualization, model fine-tuning, and post-processing of the system’s results. We also made publicly available an example computer vision analysis pipeline that can be used in future studies. We showed that fine-tuning the model on small amounts of new data, labeled on a per-frame basis, substantially increases model performance. Our work demonstrates that computer vision analysis is a feasible way of detecting inattention in EEG studies. We hope that providing a scalable method for assessing inattention during EEG experiments will make EEG studies more reproducible and will increase the feasibility of studying early brain development in infants and children with and without neurodevelopmental disorders, populations in which sustaining attention during EEG experiments can be challenging.

Declarations

Data Availability

Due to privacy concerns, participants’ videos cannot be shared.
To enable the reproducibility of the results, the dataset with extracted CVA features that were used for model training, and code for initial model training and model fine-tuning, are made publicly available at https://github.com/dyisaev/eeg-cva-model-training. A pipeline based on OpenFace software for CVA feature extraction is made publicly available at https://github.com/dyisaev/eeg-cva-feature-extraction. A GUI interface for visualization, labeling, and post-processing, together with installation and usage instructions is available at https://github.com/dyisaev/eeg-cva-visualization-tool. Python 3.9.7 was used in the model training and data analysis. Versions of python packages are listed in the corresponding repositories. Acknowledgements and Funding This research was supported by a grant from the National Institutes of Health (NIH; NICHD 2P50HD093074, Dawson, PI). We thank the NIH and the children that participated in the research studies and their families. Authors’ contributions D.Yu.I., M.Di M., D.C., K.C., G.D. and G.S. contributed to the design of the work, data analysis and interpretation; S.M. and J.G. contributed to the data acquisition and labeling; D.Yu.I. and Z.C. contributed to the creation of the new software used in the work; D.Yu.I. and S.M. contributed to drafting the first version of the manuscript; all authors revised the final manuscript. Consent for publication of video The caregiver of the participant whose video was used in the Supplementary Materials, as well as blurred in the Figures, provided consent to use the materials in publication. All other data in the paper are anonymized. Informed consent was obtained from the subjects and/or their legal guardian(s) for publication of identifying information/images in an online open-access publication. Competing interests Dr. 
Dawson is on the Scientific Advisory Boards of Akili, Inc., Zynerba Pharmaceutical, Inc., Nonverbal Learning Disability Project, and Tris Pharma, Inc., is a consultant to Apple, Inc., Gerson Lehrman Group, and Guidepoint Global, LLC, received speaker fees from WebMD and book royalties from Guilford Press, Oxford University Press, and Springer Nature Press. Dr. Dawson has stock interests in Neuvana, Inc. Dr. Dawson has four patents (three issued, one pending): 16678789, 1514139, 63354492, and 10912801B2. Dr. Dawson has developed technology, data, and/or products that have been licensed to Apple, Inc. and Cryocell, Inc. and Dawson and Duke University have benefited financially. Dr. Sapiro is affiliated with Apple, Inc. Dr. Carpenter has had funding by the National Institutes of Health (NIH), the Department of Defense, and the Brain and Behavior Foundation. Dr. Carpenter is a standing member on the Programmatic Panel for the Department of Defense Congressionally Directed Medical Research Programs (CDMRP) Autism Research Program and has served as an ad hoc reviewer on NIH review panels; she has received reimbursement for her time on these panels. The remaining authors declare no competing interests. References DeBoer, T., Scott, L., & Nelson, C. Methods for acquiring and analyzing infant event-related potentials in Infant EEG and Event-Related Potentials , 5–38 (Psychology Press, 2013). Thierry, G. The use of event-related potentials in the study of early cognitive development. Infant and Child Development 14(1), 85–94 (2005). https://doi.org/https://doi.org/10.1002/icd.353 Isaev, D. Y. et al.. Relative average look duration and its association with neurophysiological activity in young children with autism spectrum disorder. Scientific Reports 10(1), (2020). https://doi.org/10.1038/s41598-020-57902-1 Webb, S. J. et al.. Guidelines and best practices for electrophysiological data collection, analysis and reporting in autism. 
Journal of Autism and Developmental Disorders 45(2), 425–443 (2015). https://doi.org/10.1007/s10803-013-1916-6 Stets, M., Stahl, D., & Reid, V. M. A meta-analysis investigating factors underlying attrition rates in infant ERP studies. Dev. Neuropsychol. 37(3), 226–252 (2012). https://doi.org/10.1080/87565641.2012.654867 Bell, M. A., & Cuevas, K. Using EEG to study cognitive development: issues and practices. J. Cogn. Dev. 13(3), 281–294 (2012). https://doi.org/10.1080/15248372.2012.691143 Ellis, A. E., & Nelson, C. A. Category prototypicality judgments in adults and children: behavioral and electrophysiological correlates. Developmental Neuropsychology 15(2), 193–211 (1999). https://doi.org/10.1080/87565649909540745 Todd, R. M., Lewis, M. D., Meusel, L. A., & Zelazo, P. D. The time course of social-emotional processing in early childhood: ERP responses to facial affect and personal familiarity in a Go-Nogo task. Neuropsychologia 46(2), 595–613 (2008). https://doi.org/10.1016/j.neuropsychologia.2007.10.011 Murias, M. et al.. Validation of eye-tracking measures of social attention as a potential biomarker for autism clinical trials. Autism Research 11(1), 166–174 (2018). https://doi.org/10.1002/aur.1894 Dawson, G. et al.. Early behavioral intervention is associated with normalized brain activity in young children with autism. Journal of the American Academy of Child and Adolescent Psychiatry 51(11), 1150–1159 (2012). https://doi.org/10.1016/j.jaac.2012.08.018 Orekhova, E. V., Stroganova, T. A., Posikera, I. N., & Elam, M. EEG theta rhythm in infants and preschool children. Clinical Neurophysiology 117(5), 1047–1062 (2006). https://doi.org/10.1016/j.clinph.2005.12.027 Murias, M. et al.. Electrophysiological biomarkers predict clinical improvement in an open-label trial assessing efficacy of autologous umbilical cord blood for treatment of autism. Stem Cells Translational Medicine, 783–791 (2018). https://doi.org/10.1002/sctm.18-0090 Kaiser, A. et al.. 
EEG data quality: determinants and impact in a multicenter study of children, adolescents, and adults with attention-deficit/hyperactivity disorder (ADHD). Brain Sci. 11(2), (2021). https://doi.org/10.3390/brainsci11020214 Webb, S. J. et al.. Biomarker acquisition and quality control for multi-site studies: the autism biomarkers consortium for clinical trials [methods]. Frontiers in Integrative Neuroscience 13, (2020). https://doi.org/10.3389/fnint.2019.00071 Elsabbagh, M. et al.. Disengagement of visual attention in infancy is associated with emerging autism in toddlerhood. Biological Psychiatry 74(3), 189–194 (2013). https://doi.org/10.1016/j.biopsych.2012.11.030 Keehn, B., Müller, R. A., & Townsend, J. Atypical attentional networks and the emergence of autism. Neuroscience and Biobehavioral Reviews 37(2), 164–183 (2013). https://doi.org/10.1016/j.neubiorev.2012.11.014 McPartland, J. C., Webb, S. J., Keehn, B., & Dawson, G. Patterns of visual attention to faces and objects in autism spectrum disorder. Journal of Autism and Developmental Disorders 41(2), 148–157 (2011). https://doi.org/10.1007/s10803-010-1033-8 Werner, E., Dawson, G., Osterling, J., & Dinno, N. Recognition of autism spectrum disorder before one year of age. Journal of Autism and Developmental Disorders 30(2), 157–162 (2000). Orekhova, E. V. et al.. EEG hyper-connectivity in high-risk infants is associated with later autism. Journal of Neurodevelopmental Disorders 6(1), 1–11 (2014). https://doi.org/10.1186/1866-1955-6-40 Stroganova, T. A., V. Orekhova, E., & Posikera, I. N. Externally and internally controlled attention in infants: An EEG study. International Journal of Psychophysiology 30(3), 339–351 (1998). https://doi.org/10.1016/S0167-8760(98)00026-9 Ahtola, E., Stjerna, S., Stevenson, N., & Vanhatalo, S. Use of eye tracking improves the detection of evoked responses to complex visual stimuli during EEG in infants. Clin. Neurophysiol. Pract. 2, 81–90 (2017). 
https://doi.org/10.1016/j.cnp.2017.03.002 Maguire, M. J., Magnon, G., & Fitzhugh, A. E. Improving data retention in EEG research with children using child-centered eye tracking. J. Neurosci. Methods 238, 78–81 (2014). https://doi.org/10.1016/j.jneumeth.2014.09.014 Baltrusaitis, T., Zadeh, A., Lim, Y. C., & Morency, L. P. Openface 2.0: facial behavior analysis toolkit. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) , Xi'an, China, 2018, pp. 59–66, doi: 10.1109/FG.2018.00019 Krafka, K. et al.. Eye tracking for everyone. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , Las Vegas, NV, USA, 2016, pp. 2176–2184, doi: 10.1109/CVPR.2016.239 Lugaresi, C. et al.. Mediapipe: A framework for building perception pipelines. Third Workshop on Computer Vision for AR/VR at IEEE Computer Vision and Pattern Recognition (CVPR), (2019). https://doi.org/10.48550/arXiv.1906.08172 Torre, F. D. et al.. IntraFace. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG) , Ljubljana, Slovenia, 2015, pp. 1–8, doi: 10.1109/FG.2015.7163082 Perochon, S. et al.. A scalable computational approach to assessing response to name in toddlers with autism. Journal of Child Psychology and Psychiatry 62(9), 1120–1131 (2021). https://doi.org/https://doi.org/10.1111/jcpp.13381 Chang, Z. et al.. Computational Methods to Measure Patterns of Gaze in Toddlers With Autism Spectrum Disorder. JAMA Pediatrics 175(8), 827–836 (2021). https://doi.org/10.1001/jamapediatrics.2021.0530 Erel, Y., Potter, C. E., Jaffe-Dax, S., Lew-Williams, C., & Bermano, A. H. iCatcher: a neural network approach for automated coding of young children's eye movements. Infancy 27(4), 765–779 (2022). https://doi.org/https://doi.org/10.1111/infa.12468 Qian, X., Wang, M., Wang, X., Wang, Y., & Dai, W. Intelligent method for real-time portable EEG artifact annotation in semiconstrained environment based on computer vision. 
Comput Intell. Neurosci. 9590411, (2022). https://doi.org/10.1155/2022/9590411 Gotham, K. et al.. A replication of the Autism Diagnostic Observation Schedule (ADOS) revised algorithms. J. Am. Acad. Child Adolesc. Psychiatry 47(6), 642–651 (2008). https://doi.org/10.1097/CHI.0b013e31816bffb7 Elliott, C. D. Differential Ability Scales, 2nd Edition (Harcourt Assessment, 2007). King, D. E. Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research 10, 1755–1758 (2009). Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1), 37–46 (1960). https://doi.org/10.1177/001316446002000104 Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. On calibration of modern neural networks. 34th International Conference on Machine Learning, ICML 2017 , 3 , 2130–2143, (2017). Hastie, T., Tibshirani, R., & Friedman, J. The Elements of Statistical Learning (Springer New York Inc., 2001). Kingma, D. P., & Ba, J. Adam: A Method for Stochastic Optimization (2015) http://arxiv.org/abs/1412.6980 Paszke, A. et al.. PyTorch: an imperative style, high-performance deep learning library. Proceedings of the 33rd International Conference on Neural Information Processing Systems (pp. Article 721). Curran Associates Inc. (2019). McHugh, M. L. Interrater reliability: the kappa statistic. Biochem. Med. (Zagreb) 22(3), 276–282 (2012). Additional Declarations Competing interest reported. Dr. Dawson is on the Scientific Advisory Boards of Akili, Inc, Zynerba Pharmaceutical, Inc., Nonverbal Learning Disability Project, and Tris Pharma, Inc., is a consultant to Apple, Inc., Gerson Lehrman Group, and Guidepoint Global, LLC, received speaker fees from WebMD and book royalties from Guilford Press, Oxford University Press, Springer Nature Press. Dr. Dawson has stock interests in Neuvana, Inc. Dr. Dawson has four patents (three issued, one pending): 16678789, 1514139, 63354492, and 10912801B2. Dr. 
Dawson has developed technology, data, and/or products that have been licensed to Apple, Inc. and Cryocell, Inc. and Dawson and Duke University have benefited financially. Dr. Sapiro is affiliated with Apple, Inc. Dr. Carpenter has had funding by the National Institutes of Health (NIH), the Department of Defense, and the Brain and Behavior Foundation. Dr. Carpenter is a standing member on the Programmatic Panel for the Department of Defense Congressionally Directed Medical Research Programs (CDMRP) Autism Research Program and has served as an ad hoc reviewer on NIH review panels; she has received reimbursement for her time on these panels. The remaining authors declare no competing interests. Supplementary Files SupplementaryVideo1S1.zip Supplementary Video S1. Visualization of CVA features together with the video of the participant Video content is included for educational purposes and the individual’s diagnostic cohort is not specified. Cite Share Download PDF Status: Published Journal Publication published 22 Aug, 2025 Read the published version in Scientific Reports → Version 1 posted Editorial decision: Revision requested 18 Dec, 2024 Reviews received at journal 22 Nov, 2024 Reviewers agreed at journal 31 Oct, 2024 Reviewers agreed at journal 05 Sep, 2024 Reviews received at journal 29 Jul, 2024 Reviewers agreed at journal 19 Jul, 2024 Reviewers agreed at journal 19 Jul, 2024 Reviewers invited by journal 16 Jul, 2024 Editor assigned by journal 16 Jul, 2024 Editor invited by journal 12 Jul, 2024 Submission checks completed at journal 10 Jul, 2024 First submitted to journal 25 Jun, 2024 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. 
We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4637470","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":335461773,"identity":"da2926f2-38a7-488d-b054-e689dda60cd6","order_by":0,"name":"Dmitry Isaev","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Dmitry","middleName":"","lastName":"Isaev","suffix":""},{"id":335461774,"identity":"fcb41b59-16e9-4ee1-b115-38bb81f90025","order_by":1,"name":"Samantha Major","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABAUlEQVRIiWNgGAWjYPCCA2DSgIHBhrGBB8RkI6QjAa4ljUQtQHCYsBb59tOJnwt/3JFnEDv8oODjnvOyG84cPsDwoewwTi0GZ3I3S89IeGbYIJ1mYDjj2W3jDWfbEhhnnMOjhSF3gzRPAtA90gkGxjwHbiduOM9jwMzbhluLfP/bzb+BWuwbpNM/ALWcA2rh/8D8F48Whhu520C2JDZI54BsOZC44WwPAzMjHi0GN95us+ZJO5zcJp1TYDjjQLLxzDPHDA72nEvH47Dczbd5bA7b9kunbzP4cMBOtu9M8sMHP8qscTsMBoARwWYA4xwgrB4CmB8Qq3IUjIJRMApGFgAAaDZfRkDt5h0AAAAASUVORK5CYII=","orcid":"","institution":"Duke University","correspondingAuthor":true,"prefix":"","firstName":"Samantha","middleName":"","lastName":"Major","suffix":""},{"id":335461775,"identity":"24fa0060-4883-4722-bf0d-6ee68cfa72f5","order_by":2,"name":"Kimberly L.H. 
Carpenter","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"Kimberly","middleName":"L.H.","lastName":"Carpenter","suffix":""},{"id":335461776,"identity":"f20c38b1-07a1-4486-8b41-0406ebc0d81b","order_by":3,"name":"Jordan Grapel","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"Jordan","middleName":"","lastName":"Grapel","suffix":""},{"id":335461777,"identity":"ca81daf4-438a-4f71-9f77-9f4765d98805","order_by":4,"name":"Zhuoqing Chang","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Zhuoqing","middleName":"","lastName":"Chang","suffix":""},{"id":335461778,"identity":"4226126b-1fde-46cc-8894-3c7b8a4e021b","order_by":5,"name":"Matias Di Martino","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"Matias","middleName":"Di","lastName":"Martino","suffix":""},{"id":335461779,"identity":"09594cb6-8a2f-4524-a1ac-3331f4e7235f","order_by":6,"name":"David Carlson","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"David","middleName":"","lastName":"Carlson","suffix":""},{"id":335461780,"identity":"a438feed-c35c-4dee-8c3d-35aa8079bf9e","order_by":7,"name":"Geraldine Dawson","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"Geraldine","middleName":"","lastName":"Dawson","suffix":""},{"id":335461781,"identity":"ad98161f-eef6-4881-a6a4-a82d9c0a6673","order_by":8,"name":"Guillermo Sapiro","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"Guillermo","middleName":"","lastName":"Sapiro","suffix":""}],"badges":[],"createdAt":"2024-06-25 
15:18:29","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4637470/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4637470/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-10511-2","type":"published","date":"2025-08-22T16:29:19+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":62188212,"identity":"fbcf280c-3ade-42db-9cd4-ef3b4dac1b29","added_by":"auto","created_at":"2024-08-10 12:14:21","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":269939,"visible":true,"origin":"","legend":"\u003cp\u003eRecording setup. Video from the camera is recorded on Video Recording Computer, which sends a marker to the EEG Recording Computer via Cedrus Stimtracker every 100 frames. This allows for synchronization between the EEG and video recordings.\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4637470/v1/69aec057d59ec528d6f2dd65.jpeg"},{"id":62186510,"identity":"f8471ca1-7626-4bdb-8305-86038066a7af","added_by":"auto","created_at":"2024-08-10 12:06:21","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":394562,"visible":true,"origin":"","legend":"\u003cp\u003eA: Visualization of CVA features together with the video of the participant. B: Interface for labeling the frames. 
Image is included for educational purposes and the individual’s diagnostic cohort is not specified.\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4637470/v1/b3121820a0daf20fd49fe578.jpeg"},{"id":62188209,"identity":"109318de-8f67-4f51-8bb5-94722eba20fc","added_by":"auto","created_at":"2024-08-10 12:14:21","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":292845,"visible":true,"origin":"","legend":"\u003cp\u003eAverage precision, Maximal Cohen’s kappa and AUC per each iteration using different sampling/adaptation methods. Line color is median, and shaded area is interquartile range per each iteration.\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4637470/v1/213b7857fd5e9934d195319d.jpeg"},{"id":62186512,"identity":"4dcdcdef-e8a5-4016-a0a2-b9195e833587","added_by":"auto","created_at":"2024-08-10 12:06:21","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":272208,"visible":true,"origin":"","legend":"\u003cp\u003eMedian (thick line) and Interquartile Range (shaded area) of Cohen’s kappa at different threshold levels at iterations 5, 10, and 20.\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4637470/v1/86d35cf82f5e4eeab427ae84.jpeg"},{"id":89847276,"identity":"d3268346-38fa-4aa4-b789-b333740aa725","added_by":"auto","created_at":"2025-08-25 16:42:50","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2283671,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4637470/v1/c60ce0a0-ba1f-4e93-9727-51487cb2f709.pdf"},{"id":62186515,"identity":"ea889fe1-f742-4ae1-9e3b-0338b13c36e2","added_by":"auto","created_at":"2024-08-10 
12:06:22","extension":"zip","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":63266519,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eSupplementary Video S1. Visualization of CVA features together with the video of the participant Video content is included for educational purposes and the individual’s diagnostic cohort is not specified.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"SupplementaryVideo1S1.zip","url":"https://assets-eu.researchsquare.com/files/rs-4637470/v1/f7a20408effb0b089c221ff1.zip"}],"financialInterests":"Competing interest reported. Dr. Dawson is on the Scientific Advisory Boards of Akili, Inc, Zynerba Pharmaceutical, Inc., Nonverbal Learning Disability Project, and Tris Pharma, Inc., is a consultant to Apple, Inc., Gerson Lehrman Group, and Guidepoint Global, LLC, received speaker fees from WebMD and book royalties from Guilford Press, Oxford University Press, Springer Nature Press. Dr. Dawson has stock interests in Neuvana, Inc. Dr. Dawson has four patents (three issued, one pending): 16678789, 1514139, 63354492, and 10912801B2. Dr. Dawson has developed technology, data, and/or products that have been licensed to Apple, Inc. and Cryocell, Inc. and Dawson and Duke University have benefited financially. Dr. Sapiro is affiliated with Apple, Inc. Dr. Carpenter has had funding by the National Institutes of Health (NIH), the Department of Defense, and the Brain and Behavior Foundation. Dr. Carpenter is a standing member on the Programmatic Panel for the Department of Defense Congressionally Directed Medical Research Programs (CDMRP) Autism Research Program and has served as an ad hoc reviewer on NIH review panels; she has received reimbursement for her time on these panels. 
The remaining authors declare no competing interests.","formattedTitle":"\u003cp\u003eUse of Computer Vision Analysis for Labeling Inattention Periods in Eeg Recordings With Visual Stimuli\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eElectroencephalography (EEG) is a widely used method for studying brain-behavior relations. A typical EEG recording session includes visual and/or auditory tasks, which can be presented in an event-related potential (ERP) paradigm or during spontaneous EEG recording. Collecting data using visual tasks in children is significantly more challenging due to their reduced ability to sustain their attention to visual stimuli. \u003csup\u003e[1-2]\u003c/sup\u003e The ability to sustain attention during EEG tasks can be especially challenging for children with neurodevelopmental disorders, such as autism.\u003csup\u003e\u0026nbsp;[3-4]\u003c/sup\u003e A meta-analysis by Stets et al. (2012) \u003csup\u003e[5]\u003c/sup\u003e reports that studies involving visual tasks in infants have significantly higher attrition rates than auditory or combined visual and auditory tasks. While reports of attrition rates in different studies vary, \u003csup\u003e[1,5-6]\u003c/sup\u003e a general recommendation is to design tasks that will be engaging for children, thereby facilitating the maintenance of visual attention. \u003csup\u003e[6]\u003c/sup\u003e To facilitate visual attention children may be asked to provide a behavioral response (e.g., press a button, \u003csup\u003e[7-8]\u003c/sup\u003e or an experimenter may gently redirect a child to the screen when noticing signs of disengagement. \u003csup\u003e[3,8-9]\u003c/sup\u003e\u003c/p\u003e\n\u003cp\u003eRemoving segments of the data during which a participant did not look at the screen is often the first stage of data processing in recordings with visual stimuli. 
Typically, researchers either code the participant\u0026rsquo;s attention on-line by pressing a button which sends a marker to the EEG recording when the participant was not attending to the stimulus, \u003csup\u003e[7,10-11]\u003c/sup\u003e or by recording the video of the participant\u0026rsquo;s behavior synchronously with the EEG recording and marking periods of inattention post-hoc by reviewing the video. \u003csup\u003e[3,12]\u003c/sup\u003e This is a burdensome manual process requiring significant time and effort. It is also highly subjective; for example, the annotator might only see the participant\u0026rsquo;s face and must guess whether the participant\u0026rsquo;s gaze is directed to the area inside or outside of the screen. Subjectivity during this first stage of data processing poses an obstacle for EEG studies, in particular for multi-center ones, since reproducibility and constancy of EEG data quality in multi-center studies are critical. \u003csup\u003e[13-14]\u003c/sup\u003e\u003c/p\u003e\n\u003cp\u003eIn addition to its value for data curation, information about inattention periods can be useful for creating clinical biomarkers. There is evidence of alterations in orienting, disengagement from, and sustaining attention to relevant stimuli in autistic children. \u003csup\u003e[15-18]\u003c/sup\u003e which undoubtedly influences the amount of inattention during the EEG study. Though a typical EEG study excludes from analysis time periods where the participant is not engaged with the visual stimulus, \u003csup\u003e[11,19-20]\u003c/sup\u003e inattentiveness during EEG in social/nonsocial stimuli can be a measure that distinguishes autistic and neurotypical children, used alone or in conjunction with EEG power features. \u003csup\u003e[3]\u003c/sup\u003e\u003c/p\u003e\n\u003cp\u003eConventional eye-tracking technologies can address the problem of detecting inattention. 
Simultaneously presenting a stimulus on the eye-tracker screen while recording both eye-tracking and EEG signals enables the detection of a participant\u0026apos;s visual attention directed towards the screen (see Ahtola et al. (2017) \u003csup\u003e[21]\u003c/sup\u003e for an example setup). For example, a study by Maguire et al. (2014) \u003csup\u003e[22]\u003c/sup\u003e proposed using an eye-tracker synchronized with EEG to present an \u0026ldquo;attention-getter\u0026rdquo; animation in an experiment with 6-8 year old children. They reported increased retention of EEG data compared to the condition where children were asked to provide a behavioral response (button pressing) to facilitate attention. \u0026nbsp;However, eye-tracking equipment can be expensive and requires calibration.\u003c/p\u003e\n\u003cp\u003eHere we propose a solution for monitoring attention during EEG acquisition based on computer vision analysis (CVA), which is scalable and less expensive than eye-tracking equipment, requiring only off-the-shelf cameras to objectively measure children\u0026rsquo;s behavior. This is largely enabled by the progress in face detection and estimation of facial landmarks, head pose, and gaze. \u003csup\u003e[23-26]\u003c/sup\u003e In non-EEG settings, these tools have been able to detect head turns in response to name, \u003csup\u003e[27]\u003c/sup\u003e and capture patterns of gaze in a low-cost setting without additional calibration. \u003csup\u003e[28-29]\u003c/sup\u003e For example, iCatcher \u003csup\u003e[29]\u003c/sup\u003e is a publicly available supervised deep learning model trained to classify infants\u0026rsquo; gaze into three categories (\u0026lsquo;left\u0026rsquo;, \u0026lsquo;right\u0026rsquo;, and \u0026lsquo;away\u0026rsquo;) based on facial appearance. In the work of Qian et al. 
(2022), \u003csup\u003e[30]\u003c/sup\u003e supervised machine learning in combination with CVA approaches were applied for the blink and head movement artifacts detection in a minimally constrained portable EEG setting.\u003c/p\u003e\n\u003cp\u003eIn this work, we develop a combination of CVA and a supervised machine learning model to detect inattention periods during the EEG recordings. This is computed from the videos of the child\u0026rsquo;s head and upper body captured synchronously with EEG and with simple off-the-shelf cameras. We hypothesized that automatic CVA codes of eye gaze coordinates, head pose descriptors (pitch, yaw, and roll), and nose landmarks could reliably detect periods of visual distraction from the screen using a supervised machine learning model. At the same time, we propose a minimal involvement of human annotators to fine-tune the model to a new participant. In this process, a small number of frames from the new participant\u0026rsquo;s video are labeled by a human, followed by an additional round of model training. Minor human involvement is critical since head poses and facial expressions of children vary significantly in clinical populations, justifying the need and opportunity for tuning the pre-trained model to new participants. Recent work based on iCatcher provides evidence that the lowest agreement between human annotators and automatic models\u0026nbsp;occurs on the label \u0026lsquo;looking away from the screen. \u003csup\u003e[29]\u003c/sup\u003e We developed a graphical user interface (GUI) allowing users to label data for fine-tuning, visualize video and corresponding time series of CVA features, and post-process the model results. The post-processing stage gives an opportunity for additional quality control of inattention periods proposed by the model. 
The proposed approach reduces subjectivity by providing the CVA features for human reference in the labeling process, thus standardizing the information an individual uses in their labeling. It also significantly reduces the coding time by decreasing the number of frames to be labeled manually. We therefore train the model on an annotated dataset of 23 children and then adjust it to a new child by labeling a limited amount of randomly selected additional frames on the new video. We openly share online the GUI for the video and CVA features inspection, model retraining, and predictions post-processing.\u003c/p\u003e"},{"header":"Methods","content":"\u003ch2\u003eParticipants\u003c/h2\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eParticipants were 23 children (16 males), ranging from 49\u0026ndash;95 months of age who were part of a study funded by the National Institutes of Health (NICHD 2P50HD093074, Dawson, PI). The ethnic and racial composition of the sample was as follows: White, 17; Black, 0; Asian, 2; other and mixed race, 4; Hispanic, 4. All 23 children met DSM-5 criteria for autism spectrum disorder (ASD) based on the Autism Diagnostic Observation Schedule-2nd Edition \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e]\u003c/sup\u003e by an experienced, research reliable psychologist. Eleven of the 23 children were diagnosed with co-occurring attention deficit/hyperactivity disorder (ADHD) based on a comprehensive clinical evaluation by a clinical psychologist with expertise in ADHD. Children had a mean Full-Scale IQ of 78.5 (SD\u0026thinsp;=\u0026thinsp;25.5) based on the GCA Standard Score derived from Differential Ability Scales Second Edition. 
\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/sup\u003e\u003c/p\u003e\n \u003cp\u003eAll caregivers/legal guardians of participants gave written, informed consent and the study protocol was approved by the Duke University Health System Institutional Review Board (Protocol numbers Pro00085435 and Pro00085156). Informed consent was obtained from the subjects and/or their legal guardian(s) for publication of identifying information/images in an online open-access publication. Methods were carried out in accordance with institutional, State, and Federal guidelines and regulations. The procedures in these studies adhere to the tenants of the Declaration of Helsinki. Additionally, the caregiver of the participant whose video was used in the Supplementary Materials, as well as blurred in the Figures, provided consent to use the materials in publication. All other data in the paper are anonymized.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003eRecording synchronized video and EEG\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eContinuous EEG was recorded as participants were presented with three videos involving dynamic audio-visual stimuli that included social (person gesturing and smiling), nonsocial (toys activating), and neutral (bubbles floating) content. This was followed by three event-related potential (ERP) protocols: (1) Presentation of faces and houses, (2) An auditory oddball task, and (3) Visual evoked potentials (VEP). One or two clinical research assistants were present in the room during the EEG recording to ensure the quality of the session and to gently redirect the participant\u0026rsquo;s attention back to the screen in case they were distracted. EEG data were recorded from 124 channels with reference to Cz using a Hydrocel Geodesic Sensor Net and Net Amps 400 amplifier (Electrical Geodesics, Eugene, Oregon). 
Data were collected using Netstation 4.5.6 with a sampling rate of 1000 Hz. The child\u0026rsquo;s face was recorded with a Basler ACE acA1300-30uc camera mounted below the screen and synchronized with the EEG. The camera resolution was 1296\u0026times;966 pixels and the frame rate was 30 fps. To synchronize the camera and EEG, we used in-house software based on the Basler \u003cem\u003epylon\u003c/em\u003e library together with a Cedrus StimTracker hardware device that set markers on the EEG recording. A diagram of the recording setup is shown in Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\n \u003c/div\u003e\n \u003cp\u003e[Insert Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003eExtracting CVA features\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eTo extract the CVA features, we used in-house code involving three steps: (a) face detection and disambiguation, (b) extraction of landmarks and head pose angles, and (c) gaze estimation. The raw set of features extracted per frame included the nose \u003cem\u003ex\u003c/em\u003e (horizontal) and \u003cem\u003ey\u003c/em\u003e (vertical) coordinates in the frame, the gaze \u003cem\u003ex\u003c/em\u003e and \u003cem\u003ey\u003c/em\u003e coordinates in the presentation screen plane, and the head pose angles (pitch, yaw, and roll).\u003c/p\u003e\n \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eFace detection and disambiguation.\u003c/span\u003e Face detection and disambiguation used the \u003cem\u003eface_recognition\u003c/em\u003e Python library, which is built on the \u003cem\u003edlib\u003c/em\u003e C++ library. 
\u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e]\u003c/sup\u003e Whenever the algorithm detected more than one face in a frame (which happened either when the same face was detected twice or when another person, e.g., a clinical research assistant, entered the frame), it displayed the frame with bounding boxes and prompted the user to select the participant\u0026rsquo;s face.\u003c/p\u003e\n \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eExtraction of landmarks and head pose angles.\u003c/span\u003e After face detection, a facial landmark extraction algorithm based on the \u003cem\u003eintraface\u003c/em\u003e software library \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e was applied to the detected faces, yielding facial landmark pixel coordinates as well as pitch, yaw, and roll head pose angles.\u003c/p\u003e\n \u003cp\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eGaze estimation.\u003c/span\u003e The iTracker software \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/sup\u003e was used for gaze estimation, providing gaze \u003cem\u003ex\u003c/em\u003e and gaze \u003cem\u003ey\u003c/em\u003e coordinates in the screen plane. Although iTracker was trained to predict gaze coordinates on a mobile device screen from frames captured by the device\u0026rsquo;s front-facing camera, we used its output as a proxy for gaze coordinates in the presentation screen plane. 
The software package is modular, so this component can easily be replaced with an alternative of the user\u0026rsquo;s choice.\u003c/p\u003e\n \u003cp\u003eBecause the \u003cem\u003eintraface\u003c/em\u003e library is not currently available to the general public, for the convenience of potential users we make publicly available an alternative processing pipeline consisting of our original face detection and disambiguation code and code for landmark, head pose, and gaze extraction using the popular OpenFace software package. \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003eData attrition\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eBecause participant behavior during the pauses between EEG/ERP recordings differed substantially from that during the recordings, inattention detection was restricted to the recording periods themselves, and the training set for the machine learning (ML) model included only frames from within those periods. 
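The Python sketch below illustrates the two attrition rules: restriction to recording periods and, as described next, exclusion of frames without a detected face. It also implements the per-participant plus/minus transform of Eqs. (1) and (2). It is an illustration only, not the code used in this study. The frame fields in_recording and face_detected are hypothetical names. The sketch also assumes that the median in Eqs. (1) and (2) is taken over each participant's retained frames.

```python
from statistics import median

# Raw per-frame CVA features: nose and gaze coordinates, head pose angles.
FEATURES = ['noseX', 'noseY', 'gazeX', 'gazeY', 'yaw', 'pitch', 'roll']


def filter_frames(frames):
    # Keep frames recorded during an actual EEG/ERP recording that have a detected face.
    # 'in_recording' and 'face_detected' are hypothetical field names.
    return [f for f in frames if f['in_recording'] and f['face_detected']]


def plus_minus(values):
    # Eq. (1): max(0, x - median(x)); Eq. (2): |min(0, x - median(x))|.
    m = median(values)
    plus = [max(0.0, v - m) for v in values]
    minus = [abs(min(0.0, v - m)) for v in values]
    return plus, minus


def preprocess_participant(frames):
    # One dict of 14 plus/minus features per retained frame (cf. Table 1).
    kept = filter_frames(frames)
    rows = [{} for _ in kept]
    for name in FEATURES:
        plus, minus = plus_minus([f[name] for f in kept])
        for row, p, q in zip(rows, plus, minus):
            row[name + '_plus'] = p
            row[name + '_minus'] = q
    return rows


if __name__ == '__main__':
    print(plus_minus([1.0, 2.0, 5.0]))
```

For example, plus_minus([1.0, 2.0, 5.0]) returns ([0.0, 0.0, 3.0], [1.0, 0.0, 0.0]). The median is 2.0, so the value 1.0 contributes only to the minus feature and 5.0 only to the plus feature.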
Frames in which the face could not be detected (and which therefore lacked landmark and head pose information) were also excluded from the analysis.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003eData pre-processing\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eBecause inattention can occur in any direction (e.g., when participants look to the right or left, or turn their head up or down), each feature was transformed, separately for each participant, into a positive (\u0026lsquo;plus\u0026rsquo;; Eq.\u0026nbsp;(1)) and a negative (\u0026lsquo;minus\u0026rsquo;; Eq.\u0026nbsp;(2)) version,\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Equa\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e$$\\mathit{feature}_{plus}=\\max\\left(0,\\;\\mathit{feature}-\\operatorname{median}\\left(\\mathit{feature}\\right)\\right)$$\u003c/div\u003e\n \u003c/div\u003e(1) ,\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\n \u003cdiv class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e$$\\mathit{feature}_{minus}=\\left|\\min\\left(0,\\;\\mathit{feature}-\\operatorname{median}\\left(\\mathit{feature}\\right)\\right)\\right|$$\u003c/div\u003e\n \u003c/div\u003e(2) .\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eThe final set of features used in the analysis is listed in Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\n \u003c/div\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eList of input features per frame for the machine learning model.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFeature name\u003c/p\u003e\n \u003c/th\u003e\n \u003cth 
align=\"left\"\u003e\n \u003cp\u003eFeature description\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003enoseX\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eNose coordinates\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003enoseX\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003enoseY\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003enoseY\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003egazeX\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" rowspan=\"4\"\u003e\n \u003cp\u003eGaze coordinates\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003egazeX\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003egazeY\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003egazeY\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eyaw\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" rowspan=\"6\"\u003e\n \u003cp\u003eHead pose angles\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eyaw\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003epitch\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003epitch\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eroll\u003csub\u003eplus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eroll\u003csub\u003eminus\u003c/sub\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n \u003cp\u003e[Insert Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e\n \u003cp\u003eAfter pre-processing, the participant identifier was one-hot encoded and appended to the feature list. This allowed the network to learn a participant-specific bias term in its first layer, analogous to participant-level intercepts in mixed-effects models. The one-hot encoding used one more category than the number of participants, with the last category reserved for the participant whose data are used for model fine-tuning and prediction.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003eData labeling\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eData for all 23 participants were labeled by one of the co-authors using the Elan v. 6.3 software. Four participants were randomly selected for independent annotation by another co-author. Neither annotator participated in the data analysis. 
Using the recorded video, annotators labeled frames as \u0026lsquo;gaze off screen\u0026rsquo; if the participant looked away from the screen and/or as \u0026lsquo;head turn\u0026rsquo; if the participant turned their head. For inattention detection, a frame was labeled \u0026lsquo;inattention\u0026rsquo; if it was labeled as a head turn, as gaze off screen, or both. Agreement on inattention labels between the independent annotators was assessed with Cohen\u0026rsquo;s kappa. \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003eTraining and evaluating the machine learning model\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eUsing the pre-processed frame-by-frame data as input, we trained a multi-layer perceptron (MLP) with two hidden layers (dimensions 512 and 14, selected empirically) and a temperature scaling layer for model calibration. \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e\u0026ndash;\u003cspan class=\"CitationRef\"\u003e36\u003c/span\u003e]\u003c/sup\u003e The target variable was the per-frame inattention label, and cross-entropy was used as the loss function. The model was trained with the Adam optimizer. \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e We used weighted sampling during training so that each batch contained approximately equal numbers of positive (inattention) and negative (attention) samples. Models were implemented in the \u003cem\u003epytorch\u003c/em\u003e framework. \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/sup\u003e Performance was evaluated using leave-one-subject-out cross-validation (LOSO CV). 
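The data-handling parts of this training setup can be sketched in plain Python:

- leave-one-subject-out folds;
- a participant one-hot vector whose extra last slot is reserved for the held-out (new) participant;
- class-balanced weighted sampling.

This is an illustrative sketch with hypothetical function names, not the implementation used in the study. The MLP itself (hidden sizes 512 and 14, temperature scaling, Adam, cross-entropy) is omitted. In PyTorch, the balanced sampling would typically be delegated to torch.utils.data.WeightedRandomSampler.

```python
import random


def loso_folds(participant_ids):
    # Leave-one-subject-out: each participant is held out exactly once.
    ids = sorted(set(participant_ids))
    for held_out in ids:
        yield [p for p in ids if p != held_out], held_out


def one_hot(pid, train_ids):
    # len(train_ids) + 1 categories; the last one encodes the participant used
    # for fine-tuning and prediction (here: anyone not in the training set).
    vec = [0] * (len(train_ids) + 1)
    if pid in train_ids:
        vec[train_ids.index(pid)] = 1
    else:
        vec[-1] = 1
    return vec


def balanced_batch(labels, batch_size, rng):
    # Draw frame indices with replacement so that inattention (1) and
    # attention (0) frames each receive a total sampling probability of 0.5.
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError('both classes must be present')
    weights = [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


if __name__ == '__main__':
    for train_ids, held_out in loso_folds(['PT1', 'PT2', 'PT3']):
        print(held_out, one_hot(held_out, train_ids), one_hot(train_ids[0], train_ids))
```

Weighting each frame by 0.5 divided by the size of its class gives both classes the same total sampling mass. This is one simple way to obtain approximately balanced batches, but the exact weighting used in the study is not specified.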
For each participant, we assessed average precision (AP, also known as the area under the precision-recall curve), the area under the ROC curve (AUC), and the maximal Cohen\u0026rsquo;s kappa (MK) across prediction thresholds between the human annotations and the model predictions. Additionally, for each threshold in the range between 0 and 1, we computed the median Cohen\u0026rsquo;s kappa across participants. This allowed us to identify the single threshold that yields the best agreement between the model and the human coder across all participants, without adjusting the threshold for each participant individually.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003eTransfer learning: adjusting the ML model to a new participant\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eAt each iteration of additional training, our adaptation approach selected a batch of 128 frames (corresponding to 4.270 s of video) for labeling and then trained the model for 20 epochs (full passes over the entire labeled dataset). To evaluate this approach, we assessed the three metrics defined in the previous section under two frame sampling approaches: sequential sampling, in which frames and their labels are added to the batch in order from the beginning of the video (resembling how a human would go through the recording and label it), and random sampling. We additionally determined the maximal median Cohen\u0026rsquo;s kappa across participants and the corresponding prediction threshold at iterations 5, 10, and 20, which correspond to 21.3, 42.6, and 85.3 s of additionally labeled data per participant. The exact algorithm was as follows:\u003c/p\u003e\n \u003c/div\u003e\u003cspan\u003e1. Set\u0026nbsp;\u003cem\u003eN\u003c/em\u003e\u0026thinsp;=\u0026thinsp;128 (the batch size).\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e2. 
Create an empty dataset for labeled data.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e3. Set\u0026nbsp;\u003cem\u003eIteration\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e4. Predict the probability of the positive class (inattention) for each frame.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e5. If the approach is random sampling, randomly sample\u0026nbsp;\u003cem\u003eN\u003c/em\u003e frames from the participant\u0026rsquo;s remaining data into the batch.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e6. If the approach is sequential sampling, take the next\u0026nbsp;\u003cem\u003eN\u003c/em\u003e frames, starting from the beginning of the participant\u0026rsquo;s remaining data, into the batch.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e7. Remove the frames in the batch from the participant\u0026rsquo;s remaining data.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e8. Add the batch to the labeled dataset (within the LOSO CV framework, the existing human labels of the held-out participant were used for the batch).\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e9. Train for 20 epochs on the labeled dataset.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e10. Compute AP, AUC, and MK.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e11. Set\u0026nbsp;\u003cem\u003eIteration\u003c/em\u003e\u0026thinsp;=\u0026thinsp;\u003cem\u003eIteration\u003c/em\u003e\u0026thinsp;+\u0026thinsp;1.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e12. If\u0026nbsp;\u003cem\u003eIteration\u003c/em\u003e\u0026thinsp;=\u0026thinsp;50, stop.\u003cbr\u003e\u003c/span\u003e\u003cspan\u003e13. Otherwise, go to step 4.\u003cbr\u003e\u003c/span\u003e\n\u003c/div\u003e\n\u003ch3\u003eAgreement measurements between model and human and between two humans\u003c/h3\u003e\n\u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe used Cohen\u0026rsquo;s kappa to assess the quality of the human annotations. 
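The agreement computations described in this paper can be sketched in Python. These are Cohen's kappa, the consensus annotation, per-participant maximal kappa over thresholds, and the single threshold that maximizes the median kappa across participants. Function names are hypothetical, and labels use 1 for inattention and 0 for attention. Returning 1.0 when expected agreement equals 1 is a convention chosen for this sketch. Library implementations such as scikit-learn's cohen_kappa_score may treat that degenerate case differently.

```python
from statistics import median


def cohen_kappa(a, b):
    # Cohen's kappa for two equal-length binary sequences (1 = inattention).
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    # Degenerate case (both raters constant and identical): treat as perfect agreement.
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)


def consensus(a, b):
    # A frame is 'inattention' only if both annotators labeled it as such.
    return [1 if x == 1 and y == 1 else 0 for x, y in zip(a, b)]


def kappa_curve(labels, probs, thresholds):
    # Kappa between human labels and model probabilities binarized at each threshold.
    return [cohen_kappa(labels, [1 if p >= t else 0 for p in probs]) for t in thresholds]


def best_shared_threshold(participants, thresholds):
    # participants: list of (labels, probs) pairs, one per participant.
    # Returns the threshold with the highest median kappa across participants,
    # that median kappa, and each participant's maximal kappa (MK).
    curves = [kappa_curve(y, p, thresholds) for y, p in participants]
    medians = [median([c[i] for c in curves]) for i in range(len(thresholds))]
    best = max(range(len(thresholds)), key=lambda i: medians[i])
    return thresholds[best], medians[best], [max(c) for c in curves]


if __name__ == '__main__':
    grid = [i / 100 for i in range(1, 100)]
    data = [([1, 0, 0, 1], [0.9, 0.2, 0.4, 0.7]), ([1, 0, 1, 0], [0.6, 0.3, 0.8, 0.5])]
    print(best_shared_threshold(data, grid))
```

Evaluating every participant on the same threshold grid is what allows a single shared threshold to be read off the median curve. The per-participant maxima correspond to MK.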
To compare the maximal median kappa between the model and the human annotator with the level of agreement between humans, we randomly selected four participants whose videos were independently labeled by a second annotator, and computed Cohen\u0026rsquo;s kappa between the two human annotators. We additionally computed Cohen\u0026rsquo;s kappa between the model predictions, thresholded at the level corresponding to the maximal median kappa at iteration 20, and a consensus annotation of the two human raters (in the consensus annotation, a frame is labeled \u0026lsquo;inattention\u0026rsquo; only if both annotators labeled it as such; otherwise, it is labeled \u0026lsquo;attention\u0026rsquo;).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003eGraphical User Interface for visualizing and retraining the model\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe created a web-based GUI that supports visualizing the data, labeling frames, retraining the model within the random sampling framework, and post-processing the predictions (see Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e for a screenshot, and Supplementary Video S1 in the Supplementary Materials online for a demonstration of how the tool works). 
The tool is built on the open-source libraries \u0026lsquo;plotly\u0026rsquo; (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://plotly.com/python/\u003c/span\u003e\u003c/span\u003e) and \u0026lsquo;dash\u0026rsquo; (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://dash.plotly.com/\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\n \u003c/div\u003e\n \u003cp\u003e[Insert Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e]\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003eDataset statistics\u003c/h2\u003e\n \u003cp\u003eThe full dataset consisted of 566,043 frames. After excluding frames in which the face or gaze was not detected, 535,539 frames were retained (5.38% of frames were invalid), with an average of 23,284 frames per participant (SD\u0026thinsp;=\u0026thinsp;6,193). Of the retained frames, 79,629 (14.86%) were labeled as inattention.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003eTransfer learning results\u003c/h2\u003e\n \u003cp\u003eThe results of transfer learning are shown in Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e and Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e. The sequential sampling approach performed substantially worse than the random sampling approach. Before any adaptation to the participants (iteration 0), median AP, AUC, and MK were 0.855, 0.965, and 0.742, respectively. 
By iteration 20, the random sampling approach reached a median AP of 0.962, AUC of 0.989, and MK of 0.888, compared with a median AP of 0.640, AUC of 0.862, and MK of 0.548 for the sequential sampling approach.\u003c/p\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eMedian (50th percentile), 25th percentile, and 75th percentile across participants of average precision, AUC, and maximal Cohen\u0026rsquo;s kappa at different iterations for the two sampling/adaptation approaches. The random sampling approach outperforms the sequential sampling approach on all three metrics at each listed iteration.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eAverage precision\u003c/p\u003e\n \u003cp\u003e(percentile)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eAUC\u003c/p\u003e\n \u003cp\u003e(percentile)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"3\"\u003e\n \u003cp\u003eMaximal Cohen\u0026apos;s kappa\u003c/p\u003e\n \u003cp\u003e(percentile)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSampling approach\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eIteration\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e50%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e25%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e75%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e50%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e25%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e75%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e50%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e25%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e75%\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo Fine Tuning\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.855\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.715\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.913\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.965\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.948\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.971\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.742\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.646\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.796\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eRandom sampling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e5\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.906\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.820\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.948\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.973\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.960\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.981\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.798\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.753\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.873\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e10\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.930\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.875\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.969\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.984\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.975\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.991\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.838\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.798\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.898\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e20\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.962\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.931\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.981\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.989\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.984\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.993\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.888\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.865\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.925\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSequential sampling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e5\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.400\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.280\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.720\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.788\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.638\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.890\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.380\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.236\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.561\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e10\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.575\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.408\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.782\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.835\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.731\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.908\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.482\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.251\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.637\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e20\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.640\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.408\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.801\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.862\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.771\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.930\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.548\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.354\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.678\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003e[Insert Fig. 
\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e]\u003c/p\u003e\n \u003cp\u003e[Insert Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e]\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n \u003ch2\u003eCohen\u0026rsquo;s kappa analysis\u003c/h2\u003e\n \u003cp\u003eCohen\u0026rsquo;s kappa values across prediction thresholds for both sampling approaches (random and sequential) at iterations 5, 10, and 20 are shown in Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e. Thresholds at the highest median kappa and the corresponding median kappa values are shown in Tables \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e and \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e. The highest median kappa ranged from 0.792 to 0.888 with the random sampling approach and from 0.223 to 0.426 with the sequential one. Figure \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e shows that the median Cohen\u0026rsquo;s kappa remains relatively high and stable for thresholds between 0.2 and 0.8, so a single general threshold for the model predictions can be set within this range.\u003c/p\u003e\n \u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eThresholds at which the median Cohen\u0026rsquo;s kappa is highest, and the corresponding median kappa values, for the two sampling approaches at iterations 5, 10, and 20.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSampling approach\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eIteration\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eThreshold\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n 
\u003cp\u003eMedian Cohen\u0026rsquo;s kappa\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"3\"\u003e\n \u003cp\u003eRandom sampling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.310\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.792\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.484\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.838\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.424\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.888\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" rowspan=\"3\"\u003e\n \u003cp\u003eSequential sampling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.004\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.223\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.008\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.296\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.020\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"char\"\u003e\n \u003cp\u003e0.426\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n \u003ch2\u003eAgreement between the model and human coders, and between two human coders\u003c/h2\u003e\n \u003cp\u003eA second independent annotator labeled videos from four participants, totaling 74,543 frames (13.9% of the data). Labeling took the second annotator approximately 22 hours, an average of about 1.06 seconds per frame. Cohen\u0026rsquo;s kappa between the two human annotators ranged from 0.548 to 0.844 (see Table \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e). Agreement between the model adapted by random sampling and the consensus annotation increased with each iteration of additional training, ranging across participants from 0.662 to 0.942 at iteration 5, from 0.737 to 0.948 at iteration 10, and from 0.827 to 0.960 at iteration 20.\u003c/p\u003e\n \u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eAgreement (Cohen\u0026rsquo;s kappa) between the two human annotators, and between the models adapted by random sampling and the consensus annotation at iterations 5, 10, and 20.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eParticipant\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAgreement between annotators\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n 
\u003cp\u003eAgreement\u003c/p\u003e\n \u003cp\u003e(model, consensus)\u003c/p\u003e\n \u003cp\u003e\u0026ndash; iteration 5\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAgreement\u003c/p\u003e\n \u003cp\u003e(model, consensus)\u003c/p\u003e\n \u003cp\u003e\u0026ndash; iteration 10\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAgreement\u003c/p\u003e\n \u003cp\u003e(model, consensus)\u003c/p\u003e\n \u003cp\u003e\u0026ndash; iteration 20\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePT1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.584\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.662\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.737\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.827\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePT9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.727\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.860\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.902\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.939\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePT10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.548\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.751\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.834\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.849\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003ePT16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.844\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.942\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.948\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.960\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\n \u003ch2\u003eVisualization GUI and pre-processing pipeline\u003c/h2\u003e\n \u003cdiv class=\"BlockQuote\"\u003e\n \u003cp\u003eWe developed a web-based GUI that can be used to review the CVA features of a video, label additional frames and retrain the model, and post-process the data, including setting the model decision threshold and rejecting falsely detected inattention events. We also make publicly available a data pre-processing pipeline based on in-house code for face detection and on the OpenFace framework for head pose and gaze estimation. \u003csup\u003e[\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eIn this work, we proposed a method for detecting periods of inattention to visual stimuli during EEG recordings. The tool is based on CVA of videos of participants\u0026rsquo; behavior recorded synchronously with the EEG. We outlined a data processing pipeline that includes face and facial landmark detection, head pose computation, and gaze estimation. 
We proposed an MLP model for predicting inattention from these CVA features, and random frame sampling as a means of fine-tuning the model for each participant. We made publicly available a GUI that supports visualization of the CVA features, model fine-tuning, adjustment of prediction thresholds, and post-processing of results.\u003c/p\u003e \u003cp\u003eThe proposed random frame sampling approach for adapting the model to a new participant outperforms the sequential sampling approach. For the non-fine-tuned model, the maximal Cohen\u0026rsquo;s kappa was 0.742, placing the best achievable agreement with the human rater in the \u0026lsquo;substantial\u0026rsquo; range. \u003csup\u003e[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e Compared with the initial non-fine-tuned model, the model trained on an additional 2560 labeled frames (equivalent to labeling only about 85 seconds of video) performed significantly better on all quality metrics. In contrast, the performance of sequential frame sampling decreases over the first five iterations (see Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), then gradually improves, but does not reach that of random sampling. The reasons include the strong temporal correlation of the features, which leaves little variability in the newly labeled data, and the rarity of inattention (prevalence of 14.86%), so that many batches contain no positive labels.\u003c/p\u003e \u003cp\u003eIn line with a previous study, \u003csup\u003e[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/sup\u003e we found that agreement between human coders on inattention labeling was in the \u0026lsquo;moderate\u0026rsquo; to \u0026lsquo;substantial\u0026rsquo; range for three of the four participants, and in the \u0026lsquo;almost perfect\u0026rsquo; range for only one participant. 
\u003csup\u003e[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e However, when the model was compared with the consensus annotation of the human coders, the lowest agreement across participants was already in the \u0026lsquo;substantial\u0026rsquo; range after labeling 640 additional frames, and agreement was in the \u0026lsquo;almost perfect\u0026rsquo; range for all four participants after labeling 2560 frames. Thus, the proposed model tends to agree with the human annotators where they agree among themselves, suggesting a more objective assessment of inattention.\u003c/p\u003e \u003cp\u003eLabeling inattention is a challenging task for humans, likely because annotators must make a subjective judgement about the boundaries of the stimulus presentation screen. The provided GUI displays the raw CVA features together with the participant\u0026rsquo;s video and also lets coders label frames for the fine-tuning or post-processing stage. When an annotator must decide on an ambiguous frame, they can play the video and compare that frame with neighboring frames, which may help them judge whether the participant was attending to the screen.\u003c/p\u003e \u003cp\u003eOur results show that the proposed approach can make data labeling more efficient. At about 1.06 seconds per frame, labeling only about 2560 frames per participant to obtain high-quality labels takes roughly 45 minutes of annotator time, which can substantially reduce time and effort compared with labeling every frame of a recording.\u003c/p\u003e \u003cp\u003eThe modularity of the tool allows users to plug in any CVA pipeline and machine learning model with compatible inputs and outputs while keeping the same GUI. The initial model can be retrained as the amount of labeled data increases.\u003c/p\u003e \u003cp\u003eUsing the same prediction model and tool for discarding inattention periods may facilitate multi-center studies by unifying the data pre-processing pipeline. 
Another way to facilitate multi-center studies is to pre-process and label the data in each center separately and then share only the CVA features and annotations, so that the model can be trained on larger amounts of data. Such an approach helps preserve the privacy of the data in each center, because centers share only specific de-identified CVA features.\u003c/p\u003e \u003cp\u003eA limitation of the study is that neither the trained model nor our original full pre-processing pipeline is published, because the \u003cem\u003eIntraFace\u003c/em\u003e library \u003csup\u003e[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e has been removed from public access. Instead, we provide code for an alternative pre-processing pipeline, based on the publicly available OpenFace library, that computes the same features, together with the model structure and the interface that a model must implement to be fully integrated into the GUI.\u003c/p\u003e \u003cp\u003eA potential future direction is handling the missing data that arise when a face cannot be detected in the video. CVA could not detect the face in 5.38% of the frames in our dataset, likely because of extreme head angles relative to the camera or face occlusions. Future studies may attempt to associate these periods with attention or inattention to the screen using imputation or interpolation methods.\u003c/p\u003e \u003cp\u003eWe presented a low-cost, scalable approach to detecting inattention during EEG recordings using computer vision analysis, and made publicly available a tool for visualization, model fine-tuning, and post-processing of the system\u0026rsquo;s results. We also made publicly available an example computer vision analysis pipeline that can be used in future studies. We showed that fine-tuning the model on small amounts of new data, labeled on a per-frame basis, substantially increases model performance. 
Our work demonstrates that computer vision analysis is a feasible way of detecting inattention in EEG studies. We hope that a scalable method for assessing inattention during EEG experiments will make EEG studies more reproducible and increase the feasibility of studying early brain development in infants and children with and without neurodevelopmental disorders, populations in which sustained attention during EEG experiments can be challenging.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eData Availability\u003c/p\u003e\n\u003cp\u003eDue to privacy concerns, participants\u0026rsquo; videos cannot be shared. To enable reproducibility of the results, the dataset of extracted CVA features used for model training, together with code for initial model training and model fine-tuning, is publicly available at https://github.com/dyisaev/eeg-cva-model-training. A pipeline based on the OpenFace software for CVA feature extraction is publicly available at https://github.com/dyisaev/eeg-cva-feature-extraction. A GUI for visualization, labeling, and post-processing, together with installation and usage instructions, is available at https://github.com/dyisaev/eeg-cva-visualization-tool. Python 3.9.7 was used for model training and data analysis. Versions of the Python packages are listed in the corresponding repositories.\u003c/p\u003e\n\u003cp\u003eAcknowledgements and Funding\u003c/p\u003e\n\u003cp\u003eThis research was supported by a grant from the National Institutes of Health (NIH; NICHD 2P50HD093074, Dawson, PI). We thank the NIH, and the children who participated in the research studies and their families.\u003c/p\u003e\n\u003cp\u003eAuthors\u0026rsquo; contributions\u003c/p\u003e\n\u003cp\u003eD.Yu.I., M.Di M., D.C., K.C., G.D. and G.S. contributed to the design of the work, data analysis and interpretation; S.M. and J.G. 
contributed to the data acquisition and labeling; D.Yu.I. and Z.C. contributed to the creation of the new software used in the work; D.Yu.I. and S.M. contributed to drafting the first version of the manuscript; all authors revised the final manuscript.\u003c/p\u003e\n\u003cp\u003eConsent for publication of video\u003c/p\u003e\n\u003cp\u003eThe caregiver of the participant whose video appears in the Supplementary Materials (and, blurred, in the Figures) provided consent to use these materials in the publication. All other data in the paper are anonymized. \u003cstrong\u003eInformed consent was obtained from the subjects and/or their legal guardian(s) for publication of identifying information/images in an online open-access publication.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eDr. Dawson is on the Scientific Advisory Boards of Akili, Inc., Zynerba Pharmaceutical, Inc., Nonverbal Learning Disability Project, and Tris Pharma, Inc., is a consultant to Apple, Inc., Gerson Lehrman Group, and Guidepoint Global, LLC, and has received speaker fees from WebMD and book royalties from Guilford Press, Oxford University Press, and Springer Nature Press. Dr. Dawson has stock interests in Neuvana, Inc. Dr. Dawson has four patents (three issued, one pending): 16678789, 1514139, 63354492, and 10912801B2. Dr. Dawson has developed technology, data, and/or products that have been licensed to Apple, Inc. and Cryocell, Inc., and Dawson and Duke University have benefited financially. Dr. Sapiro is affiliated with Apple, Inc. Dr. Carpenter has received funding from the National Institutes of Health (NIH), the Department of Defense, and the Brain and Behavior Foundation. Dr. 
Carpenter is a standing member on the Programmatic Panel for the Department of Defense Congressionally Directed Medical Research Programs (CDMRP) Autism Research Program and has served as an ad hoc reviewer on NIH review panels; she has received reimbursement for her time on these panels. The remaining authors declare no competing interests. \u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eDeBoer, T., Scott, L., \u0026amp; Nelson, C. Methods for acquiring and analyzing infant event-related potentials in \u003cem\u003eInfant EEG and Event-Related Potentials\u003c/em\u003e, 5\u0026ndash;38 (Psychology Press, 2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThierry, G. The use of event-related potentials in the study of early cognitive development. Infant and Child Development 14(1), 85\u0026ndash;94 (2005). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/icd.353\u003c/span\u003e\u003cspan address=\"10.1002/icd.353\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIsaev, D. Y. et al. Relative average look duration and its association with neurophysiological activity in young children with autism spectrum disorder. Scientific Reports 10(1), (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-020-57902-1\u003c/span\u003e\u003cspan address=\"10.1038/s41598-020-57902-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWebb, S. J. et al. Guidelines and best practices for electrophysiological data collection, analysis and reporting in autism. Journal of Autism and Developmental Disorders 45(2), 425\u0026ndash;443 (2015). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10803-013-1916-6\u003c/span\u003e\u003cspan address=\"10.1007/s10803-013-1916-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStets, M., Stahl, D., \u0026amp; Reid, V. M. A meta-analysis investigating factors underlying attrition rates in infant ERP studies. Dev. Neuropsychol. 37(3), 226\u0026ndash;252 (2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/87565641.2012.654867\u003c/span\u003e\u003cspan address=\"10.1080/87565641.2012.654867\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBell, M. A., \u0026amp; Cuevas, K. Using EEG to study cognitive development: issues and practices. J. Cogn. Dev. 13(3), 281\u0026ndash;294 (2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/15248372.2012.691143\u003c/span\u003e\u003cspan address=\"10.1080/15248372.2012.691143\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEllis, A. E., \u0026amp; Nelson, C. A. Category prototypicality judgments in adults and children: behavioral and electrophysiological correlates. Developmental Neuropsychology 15(2), 193\u0026ndash;211 (1999). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/87565649909540745\u003c/span\u003e\u003cspan address=\"10.1080/87565649909540745\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTodd, R. M., Lewis, M. D., Meusel, L. A., \u0026amp; Zelazo, P. D. 
The time course of social-emotional processing in early childhood: ERP responses to facial affect and personal familiarity in a Go-Nogo task. Neuropsychologia 46(2), 595\u0026ndash;613 (2008). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.neuropsychologia.2007.10.011\u003c/span\u003e\u003cspan address=\"10.1016/j.neuropsychologia.2007.10.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMurias, M. et al. Validation of eye-tracking measures of social attention as a potential biomarker for autism clinical trials. Autism Research 11(1), 166\u0026ndash;174 (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/aur.1894\u003c/span\u003e\u003cspan address=\"10.1002/aur.1894\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDawson, G. et al. Early behavioral intervention is associated with normalized brain activity in young children with autism. Journal of the American Academy of Child and Adolescent Psychiatry 51(11), 1150\u0026ndash;1159 (2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jaac.2012.08.018\u003c/span\u003e\u003cspan address=\"10.1016/j.jaac.2012.08.018\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrekhova, E. V., Stroganova, T. A., Posikera, I. N., \u0026amp; Elam, M. EEG theta rhythm in infants and preschool children. Clinical Neurophysiology 117(5), 1047\u0026ndash;1062 (2006). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.clinph.2005.12.027\u003c/span\u003e\u003cspan address=\"10.1016/j.clinph.2005.12.027\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMurias, M. et al. Electrophysiological biomarkers predict clinical improvement in an open-label trial assessing efficacy of autologous umbilical cord blood for treatment of autism. Stem Cells Translational Medicine, 783\u0026ndash;791 (2018). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/sctm.18-0090\u003c/span\u003e\u003cspan address=\"10.1002/sctm.18-0090\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaiser, A. et al. EEG data quality: determinants and impact in a multicenter study of children, adolescents, and adults with attention-deficit/hyperactivity disorder (ADHD). Brain Sci. 11(2), (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/brainsci11020214\u003c/span\u003e\u003cspan address=\"10.3390/brainsci11020214\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWebb, S. J. et al. Biomarker acquisition and quality control for multi-site studies: the autism biomarkers consortium for clinical trials [methods]. Frontiers in Integrative Neuroscience 13, (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fnint.2019.00071\u003c/span\u003e\u003cspan address=\"10.3389/fnint.2019.00071\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElsabbagh, M. et al. 
Disengagement of visual attention in infancy is associated with emerging autism in toddlerhood. Biological Psychiatry 74(3), 189\u0026ndash;194 (2013). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.biopsych.2012.11.030\u003c/span\u003e\u003cspan address=\"10.1016/j.biopsych.2012.11.030\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKeehn, B., M\u0026uuml;ller, R. A., \u0026amp; Townsend, J. Atypical attentional networks and the emergence of autism. Neuroscience and Biobehavioral Reviews 37(2), 164\u0026ndash;183 (2013). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.neubiorev.2012.11.014\u003c/span\u003e\u003cspan address=\"10.1016/j.neubiorev.2012.11.014\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcPartland, J. C., Webb, S. J., Keehn, B., \u0026amp; Dawson, G. Patterns of visual attention to faces and objects in autism spectrum disorder. Journal of Autism and Developmental Disorders 41(2), 148\u0026ndash;157 (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10803-010-1033-8\u003c/span\u003e\u003cspan address=\"10.1007/s10803-010-1033-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWerner, E., Dawson, G., Osterling, J., \u0026amp; Dinno, N. Recognition of autism spectrum disorder before one year of age. Journal of Autism and Developmental Disorders 30(2), 157\u0026ndash;162 (2000).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrekhova, E. V. et al. EEG hyper-connectivity in high-risk infants is associated with later autism. Journal of Neurodevelopmental Disorders 6(1), 1\u0026ndash;11 (2014). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1866-1955-6-40\u003c/span\u003e\u003cspan address=\"10.1186/1866-1955-6-40\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStroganova, T. A., Orekhova, E. V., \u0026amp; Posikera, I. N. Externally and internally controlled attention in infants: An EEG study. International Journal of Psychophysiology 30(3), 339\u0026ndash;351 (1998). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S0167-8760(98)00026-9\u003c/span\u003e\u003cspan address=\"10.1016/S0167-8760(98)00026-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhtola, E., Stjerna, S., Stevenson, N., \u0026amp; Vanhatalo, S. Use of eye tracking improves the detection of evoked responses to complex visual stimuli during EEG in infants. Clin. Neurophysiol. Pract. 2, 81\u0026ndash;90 (2017). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cnp.2017.03.002\u003c/span\u003e\u003cspan address=\"10.1016/j.cnp.2017.03.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaguire, M. J., Magnon, G., \u0026amp; Fitzhugh, A. E. Improving data retention in EEG research with children using child-centered eye tracking. J. Neurosci. Methods 238, 78\u0026ndash;81 (2014). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jneumeth.2014.09.014\u003c/span\u003e\u003cspan address=\"10.1016/j.jneumeth.2014.09.014\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBaltrusaitis, T., Zadeh, A., Lim, Y. 
C., \u0026amp; Morency, L. P. OpenFace 2.0: facial behavior analysis toolkit. 2018 \u003cem\u003e13th IEEE International Conference on Automatic Face \u0026amp; Gesture Recognition (FG 2018)\u003c/em\u003e, Xi'an, China, 2018, pp. 59\u0026ndash;66, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/FG.2018.00019\u003c/span\u003e\u003cspan address=\"10.1109/FG.2018.00019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrafka, K. et al. Eye tracking for everyone. 2016 \u003cem\u003eIEEE Conference on Computer Vision and Pattern Recognition (CVPR)\u003c/em\u003e, Las Vegas, NV, USA, 2016, pp. 2176\u0026ndash;2184, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/CVPR.2016.239\u003c/span\u003e\u003cspan address=\"10.1109/CVPR.2016.239\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLugaresi, C. et al. MediaPipe: A framework for building perception pipelines. Third Workshop on Computer Vision for AR/VR at IEEE Computer Vision and Pattern Recognition (CVPR), (2019). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1906.08172\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1906.08172\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDe la Torre, F. et al. IntraFace. 2015 \u003cem\u003e11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)\u003c/em\u003e, Ljubljana, Slovenia, 2015, pp. 
1\u0026ndash;8, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/FG.2015.7163082\u003c/span\u003e\u003cspan address=\"10.1109/FG.2015.7163082\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePerochon, S. et al. A scalable computational approach to assessing response to name in toddlers with autism. Journal of Child Psychology and Psychiatry 62(9), 1120\u0026ndash;1131 (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/jcpp.13381\u003c/span\u003e\u003cspan address=\"10.1111/jcpp.13381\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang, Z. et al. Computational methods to measure patterns of gaze in toddlers with autism spectrum disorder. JAMA Pediatrics 175(8), 827\u0026ndash;836 (2021). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/jamapediatrics.2021.0530\u003c/span\u003e\u003cspan address=\"10.1001/jamapediatrics.2021.0530\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eErel, Y., Potter, C. E., Jaffe-Dax, S., Lew-Williams, C., \u0026amp; Bermano, A. H. iCatcher: a neural network approach for automated coding of young children's eye movements. Infancy 27(4), 765\u0026ndash;779 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/infa.12468\u003c/span\u003e\u003cspan address=\"10.1111/infa.12468\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQian, X., Wang, M., Wang, X., Wang, Y., \u0026amp; Dai, W. 
Intelligent method for real-time portable EEG artifact annotation in semiconstrained environment based on computer vision. Comput. Intell. Neurosci. 9590411, (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1155/2022/9590411\u003c/span\u003e\u003cspan address=\"10.1155/2022/9590411\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGotham, K. et al. A replication of the Autism Diagnostic Observation Schedule (ADOS) revised algorithms. J. Am. Acad. Child Adolesc. Psychiatry 47(6), 642\u0026ndash;651 (2008). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1097/CHI.0b013e31816bffb7\u003c/span\u003e\u003cspan address=\"10.1097/CHI.0b013e31816bffb7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElliott, C. D. \u003cem\u003eDifferential Ability Scales, 2nd Edition\u003c/em\u003e (Harcourt Assessment, 2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKing, D. E. Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research 10, 1755\u0026ndash;1758 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1), 37\u0026ndash;46 (1960). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/001316446002000104\u003c/span\u003e\u003cspan address=\"10.1177/001316446002000104\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo, C., Pleiss, G., Sun, Y., \u0026amp; Weinberger, K. Q. On calibration of modern neural networks. 
\u003cem\u003e34th International Conference on Machine Learning, ICML 2017\u003c/em\u003e, \u003cem\u003e3\u003c/em\u003e, 2130\u0026ndash;2143, (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHastie, T., Tibshirani, R., \u0026amp; Friedman, J. \u003cem\u003eThe Elements of Statistical Learning\u003c/em\u003e (Springer New York Inc., 2001).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKingma, D. P., \u0026amp; Ba, J. \u003cem\u003eAdam: A Method for Stochastic Optimization\u003c/em\u003e (2015) \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/1412.6980\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/1412.6980\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaszke, A. et al.. PyTorch: an imperative style, high-performance deep learning library. \u003cem\u003eProceedings of the 33rd International Conference on Neural Information Processing Systems\u003c/em\u003e (pp. Article 721). Curran Associates Inc. (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcHugh, M. L. Interrater reliability: the kappa statistic. Biochem. Med. 
(Zagreb) 22(3), 276\u0026ndash;282 (2012).\u003c/span\u003e\u003c/li\u003e \u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"EEG, visual attention, computer vision, machine learning, data processing automation","lastPublishedDoi":"10.21203/rs.3.rs-4637470/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4637470/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eElectroencephalography (EEG) recordings with visual stimuli require detailed coding to determine the periods of participant\u0026rsquo;s attention. Here we propose to use a supervised machine learning model and off-the-shelf video cameras only. We extract computer vision-based features such as head pose, gaze, and face landmarks from the video of the participant, and train the machine learning model (multi-layer perceptron) on an initial dataset, then adapt it with a small subset of data from a new participant. 
Using a sample of 23 autistic children and training on an additional 2560 labeled frames (equivalent to 85.3 seconds of video) from a new participant, the median area under the receiver operating characteristic curve (AUC) for inattention detection was 0.989 (IQR 0.984–0.993), and the median inter-rater reliability (Cohen's kappa) with a trained human annotator was 0.888. Agreement with the consensus annotation for four participants labeled independently by two human annotators ranged from 0.827 to 0.960. Our results demonstrate the feasibility of automatic tools for detecting inattention during EEG recordings and their potential to reduce the subjectivity and time burden of human attention coding. The tool for model adaptation and visualization of the computer vision features is publicly available to the research community.
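
The abstract reports its results with three standard quantities: Cohen's kappa between per-frame model and annotator labels, ROC AUC computed from per-frame inattention scores, and a median with interquartile range across participants. The sketch below shows one way to compute each using only the Python standard library. The helper names (`cohens_kappa`, `roc_auc`, `median_iqr`) are hypothetical, the label and score lists are made-up toy values rather than data from the study, and the quartile convention (`method="inclusive"`) is an assumption; other conventions give slightly different IQRs.

```python
from statistics import median, quantiles

# Hypothetical helpers; the study's own evaluation code is not shown here.


def cohens_kappa(a, b):
    """Cohen's kappa between two raters' per-frame labels (any hashable categories)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of frames on which the two raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: for each category, product of the two raters' marginal rates.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_e == 1.0:
        return float("nan")  # both raters used one identical category; kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a random positive (inattention)
    frame scores higher than a random negative frame; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def median_iqr(values):
    """Median and (Q1, Q3) of per-participant scores, e.g. one AUC per child."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Toy per-frame labels (1 = inattention); not data from the study.
human = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
model = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1]
print(round(cohens_kappa(human, model), 3))           # 0.6
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
print(median_iqr([0.98, 0.99, 0.97, 0.995, 0.985]))   # (0.985, 0.98, 0.99)
```

The pairwise AUC loop is quadratic in the number of frames; for full-length recordings a rank-based formulation, or a library routine such as scikit-learn's `roc_auc_score`, would be the more practical choice.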

Posted: August 10th, 2024 (version 1) · Subject: Cognitive neuroscience / Attention
Published version: Scientific Reports, August 22nd, 2025 · https://doi.org/10.1038/s41598-025-10511-2
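
The adaptation step summarized in the abstract (train on an initial dataset, then update the model with a few labeled frames from a new participant) can be illustrated with a minimal sketch. For scale, 2560 frames spanning 85.3 s implies a frame rate of about 2560 / 85.3 ≈ 30 frames per second. The sketch rests on assumptions that are not from the paper: a logistic-regression classifier stands in for the multi-layer perceptron; the only feature is a hypothetical normalized head-yaw value rather than the full head-pose, gaze, and landmark feature set; the data are synthetic; and adaptation is modelled as warm-started gradient descent, that is, continuing training from the base model's weights on the new participant's small labeled subset. All function and variable names are illustrative.

```python
import math

# Assumption: logistic regression stands in for the paper's MLP; data are synthetic.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Probability that a frame with feature vector x is an inattention frame."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train(X, y, w=None, b=0.0, lr=0.5, epochs=2000):
    """Full-batch gradient descent on mean binary cross-entropy.

    Passing existing w and b continues training from them (warm start),
    which is how the hypothetical adaptation step is modelled here.
    """
    d, n = len(X[0]), len(X)
    w = list(w) if w is not None else [0.0] * d
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # d(loss)/d(logit) for cross-entropy
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    hits = sum((predict_proba(xi, w, b) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


# Hypothetical feature: head yaw relative to the screen, scaled so 1.0 = 20 degrees.
# Initial cohort (pooled): frames with yaw > 0.5 are labeled inattentive (1).
init_idx = range(-20, 41)
X_init = [[i / 20] for i in init_idx]
y_init = [1 if i > 10 else 0 for i in init_idx]

# New participant: an off-axis camera shifts the same behaviour, so the
# true boundary for this child sits at yaw > 1.1.
new_idx = list(range(-8, 53))
X_new = [[i / 20] for i in new_idx]
y_new = [1 if i > 22 else 0 for i in new_idx]

# Every third frame is labeled for adaptation; the rest is held out.
adapt = [k for k in range(len(new_idx)) if k % 3 == 0]
held = [k for k in range(len(new_idx)) if k % 3 != 0]
X_adapt, y_adapt = [X_new[k] for k in adapt], [y_new[k] for k in adapt]
X_held, y_held = [X_new[k] for k in held], [y_new[k] for k in held]

w0, b0 = train(X_init, y_init)                      # base model
acc_before = accuracy(X_held, y_held, w0, b0)
w1, b1 = train(X_adapt, y_adapt, w=w0, b=b0)        # adapted model (warm start)
acc_after = accuracy(X_held, y_held, w1, b1)
print(f"held-out accuracy before adaptation: {acc_before:.3f}")
print(f"held-out accuracy after adaptation:  {acc_after:.3f}")
```

Because the new participant's boundary is shifted, the base model mislabels held-out frames that fall between the two boundaries; continuing training on the small labeled subset moves the threshold toward the participant-specific one. A real pipeline would more likely fine-tune the MLP itself with a framework optimizer and validate how many labeled frames are needed.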
