Scene segmentation processes drive EEG-DCNN alignment

preprint OA: closed
Abstract

Visual processing in biological and artificial neural networks has been extensively studied through the lens of object recognition. While deep convolutional neural networks (DCNNs) have demonstrated hierarchical feature extraction similar to biological systems (DiCarlo and Cox, 2007; Yamins and DiCarlo, 2016), recent findings reveal a growing discrepancy: DCNNs with higher object categorization accuracy paradoxically predict neural responses worse (Xu and Vaziri-Pashkam, 2021; Linsley et al., 2023). Using a large-scale human electroencephalography (EEG) dataset (n=10, 82,160 trials), we investigate whether this discrepancy arises because human EEG signals predominantly reflect scene segmentation processes rather than high-level, category-specific object representations. We trained DCNNs to perform object recognition using visual diets (∼1 million training images across 292 object categories) with systematically varying scene segmentation demands: objects-only (pre-segmented), background-silhouette (explicit boundaries), and original/background-only images (requiring full segmentation). Despite substantial differences in categorization accuracy (27-53%), all trained models showed remarkably uniform encoding performance, with correlations with neural data peaking at ∼0.1 s post-stimulus. Layer-wise analysis revealed a significant negative correlation between categorization accuracy and encoding performance, with earlier network layers predicting EEG responses better than deeper layers specialized for object categorization. This dissociation suggests that EEG signals primarily reflect fundamental scene parsing mechanisms rather than object-specific representations, which would explain the growing discrepancy between DCNNs’ increasing categorization performance and their deteriorating neural prediction performance.
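The abstract reports "encoding performance" as a correlation with neural data that peaks at a particular latency, but it does not say how the encoding models were fitted. As a rough illustration only, the sketch below assumes a common approach: a ridge regression from one layer's features to the EEG channels at each time point, scored by the Pearson correlation on held-out images. The function names, the ridge penalty `alpha`, the train/test split, and the synthetic data (with a signal deliberately planted at one time index) are all assumptions made for this example, not the authors' pipeline.

```python
import numpy as np


def ridge_fit_predict(X_train, Y_train, X_test, alpha=1.0):
    """Closed-form ridge regression; returns predictions for X_test."""
    # Center features and targets using training statistics only.
    x_mean = X_train.mean(axis=0)
    y_mean = Y_train.mean(axis=0)
    Xc = X_train - x_mean
    Yc = Y_train - y_mean
    # Solve (X^T X + alpha * I) W = X^T Y for the weight matrix W.
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    return (X_test - x_mean) @ W + y_mean


def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return (A * B).sum(axis=0) / denom


def encoding_timecourse(features, eeg, n_train, alpha=1.0):
    """Mean held-out correlation per time point.

    features: (n_images, n_features) activations from one network layer.
    eeg:      (n_images, n_channels, n_times) image-averaged EEG responses.
    """
    n_times = eeg.shape[2]
    scores = np.empty(n_times)
    for t in range(n_times):
        pred = ridge_fit_predict(
            features[:n_train], eeg[:n_train, :, t], features[n_train:], alpha
        )
        scores[t] = columnwise_pearson(pred, eeg[n_train:, :, t]).mean()
    return scores


# Synthetic demonstration (hypothetical sizes, not the dataset in the paper).
rng = np.random.default_rng(0)
n_images, n_features, n_channels, n_times = 200, 20, 8, 10
features = rng.standard_normal((n_images, n_features))
eeg = rng.standard_normal((n_images, n_channels, n_times))

# Plant a feature-driven signal at a single time index so the peak is known.
planted_index = 3
mixing = rng.standard_normal((n_features, n_channels))
eeg[:, :, planted_index] += features @ mixing

scores = encoding_timecourse(features, eeg, n_train=150)
peak_index = int(np.argmax(scores))
print("peak time index:", peak_index)
```

Repeating such an analysis for each layer and each trained network would give one encoding score per layer, which could then be related to categorization accuracy. How the authors actually carried out that comparison is not described in the abstract.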
Significance Statement

This research provides a novel perspective on human electroencephalography (EEG) signals during visual processing through systematic manipulation of scene segmentation demands in deep neural networks. Using a large-scale dataset of 82,160 EEG trials and 20 trained DCNNs, we demonstrate that EEG responses primarily reflect early visual processing involved in breaking down and organizing visual scenes (scene segmentation/parsing) rather than high-level object recognition. This finding helps explain previously observed discrepancies between DCNNs’ categorization performance and neural prediction accuracy, suggesting that improving models’ ability to segment scenes, rather than simply recognizing isolated objects, may better align artificial and biological visual processing.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

Conflict of interest: The authors declare no competing financial interests.

Data and code availability: Data and code to reproduce the analyses in this article will be made available upon request.

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00