Abstract
Stimulus-computable models have transformed our understanding of ventral visual processing, yet comparable progress in modeling the dorsal visual stream has lagged behind. Classical motion-energy models capture only local signals and fall short of representing coherent structure from motion, while image-trained neural networks discard the temporal structure essential to motion-based computations. This leaves the dorsal pathway without a computational account linking dynamic visual inputs to the neural activity underlying shape processing. We address this gap by combining human psychophysics, chronic neural recordings from macaque dorsal and ventral cortices, and systematic evaluation of a large-scale model zoo. Using texture-masked rotating objects that isolate motion-defined surface geometry from static cues, we found that both visual pathways carry decodable representations of object surfaces, with dorsal regions more closely tracking human behavioral judgements. Encoding analyses reveal that predictive coding video models, trained to predict spatiotemporal features in natural videos, best predict neural responses in the inferior parietal lobule (IPL), a downstream region of the dorsal visual pathway. These models outperform alternatives, including both classical motion filters and multimodal foundation models, suggesting that temporal prediction objectives may be critical for capturing how cortex represents surface geometry from dynamic inputs. Our results establish predictive coding video models as a stimulus-computable baseline for the dorsal visual pathway and provide a framework for extending model-based neural system identification from static images to dynamic, naturalistic vision.
Competing Interest Statement
James DiCarlo serves on the Advisory Board of Yale University's Wu Tsai Institute, the External Advisory Board of the AI Institute for Artificial and Natural Intelligence (ARNI), and the Advisory Committee of the Lefler Center at Harvard Medical School. The remaining authors declare no competing interests.