Predictive coding video models capture dorsal parietal representations and human judgments for surfaces defined by motion

doi:10.64898/2026.05.13.724755

2026 · doi:10.64898/2026.05.13.724755

preprint OA: closed CC-BY-NC-ND-4.0

Abstract

Stimulus-computable models have transformed our understanding of ventral visual processing, yet comparable progress in modeling the dorsal visual stream has lagged behind. Classical motion-energy models capture only local signals and fall short of representing coherent structure from motion, while image-trained neural networks discard the temporal structure essential to motion-based computations. This leaves the dorsal pathway without a computational account linking dynamic visual inputs to the neural activity underlying shape processing. We address this gap by combining human psychophysics, chronic neural recordings from macaque dorsal and ventral cortices, and systematic evaluation of a large-scale model zoo. Using texture-masked rotating objects that isolate motion-defined surface geometry from static cues, we found that both visual pathways carry decodable representations of object surfaces, with dorsal regions more closely tracking human behavioral judgements. Encoding analyses reveal that predictive coding video models, trained to predict spatiotemporal features in natural videos, best predict neural responses in the inferior parietal lobule (IPL), a downstream region of the dorsal visual pathway. These models outperform alternative models, including both classical motion filters and multimodal foundation models, suggesting that temporal prediction objectives may be critical for capturing how cortex represents surface geometry from dynamic inputs. Our results establish predictive coding video models as a stimulus-computable baseline for the dorsal visual pathway and provide a framework for extending model-based neural system identification from static images to dynamic, naturalistic vision.

Competing Interest Statement

James DiCarlo serves on Yale University's Wu Tsai Institute Advisory Board, the External Advisory Board of the AI Institute for Artificial and Natural Intelligence (ARNI), and the Advisory Committee of the Lefler Center at Harvard Medical School. The remaining authors declare no competing interests.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-24T02:00:01.246996+00:00

License: CC-BY-NC-ND-4.0