Stereo processing in the human brain in the light of the superposition theory

Authors: Bayu Gautama Wundari, Ichiro Fujita, Hiroshi Ban

Presentation type: Poster at SNUFA 2025 online workshop (5-6 Nov 2025)

Abstract

Animals with front-facing eyes benefit from stereopsis, which recovers the depth of objects by analyzing the slight differences between the two 2D retinal images (binocular disparity). Correlation-based frameworks have been the gold standard for explaining neural responses to disparity in primary visual cortex (V1). These models successfully predict the phase changes of neural responses to binocularly anticorrelated images (in which the left and right eyes receive opposite contrast), suggesting that V1 should exhibit a robust inverted representation of anticorrelated stimuli. Using fMRI, however, we found that mid-dorsal area V3A, not V1, shows a reliable inverted depth representation for anticorrelated images. This discrepancy suggests that correlation-based models cannot fully explain the group-level cortical activity. We hypothesize that V1 encodes disparity in superposition, in which a network represents all features even when their number exceeds the number of available neurons, at the cost of interfering with the weaker anticorrelated representation. To test this, we trained shallow neural networks on natural stereo images with four types of binocular interactions: Concat (concatenating the left (L) and right (R) features, [L, R]), BEM (binocular energy model, concatenating the monocular energy and the interocular terms, [L² + R², LR]), CMM (cross-correlation and cross-matching model, concatenating the interocular terms and their rectified counterparts, [LR, ReLU(LR)]), and Sum-diff (summing-differencing channels, concatenating the summing and differencing terms, [L+R, L-R]). Correlation-based architectures (BEM and CMM) exhibited lower feature dimensionality, suggesting stronger superposition. All shallow models failed to match human depth judgments. We next examined deep neural networks, which aligned more closely with human depth performance, though differences remained.
Monosemanticity analysis revealed that these behavioral differences reflect differences in the distribution of neurons selective for correlated and anticorrelated stimuli. Our findings suggest that stereo processing in the human brain requires hierarchical processing with additional learning objectives beyond performance optimization.
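
The four binocular interactions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that L and R are monocular feature vectors of equal length and that all products, sums, and differences are taken elementwise. The function names (`concat`, `bem`, `cmm`, `sum_diff`) and the plain-list representation are hypothetical choices made to keep the example dependency-free.

```python
from typing import List

Vector = List[float]


def concat(left: Vector, right: Vector) -> Vector:
    """Concat: [L, R]."""
    return left + right


def bem(left: Vector, right: Vector) -> Vector:
    """BEM: monocular energy and interocular term, [L^2 + R^2, LR]."""
    energy = [l * l + r * r for l, r in zip(left, right)]
    interocular = [l * r for l, r in zip(left, right)]
    return energy + interocular


def cmm(left: Vector, right: Vector) -> Vector:
    """CMM: interocular term and its rectified copy, [LR, ReLU(LR)]."""
    interocular = [l * r for l, r in zip(left, right)]
    rectified = [max(0.0, x) for x in interocular]
    return interocular + rectified


def sum_diff(left: Vector, right: Vector) -> Vector:
    """Sum-diff: summing and differencing channels, [L+R, L-R]."""
    summed = [l + r for l, r in zip(left, right)]
    differenced = [l - r for l, r in zip(left, right)]
    return summed + differenced


# Toy monocular features (assumed values, for illustration only).
L = [1.0, -2.0]
R = [0.5, 3.0]

print(concat(L, R))    # [1.0, -2.0, 0.5, 3.0]
print(bem(L, R))       # [1.25, 13.0, 0.5, -6.0]
print(cmm(L, R))       # [0.5, -6.0, 0.5, 0.0]
print(sum_diff(L, R))  # [1.5, 1.0, 0.5, -5.0]

# Anticorrelated input: invert the contrast of one eye.
R_anti = [-r for r in R]
print(bem(L, R_anti))  # [1.25, 13.0, -0.5, 6.0]
```

In this sketch, inverting the contrast of one eye leaves the monocular energy L² + R² unchanged and flips the sign of the interocular term LR. That sign flip is the inverted response to anticorrelated stimuli that correlation-based models predict.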