Inferring response times of perceptual decisions with Poisson variational autoencoders
Authors: Hayden R. Johnson, Anastasia N. Krouglova, Hadi Vafaii, Jacob L. Yates, Pedro J. Goncalves
Presentation type: Poster at SNUFA 2025 online workshop (5–6 Nov 2025)
Abstract
Many properties of perceptual decision making are well modeled by deep neural networks. However, such architectures typically treat decisions as instantaneous readouts, overlooking the temporal dynamics of the decision process. Classic evidence accumulation models, such as the drift diffusion model, account for these temporal effects by positing abstract decision variables, raising the question of how these variables emerge from sensory input. We present an image-computable model of perceptual decision making in which choices and response times arise from efficient sensory encoding and Bayesian decoding of neural spiking activity. We use a Poisson variational autoencoder to learn unsupervised representations of visual stimuli in a population of rate-coded neurons, modeled as independent homogeneous Poisson processes. A task-optimized decoder then continually infers an approximate posterior over actions conditioned on incoming spiking activity. Combining these components yields a principled model capable of producing trial-by-trial patterns of choices and response times directly from images. Applied to MNIST digit classification, the model reproduces hallmark psychophysical regularities, including stochastic variability across repeated trials, right-skewed response time distributions, logarithmic scaling of response times with the number of alternatives (Hick's law), and adaptive speed–accuracy trade-offs. Conceptually, the architecture makes explicit how constraints on sensory coding and variable spiking activity can give rise to systematic patterns of human behavior, providing a functional link between efficient coding theory and decision dynamics. By bridging unsupervised sensory encoding with normative evidence accumulation, our framework demonstrates how temporal properties of decisions can be derived directly from image-level inputs.
This suggests a general approach for constructing resource-rational models of perceptual decision making and highlights response times as a valuable axis for aligning artificial and biological neural systems.
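The decoding stage described above can be sketched as a sequential Bayesian observer over Poisson spike counts: given per-hypothesis firing rates (in the full model these would come from the trained Poisson VAE encoder; here they are supplied directly), the decoder accumulates Poisson log-likelihoods in small time bins and commits to a choice once the posterior crosses a confidence threshold. The function name, bin size, and threshold rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(rates, true_class, dt=0.001, threshold=0.95, max_t=2.0):
    """Sequential Bayesian decoding of independent Poisson spiking.

    rates: (K, N) array of firing rates (Hz) for N neurons under each of
    K candidate stimuli (all rates must be positive). Spikes are sampled
    from the true class; decoding stops when the posterior over classes
    exceeds `threshold`, yielding a choice and a response time.
    """
    K, N = rates.shape
    log_post = np.full(K, -np.log(K))  # uniform prior over the K alternatives
    n_steps = int(max_t / dt)
    for step in range(1, n_steps + 1):
        # Observed spike counts in this bin, drawn from the true stimulus
        counts = rng.poisson(rates[true_class] * dt)
        # Poisson log-likelihood of the counts under each hypothesis;
        # the log-factorial term is constant across hypotheses and dropped
        log_lik = (counts * np.log(rates * dt) - rates * dt).sum(axis=1)
        log_post = log_post + log_lik
        log_post -= log_post.max()          # stabilize before exponentiating
        post = np.exp(log_post)
        post /= post.sum()
        if post.max() >= threshold:
            return int(post.argmax()), step * dt  # choice, response time (s)
    return int(post.argmax()), max_t  # timeout: report the current best guess
```

Under this stopping rule, raising `threshold` trades speed for accuracy, and adding alternatives (larger K) slows commitment, qualitatively matching the speed–accuracy trade-offs and Hick's-law scaling the abstract reports.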