Exploiting Sparsity for Accelerated SNN Training on Graphcore IPUs
Authors: Jan Finkbeiner, Emre Neftci
Presentation type: Poster
With recent algorithmic advances in the training of spiking neural networks (SNNs), SNNs have become a promising alternative to common machine learning approaches for event-based time series data.
Although GPUs are not ideal for the distributed local computation and sparse communication required by SNNs, they are currently the most commonly used hardware for SNN training. Neuromorphic chips that exploit the dynamic sparsity of SNNs for inference do exist, but they mostly lack the algorithmic flexibility needed to train large networks.
In this work we demonstrate that Graphcore’s IPU is a promising candidate for fast and efficient large-scale SNN training, as it combines the algorithmic flexibility of GPUs with the capability to take advantage of sparse and local computations. The IPU is designed as a Multiple Instruction Multiple Data (MIMD) architecture with near-memory computing. This enables the brain-inspired principle of local computation with drastically reduced global data transfer, as well as the irregular memory access patterns necessary to efficiently exploit dynamic activation sparsity and sparse connectivity.
For SNN training we use a deep-learning-inspired workflow in which weight updates are calculated by gradient descent with surrogate gradients. To integrate dynamically sparse operations into the SNN training workflow on the IPU, we implement custom operations based on integer-valued sparse activation tensors that interface with TensorFlow’s Python API.
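As an illustration of the idea only, the following plain-Python sketch shows how a sparse, integer-valued spike representation can replace a dense matrix-vector product, together with one common choice of surrogate gradient. It is not the custom IPU operations described above: the function names, the fixed-size padded index buffer, the leaky integrate-and-fire update and the SuperSpike-style surrogate are all assumptions made for this example.

```python
def to_sparse_spikes(spikes, max_active):
    """Pack a dense 0/1 spike vector into integer indices plus a count.

    Hypothetical layout: a fixed-size index buffer padded with zeros, so the
    shape stays static; only the first `num_spikes` entries are valid.
    Spikes beyond `max_active` are dropped in this sketch.
    """
    ids = [i for i, s in enumerate(spikes) if s]
    num_spikes = min(len(ids), max_active)
    return ids[:num_spikes] + [0] * (max_active - num_spikes), num_spikes


def sparse_matvec(weights, spike_ids, num_spikes):
    """Input current computed from the active inputs only.

    weights[i][j] is the connection from input i to output j. Because spikes
    are binary, the product reduces to summing the rows of spiking inputs.
    """
    out = [0.0] * len(weights[0])
    for k in range(num_spikes):
        for j, w in enumerate(weights[spike_ids[k]]):
            out[j] += w
    return out


def dense_matvec(weights, spikes):
    """Dense reference: visits every input, whether it spiked or not."""
    out = [0.0] * len(weights[0])
    for i, s in enumerate(spikes):
        for j, w in enumerate(weights[i]):
            out[j] += s * w
    return out


def lif_step(v, current, decay=0.9, threshold=1.0):
    """One leaky integrate-and-fire step with subtractive reset (assumed model)."""
    v_pre = [decay * vi + ci for vi, ci in zip(v, current)]
    spikes = [1 if vp >= threshold else 0 for vp in v_pre]
    v_post = [vp - threshold * s for vp, s in zip(v_pre, spikes)]
    return v_post, spikes, v_pre


def surrogate_grad(v_pre, threshold=1.0, beta=10.0):
    """Smooth stand-in for the derivative of the spike step function.

    SuperSpike-style fast sigmoid: 1 / (beta * |v - threshold| + 1)^2.
    """
    return 1.0 / (beta * abs(v_pre - threshold) + 1.0) ** 2


# Toy usage: 3 inputs, 2 outputs, inputs 0 and 2 spike.
weights = [[0.2, 0.0], [0.5, 0.5], [0.1, 0.4]]
in_spikes = [1, 0, 1]
ids, n = to_sparse_spikes(in_spikes, max_active=2)
current = sparse_matvec(weights, ids, n)
v, out_spikes, v_pre = lif_step([0.0, 1.0], current)
grads = [surrogate_grad(vp) for vp in v_pre]
```

In this sketch the cost of the sparse product scales with the number of spikes rather than with the number of inputs, which is the property that sparse activations are meant to exploit. In the backward pass, the surrogate value would be used in place of the spike function's derivative, which is zero almost everywhere.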
Our current results show a clear advantage in using sparse activations on the IPU, achieving at least 5-10 times higher throughput than an NVIDIA GeForce RTX 3090 GPU, depending on activation sparsity, network size and other hyperparameter choices. Our distributed SNN implementations on multiple IPUs show promising scaling behavior and potentially even greater gains, owing to the reduced communication requirements of sparse spiking tensors and the fast interconnect between IPUs.