Quantized Rewiring: Hardware-aware training of sparse deep neural networks
Authors: Horst Petschenig and Robert Legenstein
Presentation type: Poster
Abstract
Mixed-signal and fully digital neuromorphic systems have attracted significant interest for deploying spiking neural networks in an energy-efficient manner. However, many of these systems impose constraints on fan-in, memory, or synaptic weight precision that have to be considered during network design and training. We present Quantized Rewiring, an algorithm that trains both spiking and non-spiking neural networks while meeting hardware constraints throughout the entire training process. To demonstrate our approach, we train feed-forward and recurrent neural networks with a combined fan-in/weight-precision limit, a constraint present, for example, in the Dynap-SE mixed-signal analog-digital neuromorphic processor. The Dynap-SE restricts each neuron to a fan-in of 64 connections. Each connection between two neurons is of one of four possible types, and all synapses of the same type share the same weight; different effective weights can be realized by forming multiple connections between the same pair of neurons. Quantized Rewiring simultaneously quantizes synaptic weights and rewires synapses by alternating gradient descent updates with projections of the trainable parameters onto a constraint-compliant region. Owing to the generality of this formulation, a large number of different, and possibly conflicting, device constraints can be modeled during training. The approach is also amenable to implementation on novel neuromorphic designs that support full or partial on-chip learning via approximations to backpropagation. Using our algorithm, we characterize trade-offs between the number of incoming connections per neuron and network performance on several common benchmark datasets: for example, we show that quantized and rewired networks solve the CIFAR10 image classification task with a ResNet architecture under a fan-in limit of 8 connections, and the sequential MNIST memory task with recurrent spiking neural networks under a fan-in limit of 64 connections.
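To make the projection step more concrete, the sketch below shows one way a combined fan-in/weight-precision projection could look. It is a minimal illustration, not the authors' implementation: the unit weight `w_unit`, the collapse of the Dynap-SE's four synapse types into a single signed base weight, and the greedy magnitude-based selection of connections are all assumptions made for the purpose of this example.

```python
import numpy as np

def project_to_constraints(W, w_unit=0.05, max_fanin=64):
    """Illustrative projection of a dense weight matrix onto a
    fan-in / weight-precision constrained region (assumed scheme).

    W         : (n_pre, n_post) real-valued weights after a gradient step
    w_unit    : assumed base weight of a single hardware synapse
    max_fanin : maximum number of incoming hardware connections per neuron

    Every surviving weight becomes an integer multiple of w_unit (realized
    as parallel connections between the same neuron pair), and the total
    connection count per post-synaptic neuron stays within max_fanin.
    """
    W_proj = np.zeros_like(W)
    # Number of parallel unit synapses needed to approximate each weight.
    mult = np.round(np.abs(W) / w_unit).astype(int)

    for post in range(W.shape[1]):
        budget = max_fanin
        # Keep the strongest candidate synapses first (greedy choice).
        for pre in np.argsort(-np.abs(W[:, post])):
            if mult[pre, post] == 0 or budget == 0:
                continue
            used = min(mult[pre, post], budget)   # connections spent on this pair
            W_proj[pre, post] = np.sign(W[pre, post]) * used * w_unit
            budget -= used                        # remaining fan-in budget
    return W_proj
```

In a training loop of the kind described above, such a projection would be applied to the trainable parameters after each gradient descent update, so that the network satisfies the hardware constraints at every point during training.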