SPIKING NEURAL NETWORKS & THEIR FAULT TOLERANCE: A LITERATURE REVIEW
Despite the impressive advances achieved by conventional ANNs, the human brain still outperforms them in many basic cognitive tasks. Originally a topic of interest in theoretical neurobiology and biophysics research, spiking neural networks were designed as an attempt to mimic biological neural networks and enable the analysis of elementary processes of the brain. With the technological leap in AI and deep learning, neuromorphics and SNNs have come into the spotlight as a promising new paradigm that could take deep learning to places it could not reach before.
In this chapter, we offer a basic literature review of the two pillar topics of this thesis: SNNs in hardware and previous efforts to make them fault tolerant.
spiking neural networks
A biological or an artificial neural network is made up of neurons connected in a sophisticated, complex pattern through synapses. The neuron is the basic processing unit in a neural network, and according to the computational paradigm of their processing units, ANNs can be classified into three generations. The first generation is based on McCulloch-Pitts neurons, or perceptrons. Built exclusively to give digital outputs, a multilayer perceptron is able to compute every boolean function with only one hidden layer. The second generation is based on neurons implementing an « activation function », such as the sigmoid function. Neural networks of the second generation can compute functions with analog inputs and outputs, which gives them a more realistic essence than first-generation networks if their output is thought of as representing the average firing frequency of a biological neuron. Nonetheless, functionality-wise, both generations are very different from biological neural networks.
Neurons in the brain communicate through discrete short electrical pulses called spikes. A typical neuron fires spikes at a frequency of less than 100 Hz, which means that a window of 20–30 ms is needed just to compute the current firing rate. However, experimental results have shown that a visual processing task can be completed in just 20–30 ms, so it was doubtful that biological neurons actually use the firing rate as their main coding scheme. Instead, it seemed that the timing of spikes plays an important role in neural coding.
These findings made way for a third generation of neural networks that uses spiking neurons as the basic element. Spiking neurons communicate, much like their biological counterparts, through spikes and code information in a spatio-temporal manner, making them more biologically realistic than the previous two models. In this section, we review the basic concepts of SNNs.
The Spiking Neuron
The spiking neuron is the basic building block of neuromorphic systems and where most of the processing takes place. A spiking neuron is often described as an integrator with a threshold. Instead of constantly firing, as in the case of first- and second-generation neurons, a spiking neuron accumulates inputs from preceding neurons, and if a certain number of spikes occur within a specific time frame, it generates a spike of its own.
The spike, also known as the action potential, is the primary means of communication between spiking neurons. It is a short electrical pulse of around 100 mV amplitude and 1–2 ms duration that represents an abrupt momentary change in the state of the neuron. Spikes are stereotypical events, shown in Fig. 2.1, that do not change form as they propagate from pre-synaptic to post-synaptic neurons. Consequently, the shape of the spike carries no information. Instead, the timing of each spike and the length of the inter-spike intervals are the key aspects of information coding in SNNs.
The Membrane Potential
The membrane potential is the voltage that describes the state of the spiking neuron at any given time. In the quiescent state, with no input spikes, the membrane potential of a typical biological neuron is strongly polarized at a resting voltage of about −65 mV. Incoming spikes evoke a positive or a negative change in the membrane potential, referred to as the post-synaptic potential. An excitatory input will reduce the negative polarization of the membrane potential, i.e., depolarize it, while an inhibitory input will hyperpolarize the membrane voltage even further.
The mechanism of spike generation is conceptually illustrated in Fig. 2.2. When an input spike arrives, the membrane potential increases for a moment before it starts decaying again. This decay of the membrane voltage between incoming spikes is known as leakage, which plays a significant role in the temporal correlation between spikes. Temporal correlation is a crucial concept that facilitates the exploitation of the temporal information contained in real-world sensory data.
After the neuron spikes, the membrane potential returns to the resting value and stays there until the next input spike comes along. The neuron cannot fire again until a certain period has passed, which is known as the absolute refractory period. In response to an input stimulus, spiking neurons can emit a chain of action potentials, called a spike train, that can have regular or irregular intervals.
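The accumulate-leak-fire-reset cycle described above can be sketched in a few lines of simulation code. The following is a minimal behavioral illustration of a leaky integrate-and-fire neuron, not any specific hardware model; the parameter values (membrane time constant, threshold, refractory period, input weight) are assumptions chosen to lie within the biologically typical ranges mentioned in this section.

```python
import numpy as np

def simulate_lif(input_spikes, dt=1e-3, tau_m=20e-3, v_rest=-65e-3,
                 v_thresh=-50e-3, v_reset=-65e-3, t_refrac=2e-3, w=4e-3):
    """Simulate a leaky integrate-and-fire neuron over discrete time steps.

    input_spikes: sequence of 0/1 values, one per time step (1 = input spike).
    Each input spike depolarizes the membrane by a fixed weight w (volts).
    Returns the membrane-potential trace and the indices of output spikes.
    """
    v = v_rest
    refrac_left = 0.0
    trace, out_spikes = [], []
    for t, s in enumerate(input_spikes):
        if refrac_left > 0:
            # Absolute refractory period: the neuron cannot fire and
            # ignores incoming spikes until the period elapses.
            refrac_left -= dt
            v = v_reset
        else:
            v += (-(v - v_rest) / tau_m) * dt   # leakage toward rest
            v += w * s                          # post-synaptic potential
            if v >= v_thresh:                   # threshold crossing: fire
                out_spikes.append(t)
                v = v_reset
                refrac_left = t_refrac
        trace.append(v)
    return np.array(trace), out_spikes
```

With these illustrative parameters, several input spikes must arrive close together for the accumulated depolarization to outrun the leakage and reach the threshold, which is exactly the integrator-with-a-threshold behavior described above.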
Neural Coding Schemes
The concept of mean firing rate dates back to the 1920s, when Adrian et al. showed that the firing rate of certain neuron types in the muscles is related to the strength of the applied stimulus. For a long time afterwards, the firing rate was thought to be the principal neural coding scheme. However, neuroscience experiments have demonstrated that for high-speed neural processing, the timing characteristics of individual spikes and spike trains cannot be ignored.
In SNNs, coding schemes can be classified into:
1. Rate Coding Schemes: converting the activity level of neurons into a firing rate, which does not benefit from the sparse spiking inherent to biological systems. The neuron firing rate can be calculated according to:
- The spike count: averaging over time.
- The spike density: averaging over several runs.
- The population activity: averaging over the activity of several neurons that act together.
2. Spike Coding Schemes: coding strategies based on the timing of individual spikes. There are many timing aspects through which neurons in the brain communicate information, including:
- Time to first spike: in systems that require ultra-fast information processing, such as tactile and olfactory systems, the delay between the beginning of the input stimulus and the first output spike was shown to carry enough information.
- Phase coding: neurons can encode information in the phase of a spike with respect to some background oscillations.
- Rank order coding: information is coded in the order of output spikes in populations of neurons.
- Correlation and synchrony: a coding scheme based on the observation that neurons that encode pieces of the same information object fire synchronously.
Nonetheless, the distinction between rate coding and spike coding schemes is not always sharp, i.e., some of the codes that are viewed as timing codes can also be interpreted as variations of a rate code. For example, the time-to-first-spike coding scheme is related to the firing rate scheme, since a neuron that spikes early would be expected to have a high firing rate, and vice versa.
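As a toy illustration of the two families of schemes, a spike-count rate and a time-to-first-spike latency can both be read out from the same spike train. The function names and the window length below are hypothetical, chosen only to make the contrast concrete.

```python
def rate_code(spike_times, window):
    """Spike-count rate code: number of spikes averaged over a time
    window (result in Hz, for times and window given in seconds)."""
    return len(spike_times) / window

def time_to_first_spike(spike_times, stimulus_onset=0.0):
    """Latency code: delay between the stimulus onset and the first
    output spike. Returns None if the neuron never fired."""
    return min(spike_times) - stimulus_onset if spike_times else None
```

Note how the two readouts are correlated, as discussed above: a train whose first spike comes early will, all else being equal, also tend to have a high spike count within the window.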
spiking neural networks in hardware
There are numerous models of spiking neurons used in neuromorphic systems, most of which perform the accumulation and firing function, albeit with different mechanisms. In the simplest deterministic model, the neuron is assumed to fire whenever its membrane potential reaches a threshold θ. The fired spike travels along a synapse that has an efficacy, or weight, w.
Spiking neuron models can be grouped into broad categories based on their degree of complexity and their resemblance to biology in structure or function. The question of which neuron model to use usually depends on the type of problem at hand. These categories are:
- Biologically-Plausible Models:
These models are built to explicitly model the behaviors observed in biological neurons and are mostly used for accurate simulations of biological neural systems. One of the most famous models of this category is the classical Hodgkin-Huxley model, which uses four-dimensional nonlinear differential equations to describe neural behavior. Other models, like the Morris-Lecar model, try to simplify things by reducing the dimensions of the nonlinear equations. In any case, these models are often used to simulate only a small number of neurons, since they are usually very expensive to implement.
- Biologically-Inspired Models:
Biologically inspired models, on the other hand, try to replicate neural behaviors without the obligation of emulating the physical activity of biological systems. Consequently, they are much simpler than the biologically plausible models in terms of computations and implementations, and can therefore be more efficient in modeling large-scale systems. Many models in this category are very common in the neuromorphic literature, including the FitzHugh–Nagumo model and the popular Izhikevich model, which can reproduce biologically accurate behavior with a set of two-dimensional ordinary differential equations.
- Integrate-and-Fire Models:
Another category of spiking neuron models, especially suited for minimizing computational complexity, is the Integrate-and-Fire (I&F) models. These models vary in complexity from the basic integration and firing function to something approaching the complexity level of the Izhikevich model. While not biologically realistic, I&F models are still complex enough to incorporate spiking dynamics. Besides the basic I&F model, there are also the leaky I&F model, the resonate-and-fire model, and the quadratic I&F model. More complex implementations also exist, such as the Adaptive Exponential Integrate-and-Fire model.
Another model that can technically fall under this category, although it describes the neural behavior in the form of response kernels rather than differential equations, is the Spike-Response Model (SRM). The SRM is a generalized form of the I&F model and will be discussed in more detail in Chapter 5.
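As an example of how compact a biologically-inspired model can be, the Izhikevich model mentioned above reduces to two coupled equations plus a reset rule. The sketch below uses simple Euler integration with the regular-spiking parameter set (a, b, c, d) from Izhikevich's original formulation; the step size and simulation length are arbitrary illustrative choices.

```python
def izhikevich(I, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0, steps=1000):
    """Izhikevich neuron: membrane potential v (mV) and recovery
    variable u evolve under a constant input current I. The default
    (a, b, c, d) values correspond to regular-spiking behavior."""
    v, u = c, b * c
    spikes = []
    for t in range(steps):
        # Euler step of dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
        # Euler step of du/dt = a (b v - u)
        u += dt * a * (b * v - u)
        if v >= 30.0:            # spike cutoff: reset v, bump u
            spikes.append(t)
            v, u = c, u + d
    return spikes
```

With zero input the trajectory settles at a stable resting point and no spikes are emitted, while a sufficiently large constant current removes the fixed point and produces tonic spiking, illustrating how a two-variable model reproduces qualitatively realistic behavior at a fraction of the cost of a Hodgkin-Huxley simulation.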
Address Event Representation
In the attempt to build neural networks that resemble the human brain, the communication problem becomes evident. Compared to digital systems, where each gate is connected to a relatively small number of outputs, a typical neuron can carry outputs to thousands of targets. This massive interconnection is easily built in biology using 3-D structures. However, it becomes a challenge when brought to 2-D silicon substrates. On the other hand, a typical neuron fires at a rate ranging from 1–10 Hz, which means that a cluster of neurons can have a collective firing rate in the kHz to low MHz range; a rate that modern digital systems can effortlessly handle. Therefore, in neuromorphic systems, communication is usually carried out between clusters of neurons by time-multiplexing spikes on a single channel, using the Address Event Representation (AER) protocol.
AER is a communication protocol that makes use of the fact that spikes carry no information other than the fact that neuron i fired at time tf, and sends out only this information. Each neuron in a cluster is assigned a unique address, and every generated spike is encoded into a sequence that indicates the identity (or address) of the neuron that fired and the time at which it fired. This way, huge dedicated synaptic connections can be avoided, and the hardware resources can be efficiently allocated where they are needed.
There are multiple approaches to the implementation of AER event-based systems that exploit the features of a hardware neural network. The simplest form is a basic AER scheme where the whole network shares a single communication channel, and all events are processed through an external block that controls their traffic. Other schemes distribute this multiplexing among several points in the network. A more sophisticated scheme puts a router block within each cluster module that is responsible for directing traffic between modules. This scheme can be helpful in large-scale systems because it allows a 2-D mesh architecture where each module is only connected to its immediate neighbors. The router-based scheme will be further explained in the case study presented in Chapter 7.
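The core idea of AER, encoding each spike as an (address, timestamp) pair and multiplexing all pairs onto one time-ordered stream, can be sketched as follows. This is a behavioral illustration only, not any particular chip's protocol; the data structures and function names are assumptions.

```python
from collections import namedtuple

# An AER event carries only the address of the firing neuron
# and the time at which it fired; the spike shape is irrelevant.
AEREvent = namedtuple("AEREvent", ["address", "timestamp"])

def encode(cluster_spikes):
    """Multiplex spikes from a whole neuron cluster onto a single
    event stream, ordered by firing time.

    cluster_spikes: dict mapping neuron address -> list of spike times.
    """
    events = [AEREvent(addr, t)
              for addr, times in cluster_spikes.items()
              for t in times]
    return sorted(events, key=lambda e: e.timestamp)

def decode(events):
    """Receiver side: reconstruct per-neuron spike trains from the
    shared event stream using the addresses."""
    trains = {}
    for e in events:
        trains.setdefault(e.address, []).append(e.timestamp)
    return trains
```

The round trip encode/decode shows why a single channel suffices: because spike timing and identity are the only information, thousands of physical wires collapse into one stream of small packets.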
neuromorphic technology prospects and challenges
Judging by the rise in research activity over the past decade, neuromorphic computing is becoming more mainstream with every passing day. This is mainly because of the many advantages that this technology promises, which could potentially take neuromorphic technology far beyond what has been accomplished by conventional ANNs. These advantages can be summed up as:
1. Power efficiency.
Much like biological neural networks, SNNs are built for event-driven information processing. Information from the outside world is usually sparse, which means it can be processed much more efficiently than with the frame-based approach of conventional ANNs. ANNs perform data sampling at predefined time steps, ignoring the speed of incoming information. SNNs, on the other hand, process spikes as they come, meaning that the network is only working when there is something to be processed; otherwise, no computations take place. This event-driven operation is the basis of the huge power savings promised by neuromorphic computing and could make neuromorphic systems the go-to solution for applications such as IoT at the edge and autonomous vehicles. For example, IBM's TrueNorth chip is a Complementary MOS (CMOS) ASIC made up of 5.4 billion transistors that form 1 million neurons and 256 million synapses, all the while consuming only 73 mW.
2. Speed of computation.
Information in the brain is transmitted in the form of spikes propagating from layer to layer as soon as they are generated by a neuron. In contrast, conventional ANNs compute the output of neural layers sequentially, introducing significant delays while each layer waits for the output of the previous one to be computed. SNNs incorporate time into their model of operation together with the state of neurons and synaptic weights, so information flows in the form of spike trains propagating between neurons asynchronously. When combined with event-based sensors, this leads to pseudo-simultaneous information processing, where a first estimation of the output can be calculated at the output layer almost immediately after the introduction of the first input spikes.
3. Predisposition to bio-inspired unsupervised learning.
While supervised learning techniques have become the standard for ANNs, to this day, the exact mechanisms of supervised learning in biology are still a mystery and presumably not the primary source of learning. The Hebbian process of learning, proposed by Hebb in 1949, was the first attempt at explaining how learning occurs in the brain. The Hebbian plasticity principle, and its more comprehensive version known as Spike Timing-Dependent Plasticity (STDP), entail that the change in the weight of a synapse depends on the relative timing of pre- and post-synaptic spikes. Since SNNs are designed to use spike timing to code information, they are well placed to make full use of these unsupervised, biologically inspired learning algorithms.
4. Inherent robustness to noise.
As discussed in Section 2.1.1, spikes have a standard shape that holds no information, making the presence of a spike the most important aspect. Moreover, every neuron generates its own set of spikes instead of just passing the same signal along, hence there is a kind of regeneration of signals at every neuron. This gives SNNs an advantage similar to that observed in digital systems, namely robustness to noise.
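The STDP rule mentioned under the third point can be illustrated with a minimal pair-based formulation: the weight change decays exponentially with the time difference between a pre- and a post-synaptic spike, with potentiation for causal pairings and depression for anti-causal ones. The amplitudes and the time constant below are illustrative assumptions, not values from any cited work.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20e-3):
    """Pair-based STDP: weight change for one pre/post spike pair.

    Potentiation when the pre-synaptic spike precedes the post-synaptic
    one (causal), depression otherwise; both effects shrink
    exponentially as the spikes move further apart in time (seconds).
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # pre before post
    return -a_minus * math.exp(dt / tau)       # post before (or with) pre
```

Because the update depends only on locally available spike times, a rule of this form can run unsupervised and fully in parallel at each synapse, which is precisely the predisposition described above.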
But with great promise come great challenges. Despite all its promising prospects, neuromorphic technology still faces many obstacles that need to be addressed before it can be deployed in real-world applications. The main obstacles that face neuromorphic technology today and keep it from realizing its full potential are discussed next.
1. Benchmark datasets in spiking form.
While it is counted as a drawback of SNNs that they cannot achieve high accuracy levels on typical benchmark datasets, such as the MNIST handwritten digits dataset or the CIFAR image dataset, the real issue is with the nature of these datasets. These datasets were created in a frame-based format, which is fundamentally different from the way SNNs operate. The conversion of typical datasets into a spiking format is usually inefficient, since the frame-based format holds a huge amount of redundant data that is simply not present in event-based formats. In recent years, a new direction has been followed of directly collecting datasets in spiking format using Dynamic Vision Sensors (DVSs), such as the N-MNIST dataset (a neuromorphic version of the MNIST dataset) and the DVS-Gesture dataset, along with other benchmark datasets that are optimized for visual recognition.
2. Efficient spike-based learning algorithms.
Another major limitation of SNNs is the lack of efficient training algorithms that leverage the full potential of spike coding capabilities. Conventional ANNs have reached amazing performance levels using supervised learning techniques based on gradient descent, such as the ubiquitous back propagation algorithm. However, applying back propagation directly to SNNs is infeasible, since spikes are non-differentiable. Over the years, many approaches to training SNNs have emerged. The first approach was the conversion of trained conventional ANNs into SNNs by adapting the weights and parameters, which offered SNNs the opportunity to benefit from the strength of deep learning, but at the cost of reduced efficiency. On the other hand, multiple approaches have been proposed to use variants of the back propagation algorithm for the direct training of SNNs. Nonetheless, due to the complexity and asynchronicity of spike-based computing, the issue of SNN training algorithms remains a huge challenge for neuromorphic technology.
3. Efficient architectures.
Various neuromorphic chips exist today from multiple manufacturers and research labs, e.g., the TrueNorth chip from IBM, the SpiNNaker chip from the University of Manchester, Eyeriss from MIT, and Intel's Loihi chip. With the approaching end of Moore's law, the scaling down of VLSI chips needs more innovative solutions to allow neuromorphic chips and ANNs to scale up and incorporate more neurons and synapses without compromising power consumption. In a neural network, the number of synapses is usually much higher than that of neurons, which is why synaptic implementations are usually kept simple, and a lot of effort is being directed into finding better materials and techniques to achieve the synaptic function.
One of the biggest bottlenecks in ASIC implementations of neuromorphic systems is the on-chip memory that holds the parameters and the synaptic weights. The standard technique used today is on-chip or off-chip Static Random Access Memory (SRAM). However, SRAM does not provide sufficient capacity to cope with the large number of parameters and synaptic weights in ANNs: the typical SRAM density is 100–200 F² per bit cell, where F is the technology node, giving a capacity of a few megabytes on-chip. Emerging technologies such as Non-Volatile Memory (NVM) devices, e.g., memristor crossbars, are attracting considerable interest because of their analog nature and their capability of doing in-memory computations.
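The appeal of memristor crossbars for in-memory computation can be seen in a small behavioral sketch: with the weights stored as device conductances and the inputs applied as row voltages, each column current is, by Kirchhoff's current law, already the dot product a neural layer needs. The function below is an idealized model that ignores device non-idealities such as wire resistance, variability, and limited conductance ranges.

```python
import numpy as np

def crossbar_mvm(G, v_in):
    """Idealized memristor-crossbar multiply: v_in holds the row input
    voltages, G the conductance matrix (rows = inputs, columns =
    outputs), and the returned vector the column output currents,
    i_out[j] = sum_i v_in[i] * G[i, j]."""
    return v_in @ G
```

The point of the sketch is that the multiply-accumulate happens in the analog domain where the weights are stored, so no weight ever has to be fetched from a separate memory, which is exactly the bottleneck SRAM-based designs face.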
4. Robust and reliable hardware.
In Chapter 1 we discussed the appeal of neuromorphic technology for safety-critical applications and how this made testing them a must. However, there is still a gap in the literature concerning testing techniques and fault-tolerance strategies applied to SNNs. This issue will be discussed in more detail in the rest of this chapter.
state-of-the-art in testing and fault tolerance in hardware neural networks
Despite all the major breakthroughs achieved by ANNs, their transfer into hardware unavoidably introduces multiple factors that can degrade their performance. Perhaps the most important risk of hardware platforms is their susceptibility to faults, either during manufacturing, such as physical defects and process-induced variations, or in the field, due to environmental factors and aging. Consequently, fault detection and fault tolerance are essential to achieve better performance levels.
In this section, we review the concepts of fault tolerance and then we explore the body of work that has been carried out in the field of testing and fault tolerance for hardware neural networks.