Artificial to Spiking Neural Networks Conversions


Artificial Neural Network

ANNs are a family of models that are able to learn both the features and the class boundaries directly from the data. Indeed, multi-layer neural networks, sometimes called multi-layer perceptrons (MLPs), have the ability to learn intermediate representations. Thanks to their non-linearity, they can learn to extract low-level features in their first layers, and increase the complexity and the abstraction of the features across the layers. Finally, the last layers can behave like a classifier, which allows having a unique model to process images. These models can be expressed as c(·; Φ), where Φ is a set of parameters that can be optimized towards a specific goal by a learning algorithm. Φ can be optimized by minimizing an objective function obj. In a supervised task, this optimization step can be expressed as Φ* = arg min_Φ obj(c(X; Φ), Y), where X denotes the training samples and Y the expected outputs.

Back-propagation (BP) is the standard algorithm used to train such networks. BP allows the efficient computation of the gradient of all the operations of the network thanks to the chain rule formula. Those gradients are then used by an optimization method to minimize a loss function obj, a metric that gives the error between the predicted value and the expected value. Gradient descent (GD) is an optimization algorithm which uses all the training examples to update the parameters. When used with suitable values of the meta-parameters (e.g. a small learning rate), this method finds a smaller or equal loss after each step. However, it is very expensive to compute and can get stuck in local minima. Stochastic gradient descent (SGD) is another optimization method, which uses only one sample per update step; this reduces the risk of getting stuck in local minima by constantly changing the loss obj. However, SGD gives only a stochastic approximation of the gradient of the cost over all the data, which means that the path taken in the descent is not direct (i.e. a zig-zag effect happens).
Finally, mini-batch gradient descent is a compromise between the two previous methods: it averages the updates over a subset of samples (defined by the batch size), which improves the robustness of SGD without the computational cost of GD.
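To make the three variants concrete, the following sketch (a toy linear least-squares objective; the helper name and data are illustrative, not from the manuscript) implements mini-batch gradient descent in NumPy. Setting `batch_size=1` reduces it to SGD, and `batch_size=len(X)` to full GD.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size, lr=0.1, epochs=100, seed=0):
    """Minimize a mean-squared-error objective with mini-batch updates.

    batch_size=1      -> stochastic gradient descent (SGD)
    batch_size=len(X) -> full gradient descent (GD)
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)               # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # gradient of MSE, averaged over the mini-batch
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad                       # one averaged update per batch
    return w

# Toy regression problem: y = 2*x0 - 1*x1, no noise
X = np.random.default_rng(1).normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])
w = minibatch_sgd(X, y, batch_size=32)
```

With a noiseless, realizable target as above, all three settings converge to the same weights; the batch size mainly trades per-update cost against gradient noise.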

Object Recognition Datasets

Since most of the models are trained on a sample collection to infer a generalization, particular attention should be given to the data. For example, if the complexity of the data is too high with respect to the number of available samples, finding a correct model is hard. Also, if the data are biased, the trained model risks learning this bias and thus generalizing poorly. Nowadays, gathering a huge number of various images is no longer a real problem, thanks to search engines and social networks. However, getting correct labels still requires effort in the case of supervised learning. Labeling can be laborious work, since it requires experts for complex tasks, and it needs to be done by hand to prevent errors as much as possible.

Multiple datasets, with different specificities, exist to compare the different methods used in image classification. While some datasets constrain the images in order to limit the variations, and thus the classification difficulty, other datasets aim to be very heterogeneous in order to better represent real-world contexts. The number of samples available for each class is an important criterion, since using more images improves the generalization of the model. Thus, trivial datasets (i.e. toy datasets) can be limited to a few hundred or a few thousand images, while more complex datasets generally contain millions of images. A second important criterion is the number of classes: working with more classes tends to make the problem more difficult. Some datasets contain only a flat set of object classes, while others contain a hierarchy of classes, with abstract concepts gathering multiple object classes. Finally, while some datasets provide only the label associated with each image, others provide more information, such as multiple keywords, bounding boxes of the objects present in the images, or the segmentation of the pixels belonging to the objects.
An example of a non-challenging dataset is Modified-NIST (MNIST) [65]. MNIST consists of 60,000 training samples and 10,000 test samples. Each image is a grayscale, centered, and scaled handwritten digit of 28 × 28 pixels, which limits the variations to be taken into account (see Figure 2.6). The dataset has 10 classes, the digits from 0 to 9. Variants of this dataset exist to test models on different properties. As an example, a permutation-invariant version (PI-MNIST) prevents the usage of the spatial relationships between the pixels. Sequential-MNIST is another variant, in which the pixels are presented one at a time [66]; it is notably used with recurrent approaches in order to evaluate the short-term memory of the network. NORB [67] and ETH-80 [68] are other toy datasets, which provide a few images of some objects. Again, the variations are limited (all the objects are nearly centered and scaled, lighting conditions are good…). Only a few tens or hundreds of samples are provided, which is not a real issue given the low complexity of these datasets. This kind of dataset is no longer used in the computer vision literature, because recognition rates are already very close to the maximum reachable (e.g. 99.82% on MNIST, see Table 2.1). The range of applications of such data also remains limited, since models trained on them work correctly only if the same constraints are present. However, these datasets are still useful for testing methods that are ongoing research efforts, like those detailed in the rest of this manuscript, because they limit the difficulty and allow quick prototyping before moving to more complex tasks.
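The two MNIST variants mentioned above can be sketched in a few lines (a random array stands in for a real MNIST digit here; the function names are illustrative). The key point is that the pixel permutation is drawn once and reused for every image, so spatial structure is destroyed consistently across the dataset.

```python
import numpy as np

def to_pi_mnist(image, permutation):
    """PI-MNIST: apply one fixed random permutation to the flattened
    pixels, destroying the 2-D spatial relationships between them."""
    return image.reshape(-1)[permutation]

def to_sequential_mnist(image):
    """Sequential-MNIST: flatten the image so that the 784 pixels can be
    fed to the model one at a time, as a sequence."""
    return image.reshape(-1)

rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)      # fixed once for the whole dataset
img = rng.random((28, 28))           # stand-in for a real MNIST digit
pi_img = to_pi_mnist(img, perm)      # same pixel values, scrambled positions
seq = to_sequential_mnist(img)       # step t of the sequence is pixel t
```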


Overview of Spiking Neural Networks

Despite their state-of-the-art performance on multiple tasks, ANNs also have some drawbacks. One of the most problematic is the energy efficiency of the models. As an example, deep neural networks require hundreds or even thousands of watts to run on a classic architecture. Even tensor processing units (TPUs), which are optimized to simulate neural networks, consume about a hundred watts [84]. In comparison, the brain uses about 20 W as a whole. A second issue is supervision: current unsupervised methods are far behind the capacity of the brain. Studies of models of an intermediate abstraction level, between precise biological neural networks and abstract artificial neural networks, aim to overcome these limitations. This family of neural network models, SNNs, uses a mode of operation closer to biology than ANNs, in order to benefit from its advantages, while allowing simpler implementations [85], [86].

The main difference between ANNs and SNNs is their mode of communication. ANNs behave like a mathematical function: they transform a set of input numerical values into another set of output numerical values (see Figure 2.9a). Although this mode of operation can be easily implemented on von Neumann architectures, the constraints of such models, like the need for synchronization, make them difficult to implement efficiently on dedicated architectures. In contrast, SNNs use spikes as the only communication mechanism between network components (see Figure 2.9b). These spikes, whose principle comes directly from biology, allow a complete desynchronization of the system, because each component is only affected by the incoming spikes. Depending on the model, each spike can be defined by a set of parameters. In its simplest form, a spike can be considered as a binary event, which means that the intensity and the shape of the impulse are neglected. Thus, its only parameter is its timestamp.
A second parameter, a voltage exc, can be added to define a spike in some models. However, using spike-based computation prevents the usage of traditional learning methods, which are value-based, so new methods need to be introduced in order to train SNNs. Although the classification performance of these models currently lags behind ANNs, theory shows that SNNs should be more computationally powerful than their traditional counterparts [87], which means that SNNs should be able to compete with ANNs.
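This event-based representation can be sketched as follows (a minimal illustration in Python; the class and field names are hypothetical, not from any specific simulator). A spike carries a timestamp, and optionally a value; desynchronized components simply consume spikes in timestamp order from a priority queue, with no global clock.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Spike:
    """A spike in its simplest form: a timestamped binary event.
    `value` is the optional second parameter some models attach
    (e.g. a voltage); it is ignored when ordering events."""
    timestamp: float
    source: int = field(compare=False)
    value: float = field(default=1.0, compare=False)

# Event-driven processing: each component reacts only to incoming
# spikes, popped in timestamp order, so no synchronization is needed.
events = []
heapq.heappush(events, Spike(2.0, source=1))
heapq.heappush(events, Spike(0.5, source=0))
heapq.heappush(events, Spike(1.2, source=2, value=0.8))

order = []
while events:
    order.append(heapq.heappop(events).source)
# spikes are processed in time order: sources 0, then 2, then 1
```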

Table of contents :

1 Introduction, context, and motivations 
1.1 Neuromorphic Hardware
1.2 Neuromorphic Constraints
1.3 Motivations
1.4 Outline
2 Background 
2.1 Object Recognition
2.1.1 Feature Extraction
2.1.2 Classification Methods
2.1.3 Artificial Neural Network
2.1.4 Object Recognition Datasets
2.2 Overview of Spiking Neural Networks
2.2.1 Spiking Neurons
2.2.2 Topology
2.2.3 Neural Coding
2.2.4 Synapses
2.2.5 Inhibition
2.2.6 Homeostasis
2.3 Image Recognition with SNNs
2.3.1 Pre-Processing
2.3.2 Artificial to Spiking Neural Networks Conversions
2.3.3 Adapted Back-propagation
2.3.4 Local Training
2.3.5 Evolutionary Algorithms
2.4 Software Simulation
2.4.1 Event-Driven vs Clock-Driven Simulation
2.4.2 Neuromorphic Simulators
2.5 Conclusion
3 Software Simulation of SNNs 
3.1 N2S3
3.1.1 Case study: motion detection
3.1.2 Comparison of the Three Approaches
3.1.3 Energy Consumption
3.1.4 Conclusion
3.3 Conclusion
4 Frequency Loss Problem in SNNs 
4.1 Mastering the Frequency
4.1.1 Target Frequency Threshold
4.1.2 Binary Coding
4.1.3 Mirrored STDP
4.2 Experiments
4.2.1 Experimental Protocol
4.2.2 Target Frequency Threshold
4.2.3 Binary Coding
4.2.4 Mirrored STDP
4.3 Discussion
4.4 Conclusion
5 Comparison of the Features Learned with STDP and with AE 
5.1 Unsupervised Visual Feature Learning
5.2 STDP-based Feature Learning
5.2.1 Neuron Threshold Adaptation
5.2.2 Output Conversion Function
5.2.3 On/Off filters
5.3 Learning visual features with sparse auto-encoders
5.4 Experiments
5.4.1 Experimental protocol
5.4.2 Datasets
5.4.3 Implementation details
5.4.4 Color processing with SNNs
5.4.5 SNNs versus AEs
5.5 Result Analysis and Properties of the Networks
5.5.1 On-center/off-center coding
5.5.2 Sparsity
5.5.3 Coherence
5.5.4 Objective Function
5.5.5 Using Whitening Transformations with Spiking Neural Networks
5.6 Conclusion
6 Training Multi-layer SNNs with STDP and Threshold Adaptation 
6.1 Network Architecture
6.2 Training Multi-layered Spiking Neural Networks
6.2.1 Threshold Adaptation Rule
6.2.2 Network Output
6.2.3 Training
6.3 Results
6.3.1 Experimental protocol
6.3.2 MNIST
6.3.3 Faces/Motorbikes
6.4 Discussion
6.5 Conclusion
7 Conclusion and future work 
7.1 Conclusion
7.2 Future Work
7.2.1 Simulation of Large Scale Spiking Neural Networks
7.2.2 Improving the Learning in Spiking Neural Networks
7.2.3 Hardware Implementation of Spiking Neural Networks
A Code Examples 
A.1 N2S3
A.2 N2S3 DSL
A.3 CSNN Simulator
B Acronyms 
C List of Symbols 
Generic Parameters
Neuron Parameters
Synapse Parameters
Neural Coding Parameters
Network Topology
Pre-Processing Parameters
Image Classification
Energy Notations
Spike Parameters

