Linear Solutions to Higher Dimensional Interlayers (LSHDI)

Address Event Representation (AER)

Of paramount importance to any information processing system is the mechanism by which information is transmitted and received. This is of particular relevance when attempting to model elements of neurobiological systems, as they exhibit a degree of point-to-point connectivity that is not feasible to implement directly with current technologies. Neuromorphic systems are often spike-based, and typically communicate using a hardware technique known as Address-Event Representation (AER).
AER has become the standard neuromorphic interfacing protocol, specifically for multi-chip neuromorphic systems [11] but also for intra-chip communication in mixed-signal devices, and has proved to be a successful and powerful protocol for simulating large point-to-point connectivity, in which sparse events need to be communicated from multiple sources over a narrow channel [12]. A handshaking protocol allows multiple devices to share a common communication bus, and the timing between events is statistically preserved, owing to the asynchronous nature of event generation and the random order in which events occur [13].
Figure 2.1 shows a simple AER implementation, in which an arbiter on the transmitting device assigns a unique address to each of its neurons, and when a neuron spikes, an event is generated that contains its unique address and any additional information required. The AER protocol then transmits this spike serially over a communication bus to a receiver, which decodes the event and selects the appropriate destination. Multiplexing and arbitration only occur when the spike rate of the chip exceeds the transmission capacity of the communication link.
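To make the transaction concrete, the sketch below models an AER link in Python. The event fields and the queue-based channel are illustrative assumptions standing in for the asynchronous bus, handshaking and arbitration described above; it is not a model of any particular device.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class AddressEvent:
    """One address-event: the timestamp orders events on the shared bus,
    while the address identifies the source neuron and is not compared."""
    timestamp: float
    address: int = field(compare=False)

class AERChannel:
    """A narrow shared channel: many senders push events and the receiver
    pops them in time order. Handshaking and arbitration are abstracted
    into the priority queue."""
    def __init__(self):
        self._bus = []

    def send(self, event: AddressEvent) -> None:
        heapq.heappush(self._bus, event)

    def receive(self) -> AddressEvent | None:
        return heapq.heappop(self._bus) if self._bus else None

# Two neurons spike; the receiver decodes each address in order to
# route the event to the appropriate destination.
channel = AERChannel()
channel.send(AddressEvent(timestamp=0.002, address=17))
channel.send(AddressEvent(timestamp=0.001, address=4))
while (event := channel.receive()) is not None:
    print(f"neuron {event.address} spiked at t = {event.timestamp:.3f} s")
```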
AER has been widely adopted for use in silicon retinas [1, 14, 15, 16, 17], neural processors [18, 19, 20, 21] and in silicon cochleas [22, 23, 24, 25, 26, 27].

Asynchronous Time-based Imaging Device (ATIS)

The data and datasets used in this work were captured using a specific DVS sensor known as the Asynchronous Time-based Imaging Device (ATIS), which contains both a change detection (TD) circuit and an exposure measurement (EM) circuit for every pixel. Other imaging devices, such as the Dynamic and Active-pixel Vision Sensor (DAVIS) [44], offer similar capabilities, but were not used in this work.
The ATIS is a CMOS dynamic vision and image sensor developed by the Austrian Institute of Technology (AIT), and draws inspiration from the data-driven nature of biological vision systems [45]. Unlike conventional cameras, which rely on artificially created timing signals that are independent of the source of the visual information [41], biological retinas produce neither frames nor pixel values, but instead encode visual information as sparse, asynchronous spiking output.
The ATIS sensor offers QVGA resolution with a 304 × 240 array of autonomously operating pixels, each of which combines an asynchronous level-crossing detector (TD) and an exposure measurement circuit (EM). These pixels do not output voltages or currents, but instead encode their output as asynchronous spikes using the address-event representation (AER) [46].
Figure 2.2 shows the structure and function of an individual pixel and the nature of its asynchronous event outputs. The change detection circuit outputs events in response to a change in illumination of a certain magnitude, and is also capable of triggering the exposure measurement circuit, which then generates two events with the absolute instantaneous pixel illumination encoded as the inter-spike timing between them.
This ability to couple the TD and EM circuits means that EM readings are generated only in response to changes in the scene, providing a form of hardware-level compression that allows for highly efficient video encoding. This is most significant in slowly changing scenes, where the majority of the illumination on each pixel remains constant.
In addition, the change event from the TD and the two spikes from the exposure measurement circuitry can be combined to produce a quicker gray-scale image approximation, using the inter-spike time between the TD event and the first EM spike [47]. This estimate can later be updated using the final EM spike to produce a more accurate result. As each pixel operates autonomously, there is no need for a global exposure rate, and each pixel is therefore able to optimise its integration time independently.
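As a concrete illustration of this two-stage decoding, the sketch below converts EM spike pairs into intensity estimates. The inverse-linear mapping and the constant k are assumptions made for illustration; the true relationship depends on the sensor's integration thresholds and calibration.

```python
def decode_em_intensity(t_first: float, t_second: float, k: float = 1.0) -> float:
    """Decode an exposure measurement from its two EM spike times.

    A brighter pixel integrates faster, so intensity is taken here to be
    inversely proportional to the inter-spike interval (an illustrative
    assumption; the real mapping is sensor-specific).
    """
    dt = t_second - t_first
    if dt <= 0:
        raise ValueError("the second EM spike must follow the first")
    return k / dt

def early_estimate(t_td: float, t_em_first: float, k: float = 1.0) -> float:
    """Quicker grey-level approximation from the interval between the
    triggering TD event and the first EM spike, to be refined once the
    final EM spike arrives."""
    return decode_em_intensity(t_td, t_em_first, k)

# A pixel whose EM spikes arrive 2 ms apart decodes to 500 (arbitrary units)
print(decode_em_intensity(0.010, 0.012))
```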
This results in a sensor with high dynamic range and improved signal-to-noise ratio. The asynchronous nature of the change detection also yields almost ideal temporal redundancy suppression and results in sparse encoding of the visual information in the scene.

Shape-based and Contour-based Features

Template, shape and contour matching techniques attempt to identify, detect and classify objects and features through unique edges and silhouettes. These techniques operate either at a global level, which might attempt to identify a human silhouette by matching a full-body template, or at a local level, in which a human form may be detected by identifying a number of sub-parts that make up the whole. Global methods have the advantage of being able to easily detect multiple objects in one scene, but lack the robustness to occlusions and cluttered scenes that part-based methods can offer.
These gradient-based techniques have been applied in numerous and varied scenarios, often combining multiple techniques or methods. For example, Gavrila tackled the problem of identifying pedestrians from a moving vehicle through hierarchical template matching, using a Chamfer detector to find proper contours [51]. Lin et al. also used a hierarchical approach to human segmentation, but combined a global template and local parts approach, additionally formulating the task as a Bayesian maximum a posteriori (MAP) optimisation problem [52].
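To illustrate the core idea behind Chamfer-based detection, the sketch below scores a contour template against a binary edge image using a distance transform; the template is assumed to be an N×2 array of (row, column) points. This is a minimal simplification of the principle, not Gavrila's hierarchical coarse-to-fine system.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_match(edge_image, template_points, stride=4):
    """Slide a contour template over a binary edge image, returning the
    offset that minimises the mean distance from each template point to
    the nearest image edge (the Chamfer distance; lower is better)."""
    # Each pixel of the distance transform holds the distance to the
    # nearest edge pixel in the image.
    dist = distance_transform_edt(edge_image == 0)
    h, w = edge_image.shape
    th, tw = template_points.max(axis=0) + 1
    best_score, best_offset = np.inf, (0, 0)
    for dy in range(0, h - th + 1, stride):
        for dx in range(0, w - tw + 1, stride):
            ys = template_points[:, 0] + dy
            xs = template_points[:, 1] + dx
            score = dist[ys, xs].mean()
            if score < best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset, best_score
```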
Another interesting example is the work of Ferrari et al., who tackled the problem of object detection in cluttered environments by extracting edges, fitting contours, and using a map of their interconnections to perform detection [53]. Elements of this technique led to the work of Anvaripour and Ebrahimnezhad, who constructed exact object boundaries for object detection using boundary fragment extraction and Gaussian Mixture Models (GMMs) [54]. Wu and Nevatia extended the concept further by introducing edgelet features, which are short segments of a line or curve, and used these in a joint-likelihood model, also formulated as a maximum a posteriori (MAP) problem [55].
These approaches, which make use of shapes or contours, are particularly interesting in the context of event-based vision. Silicon retinas, such as the ATIS, perform an operation similar in nature to edge extraction for scenes with motion, ego-motion or photometric changes, but perform the computation at the hardware level and asynchronously. This creates the possibility of applying and extending these techniques within an event-based paradigm.

Scale-Invariant Feature Transform (SIFT)

The Scale-Invariant Feature Transform (SIFT) is perhaps one of the most well-known and widely-used feature detection methods. SIFT is a cascaded filter chain comprising a scale-invariant detector and a rotationally invariant descriptor [56], and has proved to be effective and efficient across a wide variety of applications.
The SIFT detector belongs to a class of feature detectors that look for regions within an image that exhibit the properties of a good feature, namely one that is useful for recognition, classification or tracking purposes. The Moravec corner detector represents one of the first attempts to detect such points, identifying points of locally maximal or minimal intensity change. The Harris Corner Detector [57] solved some of the issues surrounding sensitivity to noise and edges that afflicted the Moravec detector.
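For reference, the Harris response is computed from the local structure tensor M, built from the image gradients I_x and I_y over a weighting window w(x, y):

```latex
M = \sum_{x,y} w(x,y)
    \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix},
\qquad
R = \det(M) - k \,\bigl(\operatorname{tr} M\bigr)^2
```

R is large and positive at corners, negative along edges, and small in flat regions, with the sensitivity parameter k typically chosen in the region of 0.04 to 0.06.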
The Harris Corner Detector proved to be a better means of identifying points of interest within an image, but lacked the ability to handle changes in scale or viewpoint. SIFT tackles this problem directly through the use of scale-space theory, and more specifically linear scale-space representation [58], allowing it to detect features across a wide range of scales. The SIFT detector approximates the scale-space through a technique known as Difference-of-Gaussians (DoG), and uses a 2×2 Hessian matrix of image gradients about each point in order to remove edge responses. The SIFT detector identifies a set of keypoints for a given image, and then calculates a descriptor for each one. The descriptor needs to provide a consistent and reliable means of recognising the same feature in a different image or location, and the SIFT descriptor is rotationally invariant and partially illumination-invariant, calculated through the use of local orientation histograms that are re-aligned to the dominant orientation.
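The two detector stages just described can be sketched as follows: a single-octave Difference-of-Gaussians pyramid, and the 2×2 Hessian test that rejects edge-like keypoints. This is a minimal illustration without sub-pixel refinement or the full octave structure of the complete algorithm; the curvature-ratio threshold r = 10 follows the value suggested in [56].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Approximate the scale-space Laplacian with Differences-of-Gaussians:
    blur at geometrically spaced scales and subtract adjacent levels."""
    blurred = [gaussian_filter(image.astype(float), sigma0 * k ** i)
               for i in range(levels)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]

def passes_edge_test(dog, y, x, r=10.0):
    """Reject keypoints whose 2x2 Hessian of the DoG image indicates a
    large principal-curvature ratio, i.e. an edge rather than a corner."""
    dxx = dog[y, x + 1] + dog[y, x - 1] - 2.0 * dog[y, x]
    dyy = dog[y + 1, x] + dog[y - 1, x] - 2.0 * dog[y, x]
    dxy = (dog[y + 1, x + 1] - dog[y + 1, x - 1]
           - dog[y - 1, x + 1] + dog[y - 1, x - 1]) / 4.0
    trace, det = dxx + dyy, dxx * dyy - dxy ** 2
    if det <= 0:
        return False  # curvatures of opposite sign: discard
    return trace ** 2 / det < (r + 1) ** 2 / r
```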
The success of SIFT has led to numerous improvements and modifications to the algorithm. Sukthankar made use of Principal Component Analysis (PCA) in order to reduce the size of the SIFT descriptors [59], Abdel-Hakim and Farag created a colour-invariant version called CSIFT [60], and Scovanner et al. extended SIFT to handle 3D features [61].

Table of contents:

Contents
List of Figures
Nomenclature
Executive Summary
1 Introduction 
1.1 Motivation
1.2 Aims
1.3 Main Contributions of this Work
1.4 Relevant Publications
1.5 Structure of this Thesis
2 Literature Review 
2.1 Introduction to Computer Vision
2.2 Neuromorphic Imaging Devices
2.2.1 Neuromorphic Engineering
2.2.2 Address Event Representation (AER)
2.2.3 Silicon Retinas
2.2.4 Asynchronous Time-based Imaging Device (ATIS)
2.3 Feature Detection
2.3.1 Shape-based and Contour-based Features
2.3.2 Scale-Invariant Feature Transform (SIFT)
2.3.3 Histograms of Oriented Gradients (HOG)
2.4 Neuromorphic Approaches
2.4.1 Event-Based Visual Algorithms
2.4.2 Neuromorphic Hardware Systems
2.4.3 Spiking Neural Networks
2.5 Linear Solutions to Higher Dimensional Interlayers (LSHDI)
2.5.1 Structure of an LSHDI Network
2.5.2 The Online Pseudoinverse Update Method
2.5.3 The Synaptic Kernel Inverse Method
3 Event-Based Feature Detection 
3.1 Introduction
3.2 Contributions
3.3 Feature Detection using Surfaces of Time
3.3.1 Surfaces of Time
3.3.2 Feature Detection on Time Surfaces
3.3.3 Time Surface Descriptors
3.4 Feature Detection using Orientated Histograms
3.4.1 Introduction to Circular Statistics
3.4.2 Mixture Models of Circular Distributions
3.4.3 Noise Filtering through Circular Statistics
3.4.4 Feature Selection using Mixture Models
3.5 Discussion
4 Event-Based Object Classification
4.1 Introduction
4.2 Contributions
4.3 Spiking Neuromorphic Datasets
4.3.1 MNIST and Caltech101 Datasets
4.3.2 Existing Neuromorphic Datasets
4.3.3 Conversion Methodology
4.3.4 Conclusions
4.4 Object Classification using the N-MNIST Dataset
4.4.1 Classification Methodology
4.4.2 Digit Classication using SKIM
4.4.3 Error Analysis of the SKIM Network Result
4.4.4 Output Determination in Multi-Class SKIM Problems
4.4.5 Analysis of Training Patterns for SKIM
4.4.6 Conclusions
4.5 Object Classification on the N-Caltech101 Dataset
4.5.1 Classification Methodology
4.5.2 Handling Non-Uniform Inputs
4.5.3 Revising the Binary Classification Problem with SKIM
4.5.4 Object Recognition with SKIM
4.5.5 5-Way Object Classification with SKIM
4.5.6 101-Way Object Classification with SKIM
4.5.7 Conclusions
4.6 Spatial and Temporal Downsampling in Event-Based Visual Tasks
4.6.1 The SpikingMNIST Dataset
4.6.2 Downsampling Methodologies
4.6.3 Downsampling on the N-MNIST Dataset
4.6.4 Downsampling on the SpikingMNIST Dataset
4.6.5 Downsampling on the MNIST Dataset
4.6.6 Downsampling on the N-Caltech101 Dataset
4.6.7 Discussion
5 Object Classification in Feature Space
5.1 Introduction
5.2 Contributions
5.3 Classification using Orientations as Features
5.3.1 Classification Methodology
5.3.2 Classification using the SKIM Network
5.3.3 Classification using an ELM Network
5.3.4 Discussion
5.4 Object Classification using Time Surface Features
5.4.1 Classification Methodology
5.4.2 Adaptive Threshold Clustering on the Time Surfaces
5.4.3 Surface Feature Classification with ELM
5.4.4 Classification using Random Feature Clusters
5.4.5 Surface Feature Classification with SKIM
5.4.6 Discussion
6 Conclusions 
6.1 Validation of the Neuromorphic Datasets
6.2 Viability of Event-Based Object Classification
6.3 Applicability of SKIM and OPIUM to Event-Based Classification
6.4 The Importance of Motion in Event-Based Classification
6.5 Future Work
References 
Appendix A: Detailed Analysis of N-MNIST and N-Caltech101
Appendix B: Optimisation Methods for LSHDI Networks 
Appendix C: Additional Tables and Figures 
