Improving a deep convolutional neural network architecture for character recognition

Factorized matrix vector multiplication

The previously introduced matrix vector multiplication is the most popular linear operation in neural networks, but it can be quite expensive in computational time and memory. For a vector of size N and a matrix of size N × N, the computational and memory complexities are both O(N²). In this subsection we briefly present several methods which factorize the matrix in the matrix vector multiplication, in order to reduce the computational and/or memory complexity. We introduce a similar approach, which factorizes the matrix (and uses quantum computation), in Chapter 6. For this reason, we focus here on the methods most similar and relevant to our own.
[147] replaced the matrix W in fully-connected layers with the matrix product A C D C⁻¹, with A and D diagonal matrices, C the discrete cosine transform and C⁻¹ the inverse discrete cosine transform, reducing the computational complexity of the layer to O(N log N) and the number of trainable parameters to O(N), while maintaining comparable statistical performance for the task of object recognition on the ImageNet dataset.
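To make the structure concrete, below is a minimal NumPy sketch of this kind of factorized layer. It is an illustration only, not the implementation of [147]: the function name acdc_apply, the use of SciPy's dct/idct routines and the random diagonals are our own assumptions. The point is that the transform A C D C⁻¹ can be applied with two diagonal scalings and two fast transforms, i.e. in O(N log N) time with O(N) trainable parameters.

import numpy as np
from scipy.fft import dct, idct

def acdc_apply(x, a_diag, d_diag):
    # Apply the factorized transform A C D C^(-1) to a vector x:
    # inverse DCT, diagonal scaling D, DCT, diagonal scaling A.
    y = idct(x, norm="ortho")   # C^(-1) x
    y = d_diag * y              # D y  (element-wise, O(N))
    y = dct(y, norm="ortho")    # C y
    return a_diag * y           # A y  (element-wise, O(N))

N = 1024
rng = np.random.default_rng(0)
x = rng.standard_normal(N)
a_diag = rng.standard_normal(N)  # trainable, O(N) parameters
d_diag = rng.standard_normal(N)  # trainable, O(N) parameters
h = acdc_apply(x, a_diag, d_diag)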
[38] proposed a similar factorization, with O(N log N) computational complexity and O(N) trainable parameters, for the hidden-to-hidden transform of a recurrent neural network (see Section 1.3.3.1). The resulting transform is the product of multiple unitary matrices, some of which represent the discrete Fourier transform and the inverse discrete Fourier transform. This RNN parameterization obtained state-of-the-art results, at the time of its proposal, on several long-term dependency tasks. Our proposal in Chapter 6 also decomposes the matrix W into a product of multiple unitary matrices, some of which represent Fourier transforms, but potentially reduces the computational and memory complexities even further, due to the use of quantum computation.
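The sketch below illustrates the general idea of such a structured unitary hidden-to-hidden transform: a product of diagonal phase matrices, a permutation, and the (unitary) FFT and inverse FFT, applied in O(N log N). It is a simplified illustration, not the exact parameterization of [38]; the factor ordering and names are assumptions.

import numpy as np

def unitary_transform(h, theta1, theta2, perm):
    # Apply D2 F^(-1) P D1 F to the complex hidden state h, where D1 and D2
    # are diagonal matrices of learned phases, P is a fixed permutation and
    # F is the unitary discrete Fourier transform (applied via the FFT).
    h = np.fft.fft(h, norm="ortho")    # F h
    h = np.exp(1j * theta1) * h        # D1 h
    h = h[perm]                        # P h
    h = np.fft.ifft(h, norm="ortho")   # F^(-1) h
    return np.exp(1j * theta2) * h     # D2 h

N = 512
rng = np.random.default_rng(1)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
theta1, theta2 = rng.uniform(-np.pi, np.pi, size=(2, N))
perm = rng.permutation(N)
h_next = unitary_transform(h, theta1, theta2, perm)
# Every factor is unitary, so the norm of the hidden state is preserved.
assert np.isclose(np.linalg.norm(h_next), np.linalg.norm(h))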

Nonlinear activation functions

All previously described computational primitives are linear. Nonlinear operations are also necessary: a machine learning system containing only linear operations would not be expressive enough, no matter how many linear operations were composed. Intuitively, the composition of any number of linear operations is itself linear, so the entire system is no more powerful than a simple linear regression. On the other hand, the composition with even a single nonlinear operation makes neural networks universal approximators of continuous functions [73].
In neural networks, nonlinearity is introduced using the concept of an activation function, which is applied element-wise to the input.
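As an illustration, the snippet below applies three standard activation functions element-wise to a small input vector; the specific functions shown (ReLU, logistic sigmoid and tanh) are common examples and not necessarily the ones used later in this work.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)         # rectified linear unit: max(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # logistic sigmoid, output in (0, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))      # [0.  0.  0.  0.5 2. ]
print(sigmoid(z))   # values squashed into (0, 1)
print(np.tanh(z))   # values squashed into (-1, 1)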

Artificial neural network architectures

Artificial neural networks (ANNs) are the main topic of this thesis. As their name suggests, ANNs represent machine learning architectures (very loosely) inspired by their biological counterparts.
The basic computational unit in ANNs is the artificial neuron. The first celebrated model of an artificial neuron was introduced in 1943 by [144]. We will denote the output of a neuron's computation by the term activation. In biology, neurons are connected by synapses of varying strengths (weights), and each neuron has a threshold: its electrical activity, which could be roughly interpreted as the activation in our description, has to exceed this threshold for the neuron to fire. Artificial neurons have corresponding (and simplified) weights w connecting them to other neurons and a bias b (the simplified equivalent of the biological threshold). To connect this model to the machine learning framework, the weights w and bias b are trainable parameters (whose values are optimized during the learning process). The most basic widely-encountered modern artificial neurons (as they appear especially in multilayer perceptrons, which will be described in the following subsection) compute a dot product, followed by an element-wise nonlinearity (such as those described in Section 1.2.2). Given a set of inputs x, the activation h of a neuron connected to the inputs using weights w and bias b is: h = f(w · x + b)
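The following minimal sketch implements the neuron described above; the function name and the choice of tanh as the nonlinearity f are illustrative assumptions.

import numpy as np

def neuron_activation(x, w, b, f=np.tanh):
    # Activation of a single artificial neuron: h = f(w . x + b).
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])    # inputs
w = np.array([0.1, 0.4, -0.3])    # trainable weights
b = 0.2                           # trainable bias
h = neuron_activation(x, w, b)    # the neuron's activation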

Table of contents:

Acknowledgments
Résumé
Abstract
Introduction
1 Introduction to deep learning 
1.1 Short introduction to machine learning
1.2 Computational primitives
1.2.1 Matrix vector multiplication
1.2.1.1 Element-wise multiplication
1.2.1.2 Convolution
1.2.1.3 Factorized matrix vector multiplication
1.2.2 Nonlinear activation functions
1.3 Artificial neural network architectures
1.3.1 Multilayer perceptrons (MLPs)
1.3.1.1 Input layer
1.3.1.2 Hidden layers
1.3.1.3 Output layer
1.3.1.4 Representational power
1.3.2 Convolutional neural networks (CNNs)
1.3.2.1 Convolutional layer
1.3.2.2 Subsampling layer
1.3.2.3 Output layer for classification
1.3.2.4 CNN design patterns
1.3.3 Recurrent neural networks (RNNs)
1.3.3.1 Standard RNNs
1.3.3.2 Long short-term memory (LSTM)
1.3.3.3 Bidirectional RNN (BRNN)
1.4 Performance measures
1.4.1 Label error rate (LER)
1.4.2 Character error rate (CER)
1.4.3 Word error rate (WER)
1.5 Gradient-based optimization
1.5.1 Loss functions
1.5.1.1 Cross-entropy
1.5.1.2 Connectionist temporal classification (CTC)
1.5.2 Gradient descent
1.5.2.1 Finite differences
1.5.2.2 Simultaneous perturbation stochastic approximation (SPSA)
1.5.2.3 Backpropagation
1.5.2.4 Backpropagation through time (BPTT)
1.5.2.5 Vanishing / exploding gradients
1.5.3 State of the art optimization algorithms and heuristics
1.5.3.1 ADAM optimization
1.5.3.2 Gradient clipping
1.5.4 Helpful methods for optimization / regularization
1.5.4.1 Dropout
1.5.4.2 Batch normalization
1.5.4.3 Early stopping
1.6 Conclusion
2 Deep learning-based handwriting recognition 
2.1 The role of handwriting recognition tasks in the history of neural networks
2.1.1 MNIST for classification
2.1.2 Other tasks and datasets
2.1.2.1 MNIST for benchmarking generative models
2.1.2.2 Pixel by pixel MNIST
2.1.2.3 Recognizing multilingual handwritten sequences
2.1.2.4 Online handwriting sequential generative models
2.2 The history of neural networks applied to handwriting recognition
2.2.1 Datasets
2.2.1.1 IAM
2.2.1.2 RIMES
2.2.1.3 IFN-ENIT
2.2.2 Deep neural networks (DNNs)
2.2.3 Recurrent Neural Networks (RNNs)
2.2.4 Architectures mixing convolutional and recurrent layers
2.3 Conclusion
3 Improving a deep convolutional neural network architecture for character recognition 
3.1 Architecture
3.2 Nonlinear activation functions
3.3 Gradient-based optimization and loss function
3.4 Initialization
3.5 ADAM variant
3.6 Dropout
3.7 Batch normalization
3.8 Early stopping
3.9 Experiments
3.10 Conclusions
4 Tied Spatial Transformer Networks for Digit Recognition 
4.1 Common elements
4.1.1 Convolutional architectures
4.1.2 Activation functions and parameter initialization
4.1.3 Loss function and optimization
4.1.4 Regularization
4.2 Experiments
4.2.1 CNN, STN and TSTN comparison
4.2.2 The regularization hypothesis
4.3 Discussion
4.4 Conclusion
5 Associative LSTMs for handwriting recognition 
5.1 Methods
5.1.1 Holographic Reduced Representations
5.1.2 Redundant Associative Memory
5.1.3 LSTM
5.1.4 Associative LSTM
5.2 Results
5.2.1 Dataset
5.2.2 Image normalization
5.2.3 System details
5.2.4 Results
5.3 Discussion
5.4 Conclusion
6 Hybrid classical-quantum deep learning 
6.1 Motivation for using quantum computing
6.2 Introduction to the quantum circuit model of quantum computing with discrete variables
6.2.1 The qubit
6.2.1.1 Multiple qubits
6.2.2 Unitary evolution
6.2.2.1 Single qubit
6.2.2.2 Multiple qubits
6.2.3 Measurement
6.2.3.1 Full measurement
6.2.3.2 Partial measurement
6.3 Discrete-variable quantum neural networks using partial measurement
6.4 Introduction to hypernetworks
6.5 Proposed parameterization
6.5.1 Quantum (main) RNN
6.5.2 Output layer
6.5.3 Loss function
6.5.4 Classical (hyper) RNN
6.6 Simulation results
6.6.1 Task and dataset
6.6.2 System details
6.6.2.1 Baseline classical LSTM
6.6.2.2 Hybrid classical-quantum parameterization
6.6.2.3 Common settings
6.6.3 Approximate computational cost
6.6.4 Accuracy estimation under -sampling approximation
6.7 Experimental results
6.8 Discussion
6.9 Conclusion
Conclusions 
Publications
