Machine learning is a field of computer science focused on creating software that uses data to learn how to accomplish a task. Rather than following explicitly stated rules, machine learning software looks for structure in data in order to shape its behaviour. Certain tasks which may appear to be trivial, such as identifying handwritten digits, are surprisingly difficult to perform using traditional techniques. A machine learning approach can be used to perform many such tasks given an appropriate dataset.
One of the two main classes of machine learning is called supervised learning. In supervised learning the data provided to the software is labelled with the desired output. For example, when using machine learning to recognise handwritten digits, an image of a seven is labelled as a seven. In this way the software receives feedback on how well it is performing, so that it may adjust itself accordingly.
Artificial neural networks
Artificial neural networks (ANNs) are one form of machine learning based on how the human brain works. The brain is built from neurons which are connected together. These neurons receive electrochemical signals from other neurons. Depending on the origin of the charges, their magnitude, and how the neuron is tuned, it might send out a new charge to other neurons [14, 2, 18].
The equivalent of a neuron in an ANN is a node. Each node receives values from a range of inputs. The node is comprised of weights, a bias and an activation function. The number of weights equals the number of inputs, i.e. they are mapped one weight to one input value. The output value is calculated by applying the activation function (a continuous mathematical function) to the sum of each input multiplied by its respective weight, plus the bias [18, 8].
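The calculation performed by a single node can be sketched as follows. This is a minimal illustration in plain Python, assuming a sigmoid activation function (one common choice of continuous function); the function name is our own.

```python
import math

def node_output(inputs, weights, bias):
    """Compute a node's output: the weighted sum of the inputs plus
    the bias, passed through a sigmoid activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# A node with three inputs has three weights and one bias.
y = node_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.1], 0.3)  # ≈ 0.646
```

Note that there is exactly one weight per input, matching the one-to-one mapping described above.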
An ANN is, just like the brain, built from multiple nodes, each containing its own values for the weights and biases. It has to consist of a number of input nodes and at least one output node. Note that the input nodes are simpler in comparison to all other nodes, since all they do is pass their value to every node in the next set of nodes. The most common way of structuring the nodes is in layers, where the simplest ANN, called a single-layer ANN (see figure 2.1), has one input layer connected directly to the output layer.
Neural network training
In order for the output, also called the prediction, of an ANN to be accurate it has to be trained. Simply put, all the weights and biases have to be calibrated. There is some variation in how this can be done, but the fundamental procedure stays the same. First the available data is split into a set of training data and a set of testing data [17, 8]. After this the weights and biases are initialised to random values. The ANN is fed with data from the training set to get a measure of how far from the correct output the prediction was. This measured error is then used to update the weights and biases, after which the process is repeated until the desired accuracy is reached.
For a single-layer ANN one approach is the delta rule. It includes a learning rate (a continuous value, usually between 0 and 1), the derivative of the activation function, the input value, and the difference between the expected output and the actual output. The learning rate determines how quickly we approach a more accurate ANN. Setting the learning rate to 1 or higher introduces a risk that the learning will not converge towards the solution. However, setting it too low will make the training very time consuming [8, 9].
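A delta-rule update for one node can be sketched as below. We assume a sigmoid activation, whose derivative at the node's output y is y(1 − y); the function and parameter names are our own.

```python
def delta_rule_update(weights, bias, inputs, target, output, lr=0.1):
    """One delta-rule step for a single sigmoid node.
    The error term combines the output difference (target - output)
    with the derivative of the sigmoid, output * (1 - output)."""
    delta = (target - output) * output * (1.0 - output)
    # Each weight moves proportionally to its input value;
    # the learning rate lr scales the size of the step.
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * delta
    return new_weights, new_bias
```

Note how each weight's change is scaled by its own input value, so inputs that contributed more to the error are adjusted more.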
There are different methods that vary in when and how often the weights and biases are updated. Updating the weights using only a single random data point from the training set each time is called stochastic gradient descent. A slower but much more stable version is the batch method: it runs through the entire training dataset and uses the average of the calculated changes to update the weights only once. The last version is mini-batch, a combination of the previous two, which splits the training dataset into smaller batches on which the batch method is used.
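The splitting step shared by these methods can be sketched as follows. This is an illustrative helper of our own; batch size 1 corresponds to stochastic gradient descent, and a batch size equal to the dataset size corresponds to the full-batch method.

```python
import random

def minibatches(dataset, batch_size, seed=0):
    """Shuffle the dataset and split it into mini-batches.
    batch_size=1 gives stochastic gradient descent;
    batch_size=len(dataset) gives the batch method."""
    data = list(dataset)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    return [data[i:i + batch_size]
            for i in range(0, len(data), batch_size)]
```

The weights would then be updated once per batch, using the average of the changes calculated for the data points within it.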
Since training only gets us closer to a solution with each update, we can reuse the same training data multiple times. This is common, and therefore one complete run through the training data has been given the name epoch [8, 17].
Deep neural networks
Single-layer ANNs are only able to model linear structure; therefore it is sometimes necessary to introduce ANNs with layers in between the input and output. ANNs with multiple such hidden layers are known as deep neural networks (DNNs). Apart from these extra layers, DNNs are constructed in the same way as single-layer ANNs. The output of the first layer becomes the input of the next, and so on (see figure 2.2). Because the calculation of the output of each node involves a non-linear activation function, DNNs can learn non-linear structure in the data they are provided. As a result, DNNs allow for modelling of more complex structures compared to single-layer networks. Intuitively this is because deeper layers learn higher-level features, which are combinations of the lower-level features of the previous layers. For example, the first layer in a digit recognition network may learn to recognise line segments, while the second combines these to recognise longer lines and loops. Finally the last layer combines these high-level features to recognise actual digits. Note that the hidden layers are not necessarily this interpretable; this example is only used for illustrative purposes.
Recent advances within machine learning and the increased availability of computing power (especially GPUs) have led to the field of deep learning using DNNs becoming increasingly popular. Because of their ability to learn complex structures, DNNs are often used in areas like language modelling and computer vision.
The difficult part of training a DNN was to define the error in the nodes of the hidden layer(s) in order to know how to adjust their weights. This problem was solved in 1989 when backpropagation was introduced. The basic idea is to work from the output layer back one layer at a time towards the input layer. In each step backwards we calculate, using the generalised delta rule, how we want the input from each of the nodes in the previous layer to change, both in what direction and by how much. After these values are calculated for all nodes in one layer, we backpropagate: for each node in the previous layer, we sum up the values multiplied by the weights connecting that node to the current layer. This sum is the error of the node in the hidden layer, and by applying the same procedure recursively we eventually get back to the input layer.
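One backpropagation step for the smallest interesting case, a network with one hidden layer of sigmoid nodes and a single sigmoid output node, can be sketched as below. This is an illustrative sketch under those assumptions; all names are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5):
    """One backpropagation step: forward pass, output error via the
    generalised delta rule, hidden errors by summing weighted output
    errors, then weight and bias updates."""
    # Forward pass: hidden outputs, then the network output.
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
         for w, b in zip(w_hidden, b_hidden)]
    y = sigmoid(sum(wo * hi for wo, hi in zip(w_out, h)) + b_out)

    # Output-layer error term (generalised delta rule).
    delta_out = (target - y) * y * (1.0 - y)

    # Backpropagate: each hidden node's error is the output error
    # weighted by its connection to the output, times sigmoid'.
    delta_hidden = [delta_out * wo * hi * (1.0 - hi)
                    for wo, hi in zip(w_out, h)]

    # Update weights and biases using the error terms.
    w_out = [wo + lr * delta_out * hi for wo, hi in zip(w_out, h)]
    b_out = b_out + lr * delta_out
    w_hidden = [[wi + lr * d * xi for wi, xi in zip(w, x)]
                for w, d in zip(w_hidden, delta_hidden)]
    b_hidden = [b + lr * d for b, d in zip(b_hidden, delta_hidden)]
    return w_hidden, b_hidden, w_out, b_out, y
```

With more hidden layers the same recursion applies: the error terms of one layer are the weighted sums of the error terms of the layer after it.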
The activation function
The activation function plays a significant role in training the ANN and thus also in its final accuracy. Over the past 20 years the preferred activation function has changed multiple times. In chronological order, some of the most popular ones are:
• Linear: φ(x) = ax + b
• Sigmoid: φ(x) = 1/(1 + e^(−x))
• ReLU: φ(x) = max(0, x)
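The three functions above translate directly into code; a minimal sketch in plain Python (function names are our own):

```python
import math

def linear(x, a=1.0, b=0.0):
    """Linear activation: a scaled and shifted identity."""
    return a * x + b

def sigmoid(x):
    """Sigmoid activation: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """ReLU activation: zero for negative inputs, identity otherwise."""
    return max(0.0, x)
```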
The technique of creating an ANN which has a single continuous output is called regression. Regression analysis is a field within statistics focused on finding relationships between dependent and independent variables. More specifically, given a dataset containing measurements of one or more independent variables and one dependent variable, regression is used to estimate a continuous function using these variables as its domain and range, respectively. For example, one might estimate a function whose inputs are the features of a car and whose output is the car price.
In regression analysis, an appropriate type of function is selected (guessed) beforehand and then parameterised. Subsequently, some method of measuring how well the function matches the data is selected and used to determine the optimal parameters.
When producing statistical models from a dataset there is a risk known as overfitting. Overfitting means that the model fits the training data very closely, but fails to accurately predict previously unseen data. This phenomenon can occur when the structure of the chosen model is more complex than the structure of the underlying data it is modelling. A simple example is the use of a quadratic polynomial to model a phenomenon which is actually linear in nature. If the model was created from two points, the quadratic polynomial could match those two points exactly yet be inaccurate at every other point.
In machine learning, overfitting can occur for a number of different reasons. One common cause is training a neural network for too long. Another cause is using too many or too large hidden layers in a DNN, which can be likened to the example of using a quadratic polynomial when modelling a linear phenomenon. There is also potential for overfitting when the training dataset is too small, as this will cause the ANN to learn features which may not be representative of the data in general.
TensorFlow is an open-source machine learning framework developed by the team at Google Brain. It is the successor to their previous machine learning framework DistBelief, and was developed with the intention of being more flexible. Both TensorFlow and DistBelief use dataflow graphs to represent machine learning models, but the difference is that TensorFlow takes a more general approach by having the nodes in the graph be simple mathematical operations. According to the team at Google, this encourages experimentation through its simple high-level scripting interface. As such it is possible for advanced users to tailor models to suit their needs. There is also support for easily creating simple commonly used models such as regression DNNs, which is what we are using.
Table of contents:
1.1 Problem Statement
2.1 Machine learning
2.1.1 Supervised learning
2.2 Artificial neural networks
2.3 Neural network training
2.4 Deep neural networks
2.4.2 The activation function
3.1 Measurement of error
3.2 Gathering data
3.2.1 Fixed parameters
3.2.2 Variable parameters
3.2.3 Collection script
3.2.4 TensorFlow installation
3.3 Our deep neural network
3.3.1 Finding an accurate configuration
3.3.2 Training and evaluation sets
4.1 Time variation distribution
4.1.1 An upper limit on accuracy
4.2 Looking for patterns
4.3 Resulting neural network
5.1 Pre-processing data
5.2 Splitting the data
5.3 Future research