Statistically optimizing the number of hidden nodes of an artificial neural network

Statistical modelling versus artificial neural networks

The fields of Statistics and artificial neural networks are in many respects closely related. Both disciplines developed methodologies with the aim of learning, or predicting, from examples. There are, however, still conflicting opinions among statisticians on the usefulness of artificial neural networks for statistical inference. Many are sceptical of the empirical approach of artificial neural network research, where algorithms are developed for solving a particular application problem. This contrasts with statistical research, where implementation is often secondary to the theoretical assumptions underlying the analysis, and it is one of the main differences between Statistics and artificial neural network research methodology. Another difference between the two is that Statistics has historically developed to deal with linear problems, while artificial neural networks are designed specifically to address nonlinearities where large volumes of data are available but little is known about the complicated relationship between the inputs and outputs.

Warren McCulloch and Walter Pitts

Warren McCulloch and Walter Pitts explored the computational capabilities of network models with a very simple design during the middle decades of the previous century. Their pioneering 1943 publication, "A Logical Calculus of the Ideas Immanent in Nervous Activity", in which they outlined the first formal model of an elementary computing neuron, is generally regarded as the genesis of artificial neural network systems [McCulloch & Pitts, 1943]. In this paper McCulloch and Pitts presented the first sophisticated discussion of "neuro-logical networks" and explicitly stated the doctrine and many of the fundamental theorems of their axiomatic representation of neural elements. The paper caused considerable excitement amongst scientists and spurred a flurry of interest in artificial neural network systems.
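In its usual textbook form, the McCulloch-Pitts model is a unit that fires when a weighted sum of binary inputs reaches a fixed threshold. The following Python sketch illustrates that idea only; the function name, weights and threshold are illustrative assumptions, not values taken from the 1943 paper.

```python
# A minimal sketch of a McCulloch-Pitts style threshold unit (illustrative assumptions:
# binary inputs, fixed weights, a hard threshold; not values from the original paper).

def mcculloch_pitts_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# With unit weights and a threshold of 2, the unit computes the logical AND of two inputs.
for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, mcculloch_pitts_unit(pattern, weights=(1, 1), threshold=2))
```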

Donald Hebb

The psychologist Donald Hebb designed the first learning law for artificial neural networks. In "The Organization of Behaviour", published in 1949 [Hebb, 1949], he proposed a learning scheme for updating the connections between neurons that had a considerable impact on future developments in the field. His was the first attempt to base a large-scale theory of psychology on suppositions about artificial neural networks. Based on the biological discovery that a synapse's resistance to an incoming signal is changed metabolically during a "learning" process, Hebb showed that networks might learn by storing information in connections by constructing so-called cell-assemblies: subfamilies of neurons which are frequently activated together become linked into a functional organization and thus learn to support each other's activities.
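Hebb's principle is commonly summarized in modern notation as Δw = η·x·y: a connection is strengthened whenever its input and the receiving unit's output are active together. The Python sketch below illustrates that common formulation; the function name, learning rate and patterns are assumptions chosen for illustration and are not drawn from Hebb's text.

```python
import numpy as np

def hebb_update(w, x, y, eta=0.1):
    """Strengthen each connection in proportion to the co-activation of its input and the output."""
    return w + eta * y * x

w = np.zeros(3)
x = np.array([1.0, 0.0, 1.0])   # an input pattern that repeatedly occurs
y = 1.0                          # together with an active output unit
for _ in range(5):
    w = hebb_update(w, x, y)
print(w)                         # weights on the co-active inputs have grown (approx. [0.5, 0, 0.5])
```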

Frank Rosenblatt

Frank Rosenblatt formally introduced the neuron-like element called a perceptron towards the end of the 1950s. In his 1958 paper [Rosenblatt, 1958] and subsequent book [Rosenblatt, 1962] he criticized the lack of randomness and the inflexibility of existing artificial neural network models compared to biological neural networks. His research investigated a simple brain model emulating the physical structures and neurodynamic principles which underpin intelligence. Various types of brain models had so far been proffered by scientists ranging from philosophers, psychologists, biologists and mathematicians to electrical engineers ([Hebb, 1949], [Minsky, 1954], [Von Neumann, 1958]). Rosenblatt's unique contribution was that he proposed a theory of statistical separability based on probability theory, rather than symbolic logic, to develop a class of network models known as perceptrons, and formulated his Perceptron Convergence theorem ([Rosenblatt, 1962], pp. 109-116).
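In the usual textbook formulation of the perceptron learning rule, the thresholded output is compared with the target and the weights are adjusted only when the response is wrong; the Perceptron Convergence theorem guarantees that this procedure terminates whenever the classes are linearly separable. The sketch below shows that rule on the linearly separable logical AND problem; the function name, learning rate and data are illustrative assumptions, not material from the cited works.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=20):
    """Learn weights and a bias with the perceptron rule: update only when the response is wrong."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if x @ w + b >= 0 else 0
            w += eta * (target - y) * x      # zero update when y == target
            b += eta * (target - y)
    return w, b

# Logical AND is linearly separable, so the rule converges to a correct classifier.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t)
print([1 if x @ w + b >= 0 else 0 for x in X])   # [0, 0, 0, 1]
```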

Bernard Widrow and Marcian Hoff

During the early 1960s another powerful learning rule, called the Widrow-Hoff learning rule, was developed by Bernard Widrow and Marcian Hoff ([Widrow & Hoff, 1960], [Widrow, 1962]). (The Widrow-Hoff learning rule is also referred to as the least mean squares (LMS) rule in the technical literature.) The ADALINE (ADAptive LInear NEuron), which is explained in more detail in Section 4.4.2.2, is not a network but a single neuron that, like the perceptron, produces an output based on a pattern of inputs. The rule is closely related to the perceptron learning rule. While the perceptron rule adjusts the connection weights to a unit only when the response of the unit is incorrect, the ADALINE's learning method incorporates supervised learning (cf. Section 4.4.2) in which the network is given feedback indicating not only whether the output is incorrect, but also what the output should have been.
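That contrast is the essence of the Widrow-Hoff (LMS) rule: the error is measured on the linear output itself, so the unit is told how far its output is from the target rather than merely that it is wrong. The sketch below illustrates the rule in its common textbook form, Δw = η(t − x·w)x, for a single ADALINE-like unit; the function name, learning rate, bipolar targets and data are illustrative assumptions.

```python
import numpy as np

def lms_epoch(w, b, X, t, eta=0.1):
    """One least-mean-squares pass: update weights from the graded error on the linear output."""
    for x, target in zip(X, t):
        y = x @ w + b              # linear output, before any thresholding
        error = target - y         # graded error: how far off, not just right or wrong
        w += eta * error * x
        b += eta * error
    return w, b

# Logical AND with bipolar targets; the sign of the trained linear output recovers the classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)
w, b = np.zeros(2), 0.0
for _ in range(100):
    w, b = lms_epoch(w, b, X, t)
print(np.sign(X @ w + b))          # [-1. -1. -1.  1.]
```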

Marvin Minsky and Seymour Papert

The final publication of this era was the book "Perceptrons" by Marvin Minsky and Seymour Papert in 1969 [Minsky & Papert, 1988 c.1969]. In this text Minsky and Papert evaluated the perceptron as the simplest learning machine, i.e. as a class of computations (parallel-machine architectures) that make decisions by weighing evidence. Up to that stage many experiments with perceptrons had taken place, but nobody had been able to satisfactorily explain why perceptrons were able to learn to recognize certain kinds of patterns but not others. Minsky and Papert revealed some fundamental limitations of loop-free connectionist learning machines and proved that one-layer perceptrons were incapable of learning to distinguish classes of patterns that were not linearly separable, using the well-known logical EXCLUSIVE-OR (XOR) function to illustrate this weakness of the perceptron.
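The XOR argument can be made concrete. For a single thresholded unit with weights w1, w2 and bias b, the four patterns would require b < 0, w2 + b ≥ 0, w1 + b ≥ 0 and w1 + w2 + b < 0; adding the two middle inequalities gives w1 + w2 + 2b ≥ 0, which together with b < 0 contradicts the last one, so no such unit exists. The sketch below, reusing the textbook perceptron rule under the same illustrative assumptions as before, shows that the rule learns AND perfectly but never gets all four XOR patterns right.

```python
import numpy as np

def perceptron_accuracy(X, t, eta=0.1, epochs=100):
    """Train a single thresholded unit with the perceptron rule and count correctly classified patterns."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = 1 if x @ w + b >= 0 else 0
            w += eta * (target - y) * x
            b += eta * (target - y)
    return sum((1 if x @ w + b >= 0 else 0) == target for x, target in zip(X, t))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(perceptron_accuracy(X, np.array([0, 0, 0, 1])))   # AND is linearly separable: 4 of 4 correct
print(perceptron_accuracy(X, np.array([0, 1, 1, 0])))   # XOR is not: never all 4 correct
```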

Contents:

  • 1 Prelude
  • 2 Statistical modelling and artificial neural networks
    • 2.1 Statistical modelling
    • 2.2 Predictive learning
    • 2.3 Artificial neural networks
    • 2.4 Statistical modelling versus artificial neural networks
  • 3 Historical development of artificial neural networks
    • 3.1 Warren McCulloch and Walter Pitts
    • 3.2 Donald Hebb
    • 3.3 Marvin Minsky
    • 3.4 Frank Rosenblatt
    • 3.5 Bernard Widrow and Marcian Hoff
    • 3.6 Marvin Minsky and Seymour Papert
    • 3.7 During the 1970s and early 1980s
      • 3.7.1 Japanese scientists
      • 3.7.2 Kohonen
      • 3.7.3 Anderson; Grossberg and Carter
      • 3.7.4 Simulated annealing and the Boltzmann machine
    • 3.8 John Hopfield
  • 4 Artificial neural networks
    • 4.1 Data
      • 4.1.1 Data collection and auditing
        • 4.1.1.1 Underfitting and overtraining of an artificial neural network
        • 4.1.1.2 Data auditing
      • 4.1.2 Data preprocessing
      • 4.1.3 Data encoding
        • 4.1.3.1 Input variables
    • 4.4 Training algorithms
      • 4.4.1 Self-supervised training
        • 4.4.1.1 Hebb learning rule
      • 4.4.2 Supervised training algorithms
        • 4.4.2.1 Perceptron
        • 4.4.2.2 ADALINE
        • 4.4.2.3 Delta rule
      • 4.4.3 Unsupervised training algorithms
        • 4.4.3.1 Kohonen self-organizing maps
        • 4.4.3.2 Adaptive resonance theory
    • 4.5 Backpropagation of error in a multilayer feedforward artificial neural network
      • 4.5.1 The backpropagation algorithm
      • 4.5.2 Training errors
      • 4.5.3 Derivation of the learning rule
      • 4.5.4 Initial weights and bias choices
  • 5 Statistically optimizing the number of hidden nodes of an artificial neural network
    • 5.1 Problem setting
      • 5.1.1 A feedforward artificial neural network in a nonlinear regression setting
      • 5.1.2 Accuracy criteria
    • 5.2 Statistical theory and method
      • 5.2.1 Statistical decision-making
      • 5.2.2 Fisher information matrix
      • 5.2.3 Standard linear regression model
      • 5.2.4 Singular value decomposition
    • 5.3 Optimization algorithm
  • 6 Modelling the NPRP data
    • 6.1 The data set
    • 6.2 Data auditing
    • 6.3 Discriminant analyses
      • 6.3.1 Enter: rml1; … ; rml7; rmc1; … ; rmc7
      • 6.3.2 Stepwise: rml1; … ; rml7; rmc1; … ; rmc7 at 5% to enter and 10% to exit (SPSS default)
      • 6.3.3 Stepwise procedure with relaxed criteria
    • 6.4 Artificial neural network classification
    • 6.5 Model comparison
    • 6.6 Analysis of variance
      • 6.6.1 Discriminant analysis classification
      • 6.6.2 Multilayer perceptron (MLP) classification
      • 6.6.3 Model correspondence
    • Conclusion
  • 7 Finale
    • 7.1 Summary
      • 7.1.1 Literature study
