Information-Theoretic Analysis of the Neural Network Memory 


Sparse Coding

The notion of sparsity appears in different forms in biological neural systems. The first stage of sparsity in the brain is at the reception of the numerous input stimuli. Neural systems are selective when exposed to input stimuli: they do not capture every stimulus, but only a few at a time [Mil56]. This can be observed in daily life, where we tend to capture only the information we intend to; for example, we are able to concentrate on a specific person's voice in a crowd. The selection can also be unconscious: much of the sensory information received by the skin is automatically neglected by the brain, except for significant signals such as pain. The perceived information is then passed to the SM and STM (see section 1.1) to be noticed by the conscious mind, and it is passed to a deeper memory level, the LTM, if it needs to be remembered later. This structure acts as a biological sieve that filters out unnecessary information.
The second sparsity feature is observed in the organization of neurons and connections. The neural system is organized in different sections, and the neocortex is no exception, in both its vertical and horizontal dimensions. It has six horizontal layers containing various types of neurons and other vital cells. Of greater importance in this report is the columnar division of the neocortex [Mou97]. Seen from a top-down perspective, a column is a cylinder containing a group of neurons with the same receptive field. Within a column there are microcolumns, groups of about 80 to 100 neurons [Mou97], which are believed to be the basic units of the neocortex [Mou97]. The microcolumns are connected through horizontally organized neural layers, which constructs a grid [Mou97]. A single neuron has a simple structure that may not be able to store and process a remarkable amount of information. Thus, a stand-alone neuron is not sufficiently informative, which implies that the complicated operations in the brain require a developed organization of neurons. This has led the brain to evolve into a structure in which the information stored in the neocortex appears as activated microcolumns that are connected horizontally [F02]. These connections are locally dense and recurrent, whereas they are sparse at the global level [San03].

Messages

Information in digital devices is quantized and packed into groups of bits, bytes, etc. In this report, information is represented with vectors whose elements are drawn from an alphabet. The vectors can be the product of sampling real-world analog information, or they may be obtained by post-processing an information source. It is assumed that these vectors are sparse, i.e., a few of their elements bear information and the rest are negligible or blank. Sparse signals and vectors are of great importance in information science, and they have been studied extensively in mathematical frameworks such as Compressed Sensing [Don06, CT05, CW08], Information Theory [ASZ10, AT10, WWR10, Wai09], Machine Learning [WCF07], etc. The goal of these studies is to exploit sparsity to improve the storage, sampling, energy efficiency, etc. of information systems. Likewise, sparsity is a strong characteristic to be used in neural network memories. One application of neural network memories, and more specifically associative memories, is image recognition. Assume a set of images, such as people's faces, and an image (in the set) that must be recognized when part of it is covered or missing. This requires a post-processing framework that extracts unique image specifications, which are stored (in the neural network memory) and used as the image signature. A modified Speeded Up Robust Features (SURF) [BETvG08] or Scale-Invariant Feature Transform (SIFT) [Low99] may be applied to the images to obtain their specifications. These transforms return a set of pixel coordinates as a set of interest points, also known as keypoints. For each image with χ pixels, a vector (of size χ) is constructed in which the keypoints take a nonzero value and the rest are zero. It is assumed that these vectors, so-called specification vectors, are sparse and uniquely represent each image in the image set. The specification vectors are stored in the proposed network. When part of an image is missing, this appears as missing nonzero elements in its corresponding specification vector. The defective image is then recognized by feeding the network with its partially known specification vector and using the properties of the associative memory, especially redundant coding.
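As a concrete illustration, the following minimal sketch (Python with NumPy; the image size, keypoint list and function name are hypothetical, not taken from the original text) builds a sparse specification vector from a set of keypoint coordinates and simulates a partially known query by blanking some of its nonzero elements.

```python
import numpy as np

def specification_vector(keypoints, width, height):
    """Build a sparse specification vector of size chi = width * height.

    `keypoints` is a list of (x, y) pixel coordinates, as returned by a
    keypoint detector such as SIFT or SURF; each keypoint position takes a
    nonzero value and every other pixel stays zero.
    """
    v = np.zeros(width * height)
    for (x, y) in keypoints:
        v[y * width + x] = 1.0          # nonzero value at the keypoint
    return v

# Hypothetical example: a 32x32 image with a handful of keypoints.
keypoints = [(3, 5), (10, 12), (20, 7), (28, 30)]
v = specification_vector(keypoints, width=32, height=32)

# A partially occluded image shows up as missing nonzero elements: here,
# half of the keypoints are blanked before the vector is used as a query.
rng = np.random.default_rng(0)
query = v.copy()
missing = rng.choice(np.flatnonzero(v), size=len(keypoints) // 2, replace=False)
query[missing] = 0.0
```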
Another application is wireless communications. A technique so-called Cognitive Radio [Hay05, IGM99] has been the center of attention among researchers of the field in recent years. The communication spectrum is a limited and therefore expensive resource, and the situation worsens with the advent of new wireless hand-held and personal communication devices. In the past, organizations such as the International Telecommunication Union (ITU) allocated communication bandwidths to many standards, especially mobile networks. Today, there is almost no free space left in the frequency spectrum. This has led the communications bodies to allocate shared bandwidths for various applications such as personal communications. There are various strategies for using shared radio spectrum bandwidths. One of the cognitive radio techniques is the opportunistic scheme, through which radio devices randomly access the spectrum range. The issue that arises with this opportunistic scheme is that each radio device is responsible for surveying the frequency spectrum to find an empty slot (in time, frequency and space) for communication. Therefore, a radio device sees the frequency spectrum as a vector in which the occupied frequency slots appear as nonzero elements and the rest is zero. This vector is named here the spectrum state vector. In hand-held and battery-driven devices, energy is a valuable asset to be conserved for vital operations. Assume the spectrum changes among a limited number of patterns for a period of time. In this case the device may fully sense the spectrum and store the spectrum state vector in the proposed network (as an associative memory). Afterwards, the radio device may increase its energy efficiency and communication rate by sensing a limited part of the whole spectrum and recovering the rest from the neural associative memory.
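In the same spirit, the short sketch below (Python with NumPy; the number of slots and the occupancy pattern are hypothetical) shows the spectrum side: a full sensing pass yields a sparse spectrum state vector, while an energy-limited device observes only a subset of slots and would query the associative memory with this partial vector.

```python
import numpy as np

# Hypothetical spectrum with 64 frequency slots; occupied slots are nonzero.
n_slots = 64
occupied = [3, 7, 21, 40, 55]                 # slots currently in use
spectrum_state = np.zeros(n_slots)
spectrum_state[occupied] = 1.0                # the sparse spectrum state vector

# An energy-limited device senses only a random subset of the slots; the
# unsensed slots stay unknown and would be recovered from the memory.
rng = np.random.default_rng(1)
sensed = rng.choice(n_slots, size=16, replace=False)
partial = np.full(n_slots, np.nan)            # NaN marks unsensed slots
partial[sensed] = spectrum_state[sensed]
```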
The basic item which the network perceives as an input stimulus is a message. Following the above discussion, a message with few information-bearing elements is so-called sparse. The sparse messages dealt with in this report can be either naturally sparse or the result of a transform such as the Fourier or wavelet transform. The sparse messages are treated by the network regardless of their real-world source, and they may belong to one of the mentioned or other possible applications.

A message is a vector with χ elements. An element of the message is a so-called sub-message, which is drawn from an alphabet denoted by A or may be blank. The blank elements are the non-information-bearing elements because they are null or negligible. The number of non-blank (significant) sub-messages is the so-called message order. The sub-message alphabet has finite cardinality |A| = ℓ. Throughout this report, sub-messages are assumed to be uniformly distributed over the alphabet A. Although the alphabet size can be arbitrary depending on the application, it is often a power of two. This choice makes the network compatible with digital systems, where information is quantized in bits. Therefore, a message may be considered as a sequence of bits to be stored in the network. This requires that the message be divided into c groups of κ = log₂ ℓ bits (see section 2.3).
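The following sketch (Python; the bit string and alphabet size are hypothetical) illustrates this division of a bit sequence into sub-messages of κ bits each.

```python
import math

def bits_to_submessages(bits, ell):
    """Split a bit string into sub-messages over an alphabet of size ell.

    Each group of kappa = log2(ell) bits is read as one sub-message,
    i.e. an integer in {0, ..., ell - 1}.
    """
    kappa = int(math.log2(ell))
    assert len(bits) % kappa == 0, "bit length must be a multiple of kappa"
    return [int(bits[i:i + kappa], 2) for i in range(0, len(bits), kappa)]

# Hypothetical example: ell = 128, so kappa = 7 bits per sub-message and a
# 28-bit sequence yields a message of order c = 4.
submessages = bits_to_submessages("0010101" "0000010" "0001100" "1110011", ell=128)
print(submessages)   # [21, 2, 12, 115]
```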


Storage

Neural networks are connectionist networks composed of nodes and their corresponding connections. This structure can be modelled as a graph in which the nodes are vertices and their interconnections are the edges of the graph [Hay99, GB11]. Throughout this document, a network with binary connections is considered; in other words, either there is a connection between two nodes or there is no connection. In the following, the network and its components that store messages according to the sparse coding approach are defined.
Definition 1 (Neural Network). Following the above description, a neural network is an undirected multipartite graph (F1, … , Fχ, W) where F1, … , Fχ are sets of vertices and W is the set of edges. All sets of vertices have equal cardinality: ∀j ∈ {1, … , χ}, |Fj| = ℓ.
Definition 2 (Cluster). A set of vertices Fi ∈ {F1, … , Fχ} is a cluster. A cluster is indicated by an index and the set of cluster indices is C = {1, … , χ}.
Definition 3 (Fanal). A vertex of the graph, i.e., an element of F1 ∪ … ∪ Fχ, is a fanal. A fanal in a cluster is indicated by an index. The set of fanal indices is F = {1, … , ℓ}.
A neural network is uniquely identified by (W, χ, ℓ), where ℓ is the number of fanals in each cluster and χ is the number of clusters in the network, afterwards so-called the cluster space size. A pair (i, j) ∈ C × F uniquely specifies a vertex in the graph.
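A minimal sketch of this representation is given below (Python; the chosen χ and ℓ are hypothetical and match the example messages used in the next section): a fanal is addressed by a pair (cluster index, fanal index), and W is simply a set of undirected binary connections between such pairs.

```python
# A fanal is addressed by a pair (cluster index, fanal index) in C x F, and W
# is the set of undirected binary connections between fanals; each edge is
# kept as a frozenset of two such pairs.
chi, ell = 14, 128          # hypothetical sizes (cluster space size, fanals per cluster)
C = range(1, chi + 1)       # cluster indices
F = range(1, ell + 1)       # fanal indices within a cluster
W = set()                   # no connections stored yet

def connect(W, fanal_a, fanal_b):
    """Add an undirected binary connection between two fanals."""
    W.add(frozenset((fanal_a, fanal_b)))

def connected(W, fanal_a, fanal_b):
    """Check whether two fanals are connected."""
    return frozenset((fanal_a, fanal_b)) in W

connect(W, (2, 21), (5, 2))
print(connected(W, (5, 2), (2, 21)))   # True: connections are undirected
```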
The defined graph, along with the storage and retrieval mechanisms, is a network capable of interacting (with another system or the outside world through an interface) as a system. The very first step toward building such a system is the storage process, on which the storage efficiency depends. It also significantly influences the retrieval process, which will be elaborated in the next sections.
Storage is the process of mapping a set of messages into the network by constructing binary connections among the fanals. To exploit the network efficiently, a well-designed storage procedure is required, in which various goals are pursued. The first purpose of the storage is to map a message into the network by constructing or removing edges of the graph (connections) in such a manner that the message can be retrieved afterwards. The neural network memory retrieves a stored message when provided with its associated content (e.g. a partial or blurred/noisy version of the message). This implies that the storage is a one-to-one function; more precisely, the mapping may be considered one-to-one when only a single message is subject to storage and recovery in the network.
The efficiency of the storage is also an important issue, along with the correct storage and retrieval of messages. In other words, the storage should exploit the intrinsic properties of the network and the messages (e.g. sparsity) to perform a kind of compression (lossy or lossless). This is to avoid storing the messages as a mere pile of information, which would imply an inefficient use of resources. In this network, such a compression scheme is implemented by sharing fanals and connections among the stored messages. This compression occurs whenever at least two distinct messages have a few equal corresponding sub-messages. For example, consider two messages drawn from the alphabet A = {1, 2, … , 128}, with dashes denoting blank sub-messages: m1 = [−, 21, −, −, 2, −, 12, −, 5, −, −, −, 115, −] and m2 = [−, 45, −, −, 2, −, 12, −, −, 65, −, −, 74, −].
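The sketch below (Python) stores these two example messages, assuming the clique-based storage of [GB11]: each message is mapped to the fanals selected by its non-blank sub-messages, those fanals are pairwise connected, and the connection between the two fanals shared by m1 and m2 is reused rather than duplicated.

```python
from itertools import combinations

def store(W, message):
    """Store one sparse message by pairwise connecting its fanals.

    `message` maps a cluster index to its non-blank sub-message (blank
    clusters are simply absent). Each non-blank sub-message selects the
    fanal (cluster, sub-message), and all these fanals are connected to
    one another, so edges are shared with previously stored messages.
    Clique-based storage, as in [GB11], is assumed here.
    """
    fanals = list(message.items())
    for a, b in combinations(fanals, 2):
        W.add(frozenset((a, b)))

# The two example messages from the text (clusters 1-indexed, blanks omitted).
m1 = {2: 21, 5: 2, 7: 12, 9: 5, 13: 115}
m2 = {2: 45, 5: 2, 7: 12, 10: 65, 13: 74}

W = set()
store(W, m1)
store(W, m2)
# m1 and m2 share the fanals (5, 2) and (7, 12), hence one shared connection:
print(len(W))   # 19 edges instead of 2 * 10 = 20
```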

Retrieval

A biological neural system such as the human brain is exposed to input stimuli during the learning process. These stimuli create connections or change their strengths, which is the act of learning. Thereafter, the brain is able to retrieve these stimuli in various scenarios. An example of this is the capability of storing and reading written words: the brain stores a word (its spelling), which it is afterwards able to remember completely even when given only a part of it (a few, but not all, letters of the word) [PB04]. The proposed neural network memory, inspired by the human brain, is expected to perform similar functionality. It is, though, limited to the storage of a message set that can be considered as the words of a dictionary (similar to the human experience of learning and remembering words). The messages (in the message set) consist of sub-messages taken from an alphabet (similar to a language alphabet such as the English alphabet).
This network is expected to be able to retrieve a stored message from its partially known sub-messages, which is the fundamental property of associative memories. The network also addresses the problem where a hypothetical message is to be verified by the network as to whether it has been stored; this is also known as Set Implementation. The brain is also able to recognize morphological or dictation errors; such situations can take place, for example, when the letters of a written word are permuted [GW04]. The network is capable of retrieving a message when it is fed with its permuted sub-messages. The classical retrieval algorithm was first introduced in [GB11]. It is based on a variation of winner-take-all (see [GB11, Maa00]) and message passing; the employed message passing is similar to a kind of message passing used in error control coding. In the following, the retrieval algorithm given in [GB11] is elaborated. Although the principles of the algorithm remain unchanged, it is modified in different ways to improve the performance in terms of retrieval error probability.
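As a rough illustration, the sketch below (Python, continuing the storage example above and reusing its edge set W) performs one simplified pass of this kind of retrieval: the known sub-messages activate their fanals, and in every other cluster a fanal is accepted only if it is connected to all active fanals, i.e. it wins the scoring with maximal support. The actual algorithm of [GB11] iterates this winner-take-all message passing; the single pass here is only illustrative.

```python
def retrieve(W, chi, ell, partial):
    """One simplified pass of associative retrieval (after [GB11]).

    `partial` maps cluster indices to known sub-messages; their fanals are
    activated. In every other cluster, each fanal is scored by how many
    active fanals it is connected to, and the winner is kept only if it is
    connected to all of them.
    """
    active = {(c, s) for c, s in partial.items()}
    recovered = dict(partial)
    for c in range(1, chi + 1):
        if c in partial:
            continue
        scores = {f: sum(frozenset(((c, f), a)) in W for a in active)
                  for f in range(1, ell + 1)}
        winner = max(scores, key=scores.get)
        if scores[winner] == len(active):   # supported by every known fanal
            recovered[c] = winner
    return recovered

# Query the network of the previous sketch with a partial version of m1
# (the sub-messages of clusters 9 and 13 are missing).
print(retrieve(W, chi=14, ell=128, partial={2: 21, 5: 2, 7: 12}))
# {2: 21, 5: 2, 7: 12, 9: 5, 13: 115}
```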

Table of contents:

Contents
List of Figures
List of Tables
1 Introduction 
1.1 Memory
1.2 Compression and Robustness
1.2.1 Neural Coding
1.3 Artificial Neural Networks
1.4 Thesis structure
2 Sparse Neural Network Memories 
2.1 Sparse Coding
2.2 Messages
2.3 Storage
2.4 Retrieval
2.4.1 Winner-take-all
2.4.2 The Algorithm
2.5 Performance
2.5.1 Associative Memory
Blind Recovery
Variable Order Messages
2.5.2 Blurred Messages
2.5.3 Set Implementation
2.6 Optimal Network
2.6.1 Optimal Constant Message Order
2.6.2 Optimal Density
2.6.3 Optimal Variable Message Order
3 Information-Theoretic Analysis of the Neural Network Memory 
3.1 Analysis of the Message Source
3.2 Channel Coding
3.3 Source-Channel Coding
3.4 Communication Channel Model
3.4.1 Puncturing Block Channel Model
3.4.2 Neural Network Memory Channel Model
3.4.3 Recovery Constraints: Differential Information Rate
3.4.4 Recovery Constraints: Fano’s Inequality
3.4.5 Discussion
4 Information-Theoretic Constraints on Sparse Pattern Recovery 
4.1 Recovery
4.1.1 Restricted Isometry Property (RIP)
4.1.2 Null-Space Property (NSP)
4.1.3 Verification of the Algebraic Conditions
4.2 Compressed Sensing and Neural Networks
4.3 Information-Theoretic Conditions
4.3.1 Bounds and Recovery Conditions
Strictly Sparse Signals
Approximately Sparse Signals
4.3.2 Proof
Theorem 4
Corollary 1
Theorem 6
Corollary 2
5 Discussion and Conclusion 
Bibliography

