Novel ternary representation with weight sharing


Usefulness of convolutions in deep learning

In deep learning, an important argument in favor of CNNs is that convolutional layers are equivariant to translations. Intuitively, this means that a detail of an object in an image should produce the same features regardless of its position in the image.
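To make this concrete, here is a minimal numerical sketch of translation equivariance (an illustration, not code from the thesis), using a circular 1-D convolution so that translations remain well defined at the boundary; the function and variable names are chosen for this example only.

```python
import numpy as np

def conv1d_circular(signal, kernel):
    """Circular 1-D convolution: out[i] = sum_j signal[(i - j) % n] * kernel[j]."""
    n = len(signal)
    return np.array([sum(signal[(i - j) % n] * kernel[j] for j in range(len(kernel)))
                     for i in range(n)])

def translate(signal, t):
    """Circular translation of a signal by t positions."""
    return np.roll(signal, t)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
w = rng.standard_normal(3)

# Equivariance: convolving a translated signal equals translating the convolved signal.
assert np.allclose(conv1d_circular(translate(x, 5), w),
                   translate(conv1d_circular(x, w), 5))
```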
The converse result, a consequence of Theorem 41, is never mentioned in the deep learning literature. However, it is also a strong one. For example, let's consider a linear function that is equivariant to translations. Thanks to the converse result, we know that this function is a convolution operator parameterized by a weight vector $w$, $f_w : s \mapsto w \ast s$. If the domain is compactly supported, as in the case of images, we can break down the information of $w$ into a finite number $n_q$ of kernels $w_q$ with small compact supports of the same size (for instance of size $2 \times 2$), such that $f_w = \sum_{q \in \{1, 2, \ldots, n_q\}} f_{w_q}$. The convolution operators $f_{w_q}$ are all in the search space of $2 \times 2$ convolutional layers. In other words, every translation-equivariant linear function can have its information parameterized by these layers. This means that the reduction of parameters from an MLP to a CNN is done without loss of expressivity (provided the objective function is known to bear this property). Besides, it also helps the training to search in a much more confined space. For example, on CIFAR-10 (see description in Section 1.3.3), CNNs reportedly attain a classification error as low as 2.31% (Yamada et al., 2018), while MLPs plateaued at 21.38% (Lin et al., 2015). Intuitively, the reason for this success is simplification by symmetry: the supposed translational equivariance of the objective function is a symmetry that is exploited by the convolutional layer to simplify its input.
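The decomposition $f_w = \sum_q f_{w_q}$ follows from the linearity of convolution. The sketch below (again an illustration, not the thesis' code) checks it numerically for a $4 \times 4$ kernel split into four $2 \times 2$ blocks; each block is kept at its original offset by zero-padding, which stands in for the shifted small-support kernels $w_q$.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
image = rng.standard_normal((8, 8))
w = rng.standard_normal((4, 4))  # a compactly supported weight kernel

# Split w into four 2x2 blocks w_q, each zero-padded back to the full support,
# so that w = sum_q w_q and, by linearity, f_w = sum_q f_{w_q}.
blocks = []
for i in (0, 2):
    for j in (0, 2):
        w_q = np.zeros_like(w)
        w_q[i:i + 2, j:j + 2] = w[i:i + 2, j:j + 2]
        blocks.append(w_q)

full = convolve2d(image, w, mode="same", boundary="wrap")
summed = sum(convolve2d(image, w_q, mode="same", boundary="wrap") for w_q in blocks)
assert np.allclose(full, summed)
```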

Construction on the vertex set

As Theorem 41 is a complete characterization of convolutions, it can be used to define them, i.e. convolution operators can be constructed as the set of linear transformations that are equivariant to translations. However, in the general case where G is not a grid graph, translations are not defined, so that construction needs to be generalized beyond translational equivariances.
In mathematics, convolutions are defined more generally for functions over a group structure. The classical convolution used in deep learning is just a narrow case where the domain group is a Euclidean space. Therefore, constructing a convolution on graphs should start from the more general definition of convolution on groups rather than convolution on Euclidean domains. Our construction is motivated by the following questions:
Does the equivariance property hold? Does the characterization from Theorem 41 still hold? Is it possible to extend the construction to non-group domains, or at least to mixed domains (i.e. one signal is defined over a set, and the other over its transformations)? Can a group domain draw an underlying graph structure? Is the group convolution naturally defined on this class of graphs? Can we characterize the graphs that accept our construction? In this section, we first aim at transferring the group convolution onto the vertex set. Then, in Section 2.3, we will see the implications of considering the edge set in the process.
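As a point of reference for the group convolution mentioned above, the following sketch (illustrative, not from the thesis) implements the generic group convolution $(f \ast g)(u) = \sum_{v} f(v)\, g(v^{-1} u)$, the sum ranging over a finite group given by its operation and inverse, and checks that on the cyclic group $\mathbb{Z}_8$ it reduces to the classical circular convolution.

```python
import numpy as np

def group_convolution(f, g, elements, op, inv):
    """Generic group convolution: (f * g)(u) = sum over v of f(v) * g(inv(v) op u)."""
    return {u: sum(f[v] * g[op(inv(v), u)] for v in elements) for u in elements}

# On the cyclic group Z_8 (addition modulo 8), the group convolution
# coincides with the classical circular convolution.
n = 8
elements = list(range(n))
op = lambda a, b: (a + b) % n
inv = lambda a: (-a) % n

rng = np.random.default_rng(2)
f = dict(enumerate(rng.standard_normal(n)))
g = dict(enumerate(rng.standard_normal(n)))

group_conv = group_convolution(f, g, elements, op, inv)
circular = [sum(f[v] * g[(u - v) % n] for v in range(n)) for u in range(n)]
assert np.allclose([group_conv[u] for u in range(n)], circular)
```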


Inclusion of the edge set in the construction

The constructions from the previous section involve the vertex set V and a group acting on it. Therefore, it looks natural to try to relate the edge set E and this group.
There are two points of view. Either the group describes an underlying graph structure $G = \langle V, E \rangle$, or $G$ can be used to define a relevant subgroup to which the produced convolution operators will be equivariant. Both approaches will help characterize classes of graphs that can support natural definitions of convolutions.

Edge-constrained convolutions

Definition 62. Edge-constrained transformation
An edge-constrained (EC) transformation on a graph $G = \langle V, E \rangle$ is a transformation $f : V \to V$ such that $\forall u, v \in V,\ f(u) = v \Rightarrow u \sim_E v$ (i.e. $u$ and $v$ are adjacent in $G$).
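As an illustration of Definition 62 (not part of the thesis), the following sketch checks whether a vertex map is edge-constrained on a small undirected graph; treating fixed points as requiring a self-loop is an assumption of this sketch.

```python
def is_edge_constrained(f, vertices, edges):
    """Definition 62 check: f(u) = v must imply that u and v are adjacent.

    `edges` is a set of unordered pairs (frozensets); a fixed point f(u) = u
    is accepted only if the graph has a self-loop on u (assumption of this sketch).
    """
    return all(frozenset((u, f[u])) in edges for u in vertices)

# Toy graph: the path 0 - 1 - 2 - 3.
vertices = [0, 1, 2, 3]
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}

print(is_edge_constrained({0: 1, 1: 2, 2: 3, 3: 2}, vertices, edges))  # True
print(is_edge_constrained({0: 1, 1: 2, 2: 3, 3: 3}, vertices, edges))  # False: no self-loop on 3
```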

Table of contents:

Introduction 
1 Presentation of the field 
1.1 Tensors
1.1.1 Definition
1.1.2 Manipulation
1.1.3 Binary operations
1.2 Deep learning
1.2.1 Neural networks
1.2.2 Interpretation
1.2.3 Training
1.2.4 Some historical advances
1.2.5 Common layers
1.3 Deep learning on graphs
1.3.1 Graph and signals
1.3.2 Learning tasks
1.3.3 Datasets
1.3.4 Spectral methods
1.3.5 Vertex-domain methods
2 Convolution of graph signals 
2.1 Analysis of the classical convolution
2.1.1 Properties of the convolution
2.1.2 Characterization on grid graphs
2.1.3 Usefulness of convolutions in deep learning
2.2 Construction on the vertex set
2.2.1 Preliminaries
2.2.2 Steered construction from groups
2.2.3 Construction under group actions
2.2.4 Mixed domain formulation
2.3 Inclusion of the edge set in the construction
2.3.1 Edge-constrained convolutions
2.3.2 On properties of the corresponding operators
2.3.3 Locality-preserving convolutions
2.3.4 Checkpoint summary
2.4 From groups to groupoids
2.4.1 Motivation
2.4.2 Definition of notions related to groupoids
2.4.3 Construction of partial convolutions
2.4.4 Construction of path convolutions
2.5 Conclusion
3 Deep learning on graph domains 
3.1 Layer representations
3.1.1 Neural interpretation of tensor spaces
3.1.2 Propagational interpretation
3.1.3 Graph representation of the input space
3.1.4 Novel ternary representation with weight sharing
3.2 Study of the ternary representation
3.2.1 Genericity
3.2.2 Sparse priors for the classification of signals
3.2.3 Efficient implementation under sparse priors
3.2.4 Influence of symmetries
3.2.5 Experiments with general graphs
3.3 Learning the weight sharing scheme
3.3.1 Discussion
3.3.2 Experimental settings
3.3.3 Experiments with grid graphs
3.3.4 Experiments with covariance graphs
3.3.5 Improved convolutions on shallow architectures
3.3.6 Benchmarks on citation networks
3.4 Inferring the weight sharing scheme
3.4.1 Methodology
3.4.2 Translations
3.4.3 Finding proxy-translations
3.4.4 Subsampling
3.4.5 Data augmentation
3.4.6 Experiments
3.5 Conclusion
Conclusion
Bibliography
