
## Enforcing constraints for intrinsic data modelling

The quest for models intrinsically adapted to their input data has thus become an ongoing pursuit in the Deep Learning community. Indeed, since a Deep Neural Network learns its own feature representation through the hierarchy of layers, clever engineering of these layers hopefully leads to more intrinsic representations of the data.

This engineering often draws inspiration from the older data processing techniques mentioned above, transferring them into Deep Neural Networks in a differentiable fashion. This is for instance embodied in part-based image segmentation (Mordan et al. 2017) or steerable Convolutional Neural Networks (T. S. Cohen and Welling 2016). This idea of models being invariant or robust to a given class of transformations is key to learning better representations, as illustrated in Figure 1.5: a learning model should be expected to recognize that both images feature the exact same content, although deformed through given transformations.

In a similar vein, other works design layers to stabilize models through normalization, which can be seen as explicitly enforcing invariance to scale and translation within the network layers. The seminal effort in this direction is found in the introduction of Batch Normalization in Deep Neural Networks (Ioffe and Szegedy 2015). Further works have improved upon the idea, such as in Layer Normalization (J. Xu et al. 2019). Others have studied the generalizability of such normalizations across different domains (X. Wang et al. 2019). The recurring motivation behind inner normalizations is to reduce the network's dependence on the covariate shift of the representation space at each layer.
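The core computation behind such an inner normalization can be sketched in a few lines of NumPy. This is a minimal illustration of Batch Normalization's per-feature standardization followed by the learnable scale and shift (γ, β), not the full algorithm of Ioffe and Szegedy (which also maintains running statistics for inference); the function name and parameter values here are ours.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale.

    x     : (batch, features) activations
    gamma : (features,) learnable scale
    beta  : (features,) learnable shift
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))  # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # → True
print(np.allclose(y.var(axis=0), 1.0, atol=1e-3))   # → True
```

Whatever covariate shift the previous layer introduced (here, a mean of 5 and a standard deviation of 3), the normalized activations are re-centred and re-scaled, which is exactly the invariance to translation and scale discussed above.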


#### Radar Core Concepts

A standard radar consists of both an emitter and a receiver. The former emits a signal at a given wavelength, or more generally a given waveform (typically a linearly evolving frequency, as depicted in Figure 2.1), while the latter is left to interpret all incoming signals. Let us consider a simple model where the emission frequency $f_e$ is kept constant. The base waveform is itself emitted repeatedly, at every time interval called the Pulse Repetition Interval (PRI), the inverse of which is called the Pulse Repetition Frequency (PRF). The latter can be seen as the sampling frequency. This dual system induces many problems to be dealt with, such as ambiguities in distance and velocity, or compromises in time versus distance resolution.

Different compromises to solve different problems lead to a wide variety of radars which exist for different purposes: active radars combine emitter and receiver while passive radars only receive signals from external emitters, surveillance radars span a wide area of space at the cost of poor resolution while tracking radars sweep only a small portion of space to gain resolution on targets, antennas are either fixed for a longer integration time or rotating for a better coverage, and the list goes on. The type of target also contributes to the choice of radar parameters; for instance, targets with a smaller Radar Cross-Section (RCS) (see Figure 2.2) will require a more powerful radar, leading to further compromises. The curious reader may dig into detailed explanations in excellent references such as V. C. Chen et al. (2006).
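As a small worked example of the compromises above, the PRF directly bounds both the unambiguous range and the unambiguous radial velocity: a pulse must return before the next one is emitted, giving $R_{max} = c / (2 \cdot \mathrm{PRF})$, while the Doppler shift is sampled at the PRF, giving $|v| < \lambda \cdot \mathrm{PRF} / 4$. The sketch below uses these standard formulas with hypothetical parameter values chosen only for illustration.

```python
# Illustrative sketch of the classic PRF trade-off: a higher PRF shrinks
# the unambiguous range but widens the unambiguous velocity interval.
C = 3e8  # speed of light (m/s)

def unambiguous_range(prf_hz):
    # A pulse must return before the next one is emitted: R_max = c / (2 * PRF)
    return C / (2 * prf_hz)

def unambiguous_velocity(prf_hz, wavelength_m):
    # Doppler is sampled at the PRF, so |v| < lambda * PRF / 4
    return wavelength_m * prf_hz / 4

for prf in (1e3, 10e3):  # hypothetical PRFs; 0.03 m ~ X-band wavelength
    print(prf, unambiguous_range(prf), unambiguous_velocity(prf, 0.03))
```

At a PRF of 1 kHz the radar is unambiguous out to 150 km but only up to about 7.5 m/s; at 10 kHz the bounds become 15 km and about 75 m/s, which is precisely the distance-versus-velocity ambiguity compromise mentioned above.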


#### Logistic Regression as a Baseline Classification Algorithm

Logistic regression is considered one of the most efficient (generalized) linear classification algorithms along with the Support Vector Machine (SVM), and has known tremendous success in many a field since its creation by Cox (1958). It builds on logarithmically-scaled probability ratios between events: in a binary classification scenario, this model formalizes as $\ln\left(\frac{P_\omega(y=1 \mid x)}{P_\omega(y=0 \mid x)}\right) = \omega^T x$, with $\omega$ being the separating hyperplane, $x \in \mathbb{R}^n$ the random variable to be classified and $y \in \{0, 1\}$ the binary class. Put simply, if $x$ belongs to class 0, $\omega^T x$ is negative, and positive otherwise, which naturally corresponds to modelling the distribution $y \mid x$ as a Bernoulli random variable of parameter $\sigma(\omega^T x)$. Furthermore, considering $P_\omega(y=1 \mid x) = 1 - P_\omega(y=0 \mid x)$, we have $P_\omega(y=1 \mid x) = \sigma(\omega^T x)$, with:

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{2.3}$$

being the logistic, or sigmoid, function, illustrated in Figure 2.7. Two useful properties of $\sigma$ are $\sigma(-z) = 1 - \sigma(z)$ and $\sigma'(z) = \sigma(z)\,\sigma(-z)$.
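Both properties of $\sigma$ can be checked numerically; a minimal sketch, with the derivative verified by central finite differences:

```python
import numpy as np

def sigmoid(z):
    # Logistic function of Equation (2.3): sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 101)

# Property 1: sigma(-z) = 1 - sigma(z)
print(np.allclose(sigmoid(-z), 1 - sigmoid(z)))  # → True

# Property 2: sigma'(z) = sigma(z) * sigma(-z), checked by finite differences
h = 1e-6
numeric_grad = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.allclose(numeric_grad, sigmoid(z) * sigmoid(-z), atol=1e-8))  # → True
```

The second property is what makes the gradient of the entropic loss so convenient to compute, since the derivative of $\sigma$ is expressed in terms of $\sigma$ itself.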

**The Perceptron as a link between Logistic Regression and Neural Networks**

Also in 1958, F. Rosenblatt independently published a novel classification algorithm named the Perceptron (Rosenblatt 1958), which aspires to mimic the way neurons and synapses process information within the brain. From another perspective, it aims to build a separating hyperplane in a learnt feature space, as illustrated in Figure 2.8. We show in this section how a single-layer perceptron with entropic loss is mathematically equivalent to logistic regression, thus providing a link between the « black box »-ness of Deep Neural Networks (DNNs) and well-known and studied classical learning algorithms. Given the same scenario as above, nodes (which represent neurons) are connected together with weights (synapses), which in turn fire in response to an activation function (axon gated channel). The goal is to adjust the weights such that a given input $x$, when passed through the network, outputs an estimation $\tilde{y}$ which matches the ground-truth $y$. Again, in the special case of binary classification, $y \in \{0, 1\}$, but we can now generalize to multi-class classification. There are different approaches to generalizing binary classification, such as one-versus-all or one-versus-one strategies, but the most used in DNNs, and thus in our work, is the one-hot encoding, in which each label $y$ is a $C$-dimensional vector with $y^{(c)} = 1$ and 0 everywhere else, where $C$ is the number of classes and $c$ the particular class of $y$. The weights $w_{ij}$ can be summarized in a matrix $W$, hence the building block of MLPs:

$$X^{(k+1)} = f^{(k)}\left(X^{(k)}\right) := \sigma\left(X^{(k)} W^{(k)}\right) \tag{2.8}$$
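To make the equivalence concrete: a single layer of Equation (2.8) with a sigmoid activation and a one-dimensional output computes exactly the logistic-regression probability $\sigma(\omega^T x)$. A minimal sketch, with random data used purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer of Equation (2.8): X W passed through the sigmoid activation.
# With a single output neuron, the weight matrix W reduces to the
# hyperplane vector omega of logistic regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features (hypothetical)
W = rng.normal(size=(3, 1))   # weight matrix, here a single output neuron

perceptron_out = sigmoid(X @ W)              # forward pass of the layer
logreg_out = sigmoid(X @ W[:, 0])[:, None]   # logistic regression, same weights

print(np.allclose(perceptron_out, logreg_out))  # → True
```

Training the layer with the entropic loss then yields the same gradients as maximum-likelihood logistic regression, which is the equivalence established in this section.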

**Table of contents :**

abstract

acknowledgements

contents

list of figures

list of tables

acronyms

**1 introduction**

1.1 Context

1.2 Motivations

1.3 Contributions and outline

1.4 Related publications

**2 theoretical background**

2.1 Introduction

2.2 Radar Signal and Simulation

2.3 Euclidean Machine Learning

2.4 Information Geometry

2.5 Riemannian Machine Learning

2.6 Conclusion

**3 second-order pipeline for temporal classification**

3.1 Introduction

3.2 Learning on structured time series representations

3.3 Full pipeline for temporal classification

3.4 Experimental validation

3.5 Conclusion

**4 advances in spd neural networks**

4.1 Introduction

4.2 Data-Aware Mapping Network

4.3 Batch-Normalized SPDNet

4.4 Riemannian manifold-constrained optimization

4.5 Convolution for covariance time series

4.6 Experimental validation

4.7 Conclusion

**5 conclusion and perspectives**

**bibliography**