Robustness and hypothesis testing

When applying hypothesis testing methods to a real problem, we need an accurate model of reality. In the case of the traditional testing methods presented in Section 1.1, this means having perfect knowledge of all the potential probability distributions that our observation may follow. In practice, however, assuming that we can have such knowledge of all involved distributions is questionable. Indeed, “mathematical models are often significant simplifications/idealizations of complex physical problems” [16], and mismatches between these models and reality may have adverse effects on the performance of the test used if they are not accounted for. It is therefore important to consider approaches that can take these potential mismatches into account, so as to obtain robust methods that guarantee a certain level of performance in real scenarios. Huber [30, 21] presents such an approach, which consists in determining whether the probability distribution that generated a given observation is close enough to given models. We will first explain this approach, then consider the RDT problem and see why this approach is well suited to robust hypothesis testing.

Huber’s approach to robust hypothesis testing

Following Levy [21], to explain Huber’s approach to robust hypothesis testing, we will consider a testing problem as an example: given an observation of a random variable 𝑌 with unknown probability distribution 𝑃, we want to test ℋ0 ∶ 𝑃 = 𝑃0 against ℋ1 ∶ 𝑃 = 𝑃1, where both 𝑃0 and 𝑃1 are known. This is a typical problem where both hypotheses are simple. In such a problem, we would derive the Neyman-Pearson test 𝒯NP based on the likelihood ratio of 𝑃0 and 𝑃1, and this test is known to be optimal with a given level 𝛾 in this case.
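As a toy illustration of this classical setting, here is a minimal sketch of a likelihood-ratio test for two simple Gaussian hypotheses; the two densities, the level value and all function names are assumptions made for this example, not notation from the thesis.
```python
import numpy as np
from scipy.stats import norm

# Illustrative Neyman-Pearson test for two simple hypotheses:
# H0: Y ~ N(0, 1)  vs  H1: Y ~ N(1, 1), both fully known.
# With equal variances, the likelihood ratio is increasing in y, so
# thresholding the log-likelihood ratio amounts to thresholding y itself.

def neyman_pearson_test(y, gamma=0.05, mu0=0.0, mu1=1.0, sigma=1.0):
    """Return 1 (decide H1) when the likelihood ratio exceeds the
    threshold calibrated so that P(decide H1 | H0) = gamma."""
    llr = norm.logpdf(y, mu1, sigma) - norm.logpdf(y, mu0, sigma)
    # Critical value of y under H0 at level gamma, mapped to an LLR threshold.
    y_crit = norm.ppf(1 - gamma, mu0, sigma)
    llr_crit = norm.logpdf(y_crit, mu1, sigma) - norm.logpdf(y_crit, mu0, sigma)
    return int(llr > llr_crit)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 10_000)              # data drawn under H0
level = np.mean([neyman_pearson_test(y) for y in samples])
print(f"empirical level: {level:.3f}")              # close to gamma = 0.05
```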
Now imagine that both probability distributions 𝑃0 and 𝑃1 are approximations of the real probability distributions that may be encountered in practice, obtained, for example, after observation of a sufficiently large number of samples from each real distribution. This means that 𝑃0 and 𝑃1 are not the actual possible probability distributions of 𝑌, but only approximations of the real distributions. Therefore, the Neyman-Pearson test of level 𝛾 computed using 𝑃0 and 𝑃1 is not actually optimal in this scenario, and it may not even have level 𝛾 in practice.
However, if the distributions 𝑃0 and 𝑃1 are decent approximations of the real distributions, we can hope that the actual distributions we encounter are close enough to them. If this is indeed the case, and we want to find a test with level 𝛾, one solution consists in testing ℋ0 ∶ 𝑃 ≈ 𝑃0 against ℋ1 ∶ 𝑃 ≈ 𝑃1. Of course, we have to define the meaning of 𝑃 ≈ 𝑃0 and 𝑃 ≈ 𝑃1: what does it mean for a probability distribution to be “close enough” to another one? Defining this notion of closeness means defining a neighborhood of each distribution 𝑃0 and 𝑃1, i.e. two sets ℱ0 and ℱ1 that contain 𝑃0 and 𝑃1 respectively. These two sets represent all the possible real probability distributions, close enough to 𝑃0 and 𝑃1, that we may encounter in practice.
How do we choose these neighborhoods? In other words, how do we decide that two distributions are close enough? There are several answers to this question, of which we give a couple of examples. In the following, we consider that we want to define a neighborhood ℱ for some probability distribution 𝑃0 that admits a probability density function 𝑝0. To define ℱ, we can consider for example the following approaches:
Contamination model. Let 𝜖 ∈ (0, 1). A contamination model consists in assuming that the observation is drawn either with probability 1 − 𝜖 from 𝑃0, or with probability 𝜖 from some other unknown probability distribution 𝑄. The set ℱ is here defined by:
ℱ = {(1 − 𝜖)𝑃0 + 𝜖𝑄, 𝑄 ∈ 𝒫} (1.20)
where 𝒫 denotes the set of all probability distributions on the observation space.
Distance-based approaches. Several distances can be defined between probability distributions. For two probability distributions 𝑃1 and 𝑃2, we can for example use the following distances (a numerical illustration of both approaches is given after this list):
• The Kolmogorov distance is defined as the maximum distance between the cumulative distribution functions 𝐹1 and 𝐹2 of 𝑃1 and 𝑃2 respectively:
𝑑K(𝑃1, 𝑃2) = sup𝑥∈ℝ |𝐹1(𝑥) − 𝐹2(𝑥)| (1.21)
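To make these two notions concrete, the sketch below builds an 𝜖-contaminated version of a standard Gaussian as in (1.20) and approximates its Kolmogorov distance (1.21) to the nominal model on a fine grid; the Cauchy contamination and all parameter values are arbitrary choices for this illustration.
```python
import numpy as np
from scipy.stats import norm, cauchy

# Contamination model (1.20): F = (1 - eps) * P0 + eps * Q, with
# P0 = N(0, 1) the nominal model and Q a heavy-tailed Cauchy contamination.
eps = 0.05
x = np.linspace(-50, 50, 1_000_001)                # fine grid for the sup
F0 = norm.cdf(x)                                   # nominal CDF
F_eps = (1 - eps) * F0 + eps * cauchy.cdf(x)       # contaminated CDF

# Kolmogorov distance (1.21), approximated by a maximum over the grid.
d_K = np.max(np.abs(F0 - F_eps))
print(f"d_K(P0, contaminated) ~ {d_K:.4f}")        # at most eps = 0.05

# Sampling from the contaminated distribution, e.g. for simulations:
rng = np.random.default_rng(1)
n = 10_000
is_outlier = rng.random(n) < eps
y = np.where(is_outlier, rng.standard_cauchy(n), rng.standard_normal(n))
```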

RDT and robust hypothesis testing

We will now explain how the RDT approach is an appropriate choice for robust hypothesis testing. The robustness of the RDT approach comes from its problem statement, which incorporates the notion of model mismatch. As a reminder, the RDT problem consists in deciding whether a realization Θ(𝜔) of some signal Θ is close enough to a model 𝜃0, using a noisy observation 𝑌 of Θ. In a sense, we can interpret this problem as asking whether 𝜃0 is a reasonable model of the signal Θ(𝜔) or not. This is done by determining whether ‖Θ(𝜔) − 𝜃0‖ ≤ 𝜏 or ‖Θ(𝜔) − 𝜃0‖ > 𝜏, which is more robust than testing Θ(𝜔) = 𝜃0 against Θ(𝜔) ≠ 𝜃0. Indeed, in many cases, it does not make much sense to test whether Θ(𝜔) = 𝜃0, as this equality is rarely achieved exactly in practice, and not necessarily relevant: it may not make much sense to attempt to detect every deviation of Θ from 𝜃0, including very minor ones. The use of the tolerance 𝜏 therefore gives this approach a certain robustness to these minor deviations, and also allows the user to define the amplitude of the deviations that should be detected.
In addition, the signal model used in the RDT problem makes very few assumptions on the observation 𝑌. It assumes that the observation consists of the signal Θ, whose distribution is completely unknown, in the presence of some additive independent Gaussian noise 𝑋. Because of this, there is no possible model mismatch regarding the signal Θ, since no knowledge of its distribution is required. As such, the only possible source of mismatch is the noise, whose distribution needs to be known perfectly. However, we will see in Chapter 3 that the parameters of this Gaussian noise can be estimated, and that we retain an asymptotic optimality when doing so.
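To make this concrete, here is a small numerical sketch of such a thresholding test, assuming 𝑌 = Θ + 𝑋 with 𝑋 Gaussian of known standard deviation; the threshold is obtained from SciPy’s noncentral chi-square quantile function, which amounts to inverting the generalized Marcum function 𝑄𝑑/2 appearing elsewhere in the thesis. Function names and parameter values are ours.
```python
import numpy as np
from scipy.stats import ncx2

# Sketch of an RDT-style thresholding test: decide whether the distortion
# ||Theta(omega) - theta0|| exceeds the tolerance tau, given Y = Theta + X
# with X ~ N(0, sigma^2 I_d). In the boundary case ||Theta - theta0|| = tau,
# ||Y - theta0||^2 / sigma^2 is noncentral chi-square with d degrees of
# freedom and noncentrality (tau / sigma)^2.

def rdt_threshold(d, tau, sigma, gamma):
    """Squared threshold whose exceedance probability at the boundary
    ||Theta - theta0|| = tau equals gamma."""
    return ncx2.ppf(1.0 - gamma, df=d, nc=(tau / sigma) ** 2)

def rdt_test(y, theta0, tau, sigma, gamma):
    """Return 1 when the distortion is declared larger than tau."""
    stat = np.sum((y - theta0) ** 2) / sigma ** 2
    return int(stat > rdt_threshold(y.size, tau, sigma, gamma))

# Example with d = 10, tolerance tau = 1, noise sigma = 0.5, level gamma = 0.05.
rng = np.random.default_rng(2)
theta0 = np.zeros(10)
theta = theta0 + 0.1    # true signal: distortion ||theta - theta0|| ~ 0.32 < tau
y = theta + 0.5 * rng.standard_normal(10)
print(rdt_test(y, theta0, tau=1.0, sigma=0.5, gamma=0.05))   # 0 expected
```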
Overall, these different aspects make the RDT approach a good candidate for building robust detection methods: robustness is built into the problem statement itself, and the few assumptions made on the signal model allow it to be used regardless of the distribution of the signal of interest.


Generalization of the RDT approach

After explaining the notion of invariance, we will now see its application to the RDT problem. The main goal of this section is to offer a generalization of the RDT problem in which the distribution of the noise 𝑋 is not necessarily Gaussian, but is instead invariant with respect to some group 𝐺 of transformations, allowing us to offer an optimal test in more situations.
The work presented in this section was conducted in great part with the assistance of Sabrina Bourmani, who initiated the work on this generalization in her thesis [32]. We will consider here a group (𝐺, ∘) of linear transformations defined on ℝ𝑑. The linearity of the functions 𝑔 ∈ 𝐺 is important to keep in mind, and is adapted to the problem we consider, which is linear in the sense that we consider an observation 𝑌 that is the sum of some independent signal Θ and noise 𝑋. We also consider an ℝ-valued maximal invariant 𝑀 ∶ ℝ𝑑 → ℝ of 𝐺. For every 𝑣 ∈ ℝ, Υ𝑣 is an orbit:
Υ𝑣 = {𝑥 ∈ ℝ𝑑, 𝑀(𝑥) = 𝑣} (2.8)
We call 𝒰 the set of all these orbits: 𝒰 = {Υ𝑣, 𝑣 ∈ 𝑀(ℝ𝑑)}. We start by stating the Generalized RDT (GRDT) problem, adapted from the RDT problem presented in Chapter 1, and explain the differences between these two problems. We will then redefine the essential elements of the RDT problem (size, power, optimality, etc.) for the GRDT case. After that, we will present several results which are completely independent from the distribution of the noise 𝑋. Finally, we will focus on the case where the maximal invariant is the Euclidean norm ‖ ⋅ ‖2, in which case we can derive a test similar to the one obtained in the RDT case, and present results regarding its optimality.
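As an informal illustration of these notions, when the group is the orthogonal group acting on ℝ𝑑, the Euclidean norm is a maximal invariant and the orbit Υ𝑣 is the sphere of radius 𝑣. The snippet below (an illustration of ours, not material from the thesis) checks this invariance numerically.
```python
import numpy as np
from scipy.stats import ortho_group

# Numerical check that M(x) = ||x||_2 is invariant under the orthogonal
# group O(d): M(Qx) = M(x) for every orthogonal Q. The orbit of x is the
# sphere of radius ||x||_2, i.e. all points sharing the same invariant value.
rng = np.random.default_rng(3)
d = 5
x = rng.standard_normal(d)
for _ in range(4):
    Q = ortho_group.rvs(d, random_state=rng)      # random orthogonal matrix
    assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
print("||Qx|| = ||x|| for all sampled Q: the norm is constant on each orbit")
```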

Generalization when the maximal invariant is the Euclidean norm

Until now, the only assumption we have made on the noise 𝑋 is that the group 𝐺 is composed of linear transformations. However, we were so far unable to find a 𝛾-MCCP test using only these very limited assumptions. Therefore, as a starting point, we decided to focus on using the Euclidean norm ‖ ⋅ ‖2 as the maximal invariant, since we know that we can find a 𝛾-MCCP test in the Gaussian case. This generalization is not as broad as we originally intended, but still lets us consider other spherically invariant probability distributions, such as generalized Gaussian distributions.
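As a side note, spherically invariant vectors can be simulated by drawing a uniform direction on the unit sphere and an independent radius; the sketch below (our own illustration, with an arbitrary radius law) produces a 𝑑-dimensional vector whose distribution is invariant under rotations but clearly non-Gaussian.
```python
import numpy as np

# Sample a spherically invariant random vector X = R * U, where U is
# uniform on the unit sphere of R^d and R >= 0 is an independent radius.
def sample_spherical(n, d, rng):
    u = rng.standard_normal((n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)     # uniform directions
    r = rng.gamma(shape=2.0, scale=1.0, size=(n, 1))  # arbitrary radius law
    return r * u

rng = np.random.default_rng(4)
x = sample_spherical(100_000, 3, rng)
# Rotating the sample leaves its distribution unchanged by construction;
# only the law of the radius R distinguishes it from a Gaussian vector.
print(f"E[||X||] ~ {np.linalg.norm(x, axis=1).mean():.3f}")
```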
The following lemma provides an important relation between the probability density function of a spherically invariant variable and that of its norm. This is an important lemma, as it is what allows us to continue our reasoning using the Euclidean norm. For other maximal invariants, we do not necessarily have a similar relation, which is likely why the general case remains difficult.
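Although the lemma itself is not reproduced in this excerpt, the relation in question is standard and presumably of the following form: if 𝑋 is spherically invariant, its density depends on 𝑥 only through ‖𝑥‖2, and integrating over spheres yields the density of the norm (𝑔 and 𝑅 are symbols of ours, not the thesis’s).
```latex
% If X has a spherically invariant density p_X(x) = g(\|x\|_2) on R^d, then
% R = \|X\|_2 has the density obtained by multiplying g by the surface area
% of the sphere of radius r:
p_R(r) = \frac{2\pi^{d/2}}{\Gamma(d/2)} \, r^{d-1} g(r), \qquad r \ge 0.
```
For instance, taking 𝑔 to be the standard Gaussian profile recovers the usual chi distribution for the norm of a Gaussian vector.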

Table of contents:

Extended summary (Résumé long)
1 Introduction
2 Random Distortion Testing
3 Invariance and Generalized RDT
4 Asymptotic RDT
5 Learning behaviors and detecting anomalies on real signals
6 Conclusion
Introduction 
Context.
A short primer on industrial systems and cybersecurity.
Industrial Control Systems
Detection schemes: misuse detection vs. anomaly detection
Data
The SWaT testbed
Objectives and contributions.
Thesis structure
Preliminary: mathematical notions and notations
1 Hypothesis Testing and Random Distortion Testing 
1.1 Traditional hypothesis testing approaches.
1.1.1 Non-Bayesian binary classification.
1.1.2 Optimality
1.2 The RDT approach.
1.2.1 Problem statement
1.2.2 Optimality: 𝛾-MCCP tests
1.2.3 Thresholding tests and optimality
1.3 Robustness and hypothesis testing
1.3.1 Huber’s approach to robust hypothesis testing
1.3.2 RDT and robust hypothesis testing
2 Invariance and Generalized Random Distortion Testing 
2.1 Invariance in group theory
2.1.1 Group theory
2.1.2 Invariance
2.1.3 Orbits and maximal invariant.
2.1.4 Invariance applied to probability distributions.
2.2 Generalization of the RDT approach
2.2.1 Problem statement
2.2.2 Redefining notions for the GRDT problem.
2.2.3 Preliminary results
2.2.4 Generalization when the maximal invariant is the Euclidean norm.
2.3 Conclusion and perspectives
3 Asymptotic Random Distortion Testing 
3.1 Preliminary results
3.2 Asymptotic RDT
3.2.1 Problem statement
3.2.2 Asymptotic size and power.
3.3 Simulation experiments.
3.3.1 Level
3.3.2 Application to a detection problem
3.3.3 Recovering a test with level 𝛾 when estimating 𝜎0
3.4 Conclusion
4 Learning behaviors and detecting anomalies on real signals 
4.1 Detecting discontinuities on continuous signals
4.2 Segmenting and learning phases, and detecting anomalies.
4.2.1 Change-detection in time series
4.2.2 An RDT-based change-in-mean detection method
4.2.3 Application to real signals
4.2.4 Perspectives: towards learning a model and detecting anomalies
4.3 Conclusion
Conclusion and perspectives
Appendices
A Deterministic Distortion Testing 
B Finding suitable families of TP-2 distributions for the GRDT problem 
B.1 Notations and preliminary results
B.2 Calculations
C Uniform continuity of the Generalized Marcum function 𝑄𝑑/2 
Acronyms
List of Publications
Bibliography 
