AIRCRAFT DYNAMICS IDENTIFICATION 


DATA PREPROCESSING

Abstract: In this chapter, we give more details concerning the data used in this thesis and provide an overview of the preprocessing steps which were carried out before all the analyses presented herein. We also briefly recall some basic statistical learning concepts and vocabulary which are useful throughout the manuscript.

Introduction

During all commercial flights, thousands of variables are recorded and stored in the Quick Access Recorder (QAR), whose data is commonly used for flight safety and efficiency purposes. This data is easily accessible, and we suppose it may be used to infer the dynamic behaviour of an aircraft.
Although many alerts and pilot settings are also contained in this type of dataset, we limited the scope of our study to signals corresponding to physical quantities, which seem better suited to the task. More precisely, only the raw data of the following variables were considered: the pressure altitude h, the Mach number M, the fuel consumption $C_f$, the aircraft mass m, the engines' turbofan speed N1 (as a percentage of the maximum speed, corresponding to full throttle), the static air temperature SAT, the pitch angle θ, the heading angle ψ, the wind speed W and the wind direction $\psi_w$. The sampling rate was 1 observation per second. Figure 2.1 shows what these raw data signals look like.
Figure 2.1 – Raw signals of 424 different flights extracted from the QAR of the same B737-800.
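For concreteness, a raw QAR extract of one flight can be viewed as a table with one row per second of flight. The following is a purely illustrative sketch of how such signals might be loaded; the column names and file format are hypothetical and depend on the airline's recording configuration, not on the actual data pipeline used in this thesis.

```python
import pandas as pd

# Hypothetical column names for the physical quantities listed above
COLUMNS = ["h", "M", "Cf", "m", "N1", "SAT", "theta", "psi", "W", "psi_w"]

def load_flight(path):
    """Load the raw 1 Hz signals of a single flight as a DataFrame
    (one row per second, one column per recorded physical quantity)."""
    df = pd.read_csv(path)
    return df[COLUMNS]
```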
All these signals are most likely corrupted by different errors, some of which are stochastic and others deterministic or systematic. They include, for example, measurement errors, calculation errors and encoding-decoding errors. As an illustration of a systematic error, we may note the jumps between −180 and 180 degrees in the heading and wind direction signals of figure 2.1 (a minimal sketch of how such angle wrap-arounds can be corrected is given after the list below). In order to correct the outlying data to some extent and prepare it for the identification methods described in chapter 3, several preprocessing procedures were performed:
• systematic errors correction;
• signal smoothing;
• construction of new variables;
• signal differentiations.
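As an example of the first item, the ±180 degree jumps in the heading and wind direction signals can be removed by phase unwrapping. The sketch below relies on numpy's unwrap; this is an assumption about tooling for illustration, not necessarily the correction actually implemented for this thesis.

```python
import numpy as np

def unwrap_degrees(angle_deg):
    """Remove artificial jumps between -180 and 180 degrees in an angular
    signal (e.g. heading or wind direction) by unwrapping its phase."""
    return np.degrees(np.unwrap(np.radians(angle_deg)))

# Hypothetical heading signal crossing the -180/180 boundary
heading = np.array([178.0, 179.5, -179.0, -177.5, -176.0])
heading_continuous = unwrap_degrees(heading)  # -> [178.0, 179.5, 181.0, 182.5, 184.0]
```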
Before describing the details of such preprocessing steps, some statistical learning concepts and vocabulary used throughout the manuscript must be recalled.

Statistical learning prerequisites

Regularized regression framework

Let Y and X be random variables valued respectively in $\mathbb{R}^{d_Y}$ and $\mathbb{R}^{d_X}$. We say that a regression model exists between Y and X if there is a certain function f and a random variable ε, called noise, such that
$$ Y = f(X) + \varepsilon. \qquad (2.2.1) $$
In a regression task, a dataset $\{(x_i, y_i)\}_{i=1}^{N}$ of observations of (X, Y), usually called the training set, is used to build an estimator $\hat f$ of the function f. The objective usually sought is to predict, from a new input observation $x_{N+1}$, a response $\hat f(x_{N+1})$ as close as possible to the new (unknown) output $y_{N+1}$. Closeness can be defined according to several possible metrics L called loss functions. The most commonly used in the case of real-valued variables is the squared-loss:
$$ L\big(\hat f(x_{N+1}), y_{N+1}\big) := \big(y_{N+1} - \hat f(x_{N+1})\big)^2. \qquad (2.2.2) $$
For assessing an estimator, it seems natural to generalize such a metric to any possible new observation drawn from the joint distribution of (X, Y). This is the idea behind the Expected Prediction Error, also called Generalization Error or Risk:
$$ EPE(\hat f) = \mathbb{E}\big[ L(\hat f(X), Y) \big]. \qquad (2.2.3) $$
However, the distribution of (X, Y) needed to evaluate this expectation is usually unknown, which means that we cannot minimize this quantity in practice. This is why many regression methods are originally defined through an optimization problem where one minimizes, over some predefined subspace $\mathcal{F}$ of functions from $\mathbb{R}^{d_X}$ to $\mathbb{R}^{d_Y}$, the empirical average of the loss across the training set:
$$ \hat f \in \arg\min_{f \in \mathcal{F}} \; TE(f) := \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i), y_i\big). \qquad (2.2.4) $$
The criterion TE was originally called the empirical risk in the statistical learning literature, and is also known as the training error in the machine learning community.
If the functional search space $\mathcal{F}$ is not restrictive enough, minimizing the training error will lead to fitting the training-sample noise $\{\varepsilon_i\}_{i=1}^{N}$ instead of filtering it out, which is commonly called overfitting. This is why assumptions are usually made on f, leading to particular choices of $\mathcal{F}$. Another common way of fighting overfitting is to add to the empirical risk minimization problem (2.2.4) a regularization term R(f) promoting some hypothesized property of f:
$$ \hat f \in \arg\min_{f \in \mathcal{F}} \; \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i), y_i\big) + \lambda R(f), \qquad (2.2.5) $$
where λ > 0 is called the regularization parameter. An example of a regularized empirical risk minimization problem is given by the smoothing splines presented in section 2.3. More details on the subject of empirical risk minimization can be found in [Hastie et al., 2009, Chapter 7] or Vapnik [2013].
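As a concrete illustration of problem (2.2.5), and not as part of the preprocessing pipeline itself, the following minimal numpy sketch solves the regularized problem for linear functions $f(x) = x^\top \beta$ with the squared-loss and the ridge regularizer $R(f) = \|\beta\|^2$; all names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve the regularized ERM problem (2.2.5) for linear models
    f(x) = X @ beta, squared-loss and R(f) = ||beta||^2.

    Setting the gradient to zero gives the closed form
    beta = (X^T X / N + lam * I)^{-1} (X^T y / N).
    """
    n_samples, n_features = X.shape
    gram = X.T @ X / n_samples + lam * np.eye(n_features)
    rhs = X.T @ y / n_samples
    return np.linalg.solve(gram, rhs)

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)
beta_hat = ridge_fit(X, y, lam=0.1)
```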

Model selection using cross-validation

In the regularized problem (2.2.5), the regularization parameter λ determines the desired trade-off between minimizing the loss and enforcing the properties promoted by the regularizer R(f). The solution of this problem depends on λ and is hence denoted by $\hat f_\lambda$. The task of choosing the parameter λ is commonly called model selection in the statistical learning community. The objective of this task is usually to look for a value of λ for which $\hat f_\lambda$ would minimize the expected prediction error. As previously explained, such a quantity is intractable and thus needs to be approximated. The training error being too optimistic, one way of assessing the quality of $\hat f_\lambda$ for some value of λ is to evaluate the averaged loss on a separate set of observations drawn from the distribution of (X, Y), called a validation set. However, the results may vary a lot depending on how the original dataset is split into training and validation sets. Thus, a common solution to approximate the EPE is to use K-fold cross-validation, a technique originally introduced by Stone [1974]. It consists in randomly partitioning the observations into K < N subsets of the same size m and, at each cross-validation step k = 1, . . . , K, using the kth subset for validation, while the union of the other K − 1 subsets is used to train the estimator $\hat f_\lambda^{k}$, i.e. to solve problem (2.2.5). The cross-validation criterion for the parameter value λ is then written as the average of all the validation errors:
$$ CV(\lambda) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{m} \sum_{i \in I_k} \big(y_i - \hat f_\lambda^{k}(x_i)\big)^2, \qquad (2.2.6) $$
where $I_k$ denotes the set of indices of the observations in the kth subset.
Practitioners often compute the cross-validation scores for a finite number of values of λ and choose the one giving the minimum score. More generally speaking, cross-validation is a technique used to assess and compare the predictive power of statistical models. It is useful for choosing not only regularization parameters, but also any model parameter which must be fixed prior to training, such parameters being commonly called hyperparameters. More details concerning cross-validation can be found in [Hastie et al., 2009, Chapter 7.10].
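To fix ideas, here is a minimal sketch of selecting λ by K-fold cross-validation over a grid, reusing the hypothetical ridge_fit defined above. It relies on scikit-learn's KFold as an assumption about tooling; it is not the exact procedure used in the thesis.

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_score(X, y, lam, n_splits=5, seed=0):
    """Compute the K-fold criterion CV(lambda) of equation (2.2.6)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_errors = []
    for train_idx, val_idx in kf.split(X):
        beta = ridge_fit(X[train_idx], y[train_idx], lam)   # train on K-1 folds
        residuals = y[val_idx] - X[val_idx] @ beta           # validate on the held-out fold
        fold_errors.append(np.mean(residuals ** 2))
    return np.mean(fold_errors)

# Grid search: keep the lambda with the smallest cross-validation score
lambdas = np.logspace(-4, 2, 20)
best_lam = min(lambdas, key=lambda lam: cv_score(X, y, lam))
```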


Smoothing splines

Among the regression techniques where a regularized empirical risk minimization problem is solved, we can cite smoothing splines. Splines usually denote piecewise polynomial functions, which were originally used in applied mathematics for interpolation purposes. Smoothing splines [Schoenberg, 1964], however, allow this class of functions to be used to fit a smooth curve to a set of noisy observations, filtering out the noise and keeping only the “real” signal.
Let [a, b] be an interval of $\mathbb{R}$. We consider here that we want to smooth N data points $\{(t_i, y_i)\}_{i=1}^{N}$ in $[a, b] \times \mathbb{R}$. We suppose that the times $(t_1, \ldots, t_N)$ are known (i.e. deterministic), while the outputs $(y_1, \ldots, y_N)$ are corrupted by some random noise $(\varepsilon_1, \ldots, \varepsilon_N)$ of unknown distribution. We also assume that a regression function f exists such that:
$$ y_i = f(t_i) + \varepsilon_i, \quad \forall i = 1, \ldots, N. \qquad (2.3.1) $$
Smoothing splines are defined, for a given fixed λ ≥ 0, as a solution of the following optimization problem:
$$ \hat f_\lambda \in \arg\min_{f \in H^2(a, b)} \; \sum_{i=1}^{N} \big(y_i - f(t_i)\big)^2 + \lambda \int_a^b f''(t)^2 \, dt, \qquad (2.3.2) $$
where $H^2(a, b)$ denotes the Sobolev space of square-integrable functions over (a, b) whose derivatives up to order 2 are also square-integrable over (a, b).
We see that (2.3.2) is a regularized empirical risk minimization problem of the form (2.2.5), where the loss function is the squared-loss and the search space is $H^2(a, b)$. The regularization parameter λ is often called the smoothing parameter in this context, since it determines the trade-off between how much curvature is allowed for the solution and how close to the data we want it to be. This idea can be illustrated by examining the two extreme possible values for λ:
• when λ = 0, the set of solutions of (2.3.2) contains any function f in $H^2(a, b)$ which interpolates the data;
• when λ → ∞, no curvature is allowed and problem (2.3.2) corresponds to linear least-squares.
One popular way of setting λ is by cross-validation, as described in section 2.2.2, which was originally suggested in the splines context by Wahba [1975].
Smoothing splines get their name from the fact that problem (2.3.2) has a unique solution, which is a natural cubic spline with knots at the sample points $\{t_i, i = 1, \ldots, N\}$ (see proof in Eubank [1999] for example). This means that, despite the infinite-dimensional search space $H^2(a, b)$, the solution has a finite-dimensional parametrization of the form
$$ \hat f_\lambda(t) = \sum_{i=1}^{N} \hat\beta_i B_i(t), \qquad (2.3.3) $$
where $\{B_i, i = 1, \ldots, N\}$ is a basis of natural splines (such as B-splines) and $\hat\beta = \{\hat\beta_i, i = 1, \ldots, N\}$ are coefficients to be estimated. For more details on this matter the reader is referred to [Hastie et al., 2009, Chapter 5.4] and [Andrieu, 2013, Chapter 3.2].
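As an illustrative sketch, SciPy (version 1.10 or later) provides scipy.interpolate.make_smoothing_spline, which solves a problem of exactly the form (2.3.2); this is offered here as an assumption about available tooling, not necessarily the implementation used in the thesis. Note that the returned spline object can also be differentiated, which is relevant to the signal differentiation step mentioned earlier.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Hypothetical noisy signal sampled at 1 Hz, mimicking a QAR variable
t = np.arange(0.0, 600.0)                              # 10 minutes of flight, one sample per second
clean = 3000.0 + 2.0 * t + 50.0 * np.sin(t / 30.0)     # synthetic "true" signal
y = clean + np.random.default_rng(1).normal(scale=20.0, size=t.shape)

# lam plays the role of the smoothing parameter lambda in (2.3.2);
# with lam=None, SciPy selects it by generalized cross-validation.
spline = make_smoothing_spline(t, y, lam=None)
y_smooth = spline(t)                # fitted values f_hat_lambda(t_i)
y_rate = spline.derivative()(t)     # differentiation of the smoothed signal
```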

Preprocessing steps

The first errors that had to be corrected were systematic ones, which include offsets and time-shifts. These were identified by visualizing the trajectories and corrected one by one. Once the systematic errors had been addressed, we smoothed the signals flight by flight using the smoothing splines described in section 2.3, setting the smoothing parameter through cross-validation. We hoped thereby to correct most of the remaining problematic points.
As we will show in chapter 3, some useful physical quantities for our problem include variables which are not directly available in QAR data. Some of these quantities describe atmospheric conditions, such as the air pressure, the air density and the speed of sound. They were derived from the measured signals using formulas from the International Standard Atmosphere model of the troposphere (altitudes lower than 11 000 m).
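As a hedged sketch of this last step, the standard ISA troposphere formulas (written here from the usual textbook constants, not copied from the thesis' own code) can be used to derive these atmospheric quantities from the pressure altitude and the measured static air temperature.

```python
import numpy as np

# Standard ISA constants (troposphere, altitudes below 11 000 m)
P0 = 101325.0      # sea-level static pressure [Pa]
T0 = 288.15        # sea-level standard temperature [K]
LAPSE = 0.0065     # temperature lapse rate [K/m]
G = 9.80665        # gravitational acceleration [m/s^2]
R = 287.058        # specific gas constant of dry air [J/(kg K)]
GAMMA = 1.4        # heat capacity ratio of air

def isa_troposphere(h, sat_kelvin):
    """Derive air pressure, density and speed of sound from the pressure
    altitude h [m] and the measured static air temperature SAT [K]."""
    # Static pressure from the ISA barometric formula (troposphere)
    pressure = P0 * (1.0 - LAPSE * h / T0) ** (G / (R * LAPSE))
    # Density from the ideal gas law, using the measured temperature
    density = pressure / (R * sat_kelvin)
    # Speed of sound in air at the measured temperature
    sound_speed = np.sqrt(GAMMA * R * sat_kelvin)
    return pressure, density, sound_speed
```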

Table of contents:

1 INTRODUCTION 
1.1 Environmental and economic stakes
1.2 The industrial context: OptiClimb
1.3 Flight data
1.4 Mathematical context
1.4.1 Trajectory optimization
1.4.2 Statistical learning
1.5 Contributions and organization of the manuscript
1.5.1 Objectives
1.5.2 Contributions of the first part
1.5.3 Contributions of the second part
I AIRCRAFT DYNAMICS IDENTIFICATION 
2 DATA PREPROCESSING
2.1 Introduction
2.2 Statistical learning prerequisites
2.2.1 Regularized regression framework
2.2.2 Model selection using cross-validation
2.3 Smoothing splines
2.4 Preprocessing steps
3 AIRCRAFT DYNAMICS IDENTIFICATION
3.1 Introduction
3.2 Dynamics modeling
3.2.1 Possible assumptions
3.2.2 Flat Earth dynamics in vertical plane with no wind
3.2.3 Dynamics with wind
3.2.4 2D flight with non-zero bank angle
3.2.5 3D dynamics
3.2.6 Spheric rotating earth dynamics
3.2.7 Projection of the wind speed onto the ground frame of reference
3.3 Description of our identification problem
3.4 Parametric hidden models
3.4.1 Restriction to parametric models
3.4.2 Flight mechanics models
3.4.3 Remark on monomials selection
3.5 Aircraft systems identification state-of-the-art
3.5.1 The Output-Error Method
3.5.2 The Filter-Error Method
3.5.3 The Equation-Error Method
3.6 Nonlinear multi-task regression
3.6.1 Baseline regression method: Ordinary Least-Squares
3.6.2 Multi-task framework
3.6.3 Maximum Likelihood Estimators
3.6.4 Negative Log-Likelihood Minimization Discussion
3.7 Quality criteria design
3.7.1 Static criterion
3.7.2 Dynamic criterion
3.8 Numerical benchmark
3.9 Conclusion
4 STRUCTURED MULTI-TASK FEATURE SELECTION
4.1 Introduction
4.2 Feature selection
4.2.1 A feature selection categorization
4.2.2 The Lasso
4.2.3 Inconsistency in high correlation setting
4.2.4 A resampling adaptation: the Bolasso
4.2.5 An efficient implementation: LARS
4.3 Application to aircraft dynamics identification in a single-task setting
4.4 Block-sparse estimators
4.4.1 Structured feature selection
4.4.2 Linear multi-task regression framework
4.4.3 Hidden functions parametric models
4.4.4 Block-sparse Lasso
4.4.5 Bootstrap implementation
4.5 Identifiability enhancement
4.5.1 The issue with the predicted hidden functions
4.5.2 Block alternate methods
4.5.3 Additional L2 Regularization
4.6 Numerical results
4.6.1 Experiments design
4.6.2 Results
4.7 Conclusion
II SIMULATED TRAJECTORIES ASSESSMENT 
5 PROBABILISTIC TRAJECTORY ASSESSMENT
5.1 Motivation
5.2 The trajectory acceptability problem
5.3 Marginal Likelihood estimators
5.3.1 Mean Marginal Likelihood
5.3.2 Empirical Version
5.3.3 Consistency of the marginal density estimations
5.3.4 Proof of the consistency of the marginal density estimation (theorem 5.3.6)
5.4 Choice of the Marginal Density Estimator
5.4.1 Parametric or nonparametric?
5.4.2 Kernel estimator
5.4.3 Self-consistent kernel estimator
5.5 Application to the Assessment of Optimized Aircraft Trajectories
5.5.1 Experiments Motivation
5.5.2 Experiments Design
5.5.3 Alternate Approaches Based on Standard Methods
5.5.4 Algorithms Settings
5.5.5 Results and Comments
5.6 Conclusions
6 OPTIMIZATION OF ACCEPTABLE TRAJECTORIES
6.1 Introduction
6.2 Aircraft Trajectory Optimization as an Identified Optimal Control Problem
6.3 A Parametric Marginal Density Estimator
6.3.1 The Gaussian Mixture Model
6.3.2 The Expectation-Maximization Algorithm
6.3.3 More details on the derivation of the EM algorithm
6.4 Application to an Aircraft Minimal-Consumption Problem
6.4.1 Experiments Description
6.4.2 Data Description
6.4.3 Choice of the “time” variable
6.4.4 Algorithm Settings
6.4.5 Numerical Comparison Between MML using Gaussian Mixture and Self-Consistent Kernel Estimators
6.4.6 Penalized Optimization Results
6.5 Conclusion
7 CONCLUSION
7.1 Summary
7.2 Future research prospects
7.3 Personal perception
