
## Learning from Data

Machine learning is about extracting knowledge from data in order to build models that perform tasks effectively. A typical machine learning application consists of the following parts:

1. A task and a metric for evaluation. The task is dictated by the real-world problem, and the metric quantifies the effectiveness of a solution.
2. A model family that we believe is capable of solving the problem at hand. The choice of model type depends on several factors, such as the amount of available training data (i.e. the size of the given dataset), the complexity of the task and knowledge about the model's performance on similar problems (Caruana and Niculescu-Mizil, 2006).
3. A dataset on which the model is trained in order to solve the task, aiming for the best performance with respect to the evaluation metric.
4. A loss function that quantifies the goodness of fit. In contrast to the evaluation metric, the loss function is differentiable and usually model-specific (Bottou, 2010).
5. An optimization algorithm to train the model. A model consists of parameters (usually denoted $\theta$), whose values determine the value of the loss function. An optimization strategy is therefore required to select the parameter values that minimize the loss.

Note the difference between the first and the fourth point: the evaluation metric measures success on the task, whereas the loss function is the quantity that the optimization algorithm minimizes. In several cases the two coincide; for example, in house price prediction the Mean Squared Error (MSE) can be both the task metric and the loss function.

In machine learning we build models that best explain a set of available data (a dataset) in order to perform a specific task. A dataset $X$ consists of a usually finite number of samples $x_i$, $i = 1, \dots, n$, such as images, documents or users, depending on the application.
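The five parts listed above can be illustrated with a minimal sketch. It assumes a hypothetical toy dataset (house area against price), a linear model family with parameters $\theta = (w, b)$, the MSE as both metric and loss, and plain gradient descent as the optimizer. The data values, the learning rate and the function names are illustrative assumptions, not taken from the thesis.

```python
# Dataset (hypothetical values): house area -> price, in arbitrary units.
X = [0.5, 1.0, 1.5, 2.0, 2.5]
Y = [1.1, 2.0, 3.1, 3.9, 5.0]


def predict(theta, x):
    """Model family: straight lines y = w * x + b with theta = (w, b)."""
    w, b = theta
    return w * x + b


def mse(theta, xs, ys):
    """Mean Squared Error, used here as both loss and evaluation metric."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(theta, xs, ys):
    """Gradient of the MSE with respect to (w, b)."""
    n = len(xs)
    residuals = [predict(theta, x) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    grad_b = 2.0 / n * sum(residuals)
    return grad_w, grad_b


def gradient_descent(xs, ys, learning_rate=0.1, steps=2000):
    """Optimization algorithm: repeatedly step against the gradient."""
    theta = (0.0, 0.0)
    for _ in range(steps):
        grad_w, grad_b = mse_gradient(theta, xs, ys)
        theta = (theta[0] - learning_rate * grad_w,
                 theta[1] - learning_rate * grad_b)
    return theta


theta_hat = gradient_descent(X, Y)
print("fitted parameters (w, b):", theta_hat)
print("training MSE:", mse(theta_hat, X, Y))
```

On this toy data the fitted parameters approach the ordinary least-squares solution. When the task metric is not differentiable (accuracy, for instance), the loss and the metric would have to be different functions.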

### Supervised Learning and Evaluation Metrics

Supervised learning occurs when the model is trained using input-output pairs, as in classification and regression. In this scenario, the dataset consists of two sets, the feature set $X$ and the label set $Y$, both having the same cardinality ($|X| = |Y|$), which equals the number of available samples. Unsupervised learning, on the other hand, occurs when the model is trained using only the feature set $X$. The most popular unsupervised task is clustering, where the cluster labels are not known in advance.

In supervised learning, since the labels are known in advance, we can directly assess the performance of our approach using standard metrics. When the labels are continuous (regression), $y_i \in \mathbb{R}^d$, $y_i \in Y$, popular performance metrics are the MSE, the Mean Absolute Error (MAE) and their weighted variants, in which each sample carries its own weight. When the labels correspond to distinct categories (classification), several metrics have been proposed, mostly related to the distribution of the classes: Accuracy, Precision/Recall and F1-Score are among the most popular. To calculate these metrics, one first computes the confusion matrix (Table 2.1).

In the context of predictive maintenance, one of the most important problems is predicting equipment failures. If we treat this problem as binary classification, our errors are either false alarms (False Positives) or missed failures (False Negatives). Depending on the application, we have to decide how important each type of error is and balance False Positives against False Negatives according to industrial parameters such as the expected cost. For example, when alarms trigger costly maintenance processes, avoiding many False Positives is very important.
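As a minimal sketch of the classification metrics discussed above, the following code derives Accuracy, Precision, Recall and F1-Score from the four confusion-matrix counts of a binary failure predictor, and weighs the two error types by cost. The labels, predictions and cost values are hypothetical; in particular, the 1:5 cost ratio between a false alarm and a missed failure is only an assumption for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = failure, 0 = no failure)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # false alarm
        elif truth == 1 and pred == 0:
            fn += 1  # missed failure
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics; a ratio with an empty denominator is reported as 0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def total_error_cost(fp, fn, cost_fp, cost_fn):
    """Cost of the errors made on the sample, given a cost per error type."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical ground truth and predictions for ten flights.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, fp, fn, tn)
cost = total_error_cost(fp, fn, cost_fp=1.0, cost_fn=5.0)  # assumed costs

print("TP, FP, FN, TN:", (tp, fp, fn, tn))
print(metrics)
print("total error cost:", cost)
```

Changing the assumed cost ratio changes which classifier looks best: when false alarms are expensive, a model with fewer False Positives can be preferable even if it misses more failures.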

#### Random Variables in Event Logs

An event log contains the sequence of failures that occurred during the operation of the aircraft. The alphabet of failures is finite and each failure is recorded using its unique identifier (id). Let $E = \{e_1, \dots, e_k\}$ be the alphabet of failures. Each entry in the logbook is thus a tuple $\langle t_i, x_i \rangle$, where $t_i$ is the timestamp of the failure and $x_i \in E$ is the failure that occurred. For each aircraft, the logbook forms a sequence of such tuples.

The alphabet contains both important and unimportant failures, and we are interested in analyzing and predicting the former. Let $e_T$ be a critical failure that we aim to predict. In the history of event logs this failure may appear more than once, so the survival time corresponds to the time interval between two consecutive occurrences of $e_T$ (i.e. the interval from the maintenance action until the next failure). We define $k$ random variables that correspond to these intervals, $T_{e_i}$, $i \in [1 \dots k]$. Furthermore, we introduce $k \times (k - 1)$ random variables $T_{e_i e_j}$ that correspond to the time between failure $e_j$ and the next occurrence of $e_i$. We will use this information later in this chapter to introduce the concept of predictors. Figure 3.1 depicts these two categories of random variables with respect to the target failure $e_T$.
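The two families of random variables can be sketched in code as follows. The sketch assumes a log stored as a list of `(timestamp, failure_id)` tuples and extracts observed samples of $T_{e_i}$ and $T_{e_i e_j}$. The log contents and function names are hypothetical. For simplicity it assumes that repair coincides with the failure timestamp, and it ignores censored intervals (those with no later occurrence).

```python
from bisect import bisect_right
from collections import defaultdict


def inter_failure_times(log):
    """Samples of T_e: gaps between consecutive occurrences of each failure e."""
    last_seen = {}
    samples = defaultdict(list)
    for timestamp, failure in sorted(log):
        if failure in last_seen:
            samples[failure].append(timestamp - last_seen[failure])
        last_seen[failure] = timestamp
    return dict(samples)


def time_to_next(log, target, source):
    """Samples of T_{target, source}: time from each occurrence of `source`
    to the next strictly later occurrence of `target`."""
    events = sorted(log)
    target_times = [t for t, failure in events if failure == target]
    samples = []
    for timestamp, failure in events:
        if failure != source:
            continue
        idx = bisect_right(target_times, timestamp)  # first target after t
        if idx < len(target_times):
            samples.append(target_times[idx] - timestamp)
    return samples


# Hypothetical logbook of one aircraft; timestamps in flight hours.
log = [(0, "e1"), (5, "e2"), (12, "e1"), (20, "e3"), (30, "e1"), (33, "e2")]

print(inter_failure_times(log))       # {'e1': [12, 18], 'e2': [28]}
print(time_to_next(log, "e1", "e2"))  # [7]
print(time_to_next(log, "e1", "e3"))  # [10]
```

The occurrence of `e2` at hour 33 contributes no sample because no later `e1` is observed. In a survival-analysis setting such an interval would typically be kept as a censored observation rather than dropped.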

**Table of contents:**

**1 Introduction**

1.1 Scope of the Thesis

1.1.1 Predictive Maintenance

1.1.2 Time Series Data

1.2 Data Related to Aircraft Operation

1.2.1 Tools and Libraries

1.3 Overview of Contributions

1.4 Outline of the Thesis

**2 Background**

2.1 Learning from Data

2.1.1 Supervised Learning and Evaluation Metrics

2.2 Probability

2.2.1 Survival Analysis

2.2.2 Survival data and Censoring

2.2.3 Gaussian Mixture Models and the EM algorithm

2.3 Regression

2.3.1 Random Forests

2.3.2 Model Evaluation

2.3.3 Hyperparameter Selection

2.4 Learning as Optimization

2.4.1 The Gradient Descent

2.4.2 Convex Quadratic Programming

**3 Survival Analysis for Failure-Log Exploration**

3.1 Introduction

3.1.1 Random Variables in event logs

3.1.2 Building a Dataset for Survival Analysis

3.2 Time Interval Between Failures

3.2.1 Kaplan-Meier Method

3.2.2 Cox Proportional Hazards

3.3 Studying inter-event temporal differences

3.4 Summary

**4 Failure Prediction in Post Flight Reports**

4.1 Introduction

4.2 Related Work

4.3 Event Log Data & Preprocessing

4.3.1 Preprocessing

4.4 Methodology

4.4.1 Multiple Instance Learning Setup

4.4.2 Prediction

4.4.3 Method summary

4.4.4 Parameters

4.5 Experimental Setup

4.5.1 Dataset

4.5.2 Training, Validation and Test

4.5.3 Baseline Algorithm

4.5.4 Evaluation at the episode level

4.6 Results

4.6.1 Bag-level Performance

4.6.2 Episode-Level Performance

4.6.3 Decision threshold selection

4.6.4 False Positives

4.6.5 Model Interpretation

4.7 Conclusions and future work

4.7.1 Infusion and Impact

**5 Logbook Data Preprocessing**

5.1 Related Work

5.2 Logbook Data in Aviation

5.2.1 Data Description

5.2.2 Cleaning the Logbook

5.3 The Importance of Logbook Data

5.4 Context-aware Spell Correction via Word Embeddings

5.4.1 Word Embeddings & the skip-gram Model

5.4.2 Creating word embeddings from logbook entries

5.5 Logbook Cleaning Using Word Embeddings

5.5.1 Mapping spelling errors to correct words

5.5.2 Method Summary

5.6 Information Extraction

5.7 Conclusions and future work

**6 Component Condition Assessment Using Time Series Data**

6.1 Degradation

6.2 Related Work

6.3 Dataset

6.4 Modeling Degradation with GMMs

6.5 Time series decomposition

6.5.1 Quadratic Programming Formulation

6.5.2 Reformulating the Optimization Problem

6.6 Condition Assessment

6.7 Evaluation

6.7.1 Discussion

6.8 Conclusions

**7 Discussion**

7.1 Summary of Contributions

7.2 Future Directions

Notation

Acronyms