Theoretical challenges

Classical machine learning systems not only suffer from time and memory constraints; their theoretical foundation is also shaken. Most supervised learning systems rely on the assumption that the training examples are independently and identically distributed. This condition, however, cannot be ensured in real data streams, as there is no control over the source generating the examples. In fact, in many streaming systems, subsequent examples are expected to be related and their order of appearance is not arbitrary. For instance, the images observed by a mobile agent during navigation are spatially related. In a system predicting flight delays, many temporally close flights are expected to be delayed together, for instance due to bad weather.
Classical learning systems also assume that the world is stationary: once a classical system learns a concept, the concept remains unchanged. However, if the training examples are generated over a long period of time, as is the case in data streams, the underlying concept is expected to evolve, creating what is known as a concept change. For instance, user preferences evolve with fashion trends, market demands evolve with economic conditions, and so on. As a result, if the environment changes, the predictions of the static concept will be wrong, and as time goes by, more and more mistakes will be made.

Concept change

In non-stationary environments, the underlying target concept is expected to evolve with time, depending on some hidden context. Kubat [61] gives the example of a system that learns to control load redistribution in computer clusters, where overloaded units send part of their load to underloaded units. The rules describing the overload depend on many variables, such as CPU and memory requirements, the frequency of disk operations, and others. However, the only variables visible to the system are the lengths of the CPU and disk queues. Thus, the workload structure is the hidden context controlling the generation of the visible variables. The workload structure is expected to evolve with time, and the same context might also reappear later. Consider also the example of a marketing system that learns customers' preferences by observing their transactions on a website. Knowledge of the customer's preferences enables suggesting specific products and promotions. Customers' buying profiles might evolve with time, depending on fashion trends, economic conditions, and the like. In a rising economy, for instance, nouveaux-riches customers will buy goods or luxuries they were unable to buy before. In this case, fashion trends, economic conditions and other latent variables controlling the observed transactions are hidden to the learning system. The system can only see the transactions, with no additional knowledge of the hidden context behind the market evolution.
The changes in the hidden context induce more or less radical changes in the target concept, creating what is known as a concept change. In the general case, the learning system has no a priori knowledge of the time at which a concept changes or starts changing, nor of the severity and speed of the change. It is also possible for the same context to reappear, either in a cyclic manner (seasonal variations) or in an irregular manner (inflation, market mood). In case of recurrence, the learning system should take advantage of previous experience when learning the current concept.

Types of concept change

If we consider that the concept is represented by the distribution of the training examples p(x, y) = p(y|x) · p(x), a concept change can happen in three ways:
• The conditional distribution p(y|x) changes.
• The unconditional distribution p(x) changes.
• The change involves both distributions: p(y|x) and p(x).
The change in the conditional distribution p(y|x) is generally referred to as a concept drift [22, 101]. An example of concept drift occurs in a spam filtering system, where the concept classifies emails as spam or non-spam depending on their content. The concept is likely to become less accurate with time, since spammers constantly try to mislead the filtering system by changing the statistical properties of their emails, thereby altering p(y|x).
The change can also come from the unconditional distribution p(x). Different terms are used to refer to this type of change: a pseudo concept drift [81], a virtual concept drift [88], a sample selection bias [32] or a covariate shift [8]. This type of change does not necessarily reflect a non-stationary environment. The world can be stationary, but the unconditional distribution changes because the order of the received examples depends on the part of the world currently explored. In a navigation task, for instance, the distribution of the images perceived by a mobile agent depends on the temporally local part of the world currently visited. Thus, the agent encounters a change in the unconditional distribution of the images during navigation, even though the world itself is stable.
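The distinction between the two kinds of change can be made concrete with a small simulation. The sketch below is an illustration, not part of the thesis: the Gaussian distributions, the threshold at 0, and the function names are all our own arbitrary choices. It generates streams where either p(y|x) flips (real concept drift) or only p(x) shifts while the labeling rule stays fixed (virtual drift / covariate shift):

```python
import random

rng = random.Random(0)

def sample_before(n):
    # Before any change: x ~ N(0, 1), label 1 when x > 0
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [(x, int(x > 0)) for x in xs]

def sample_real_drift(n):
    # Real concept drift: p(x) is unchanged, but p(y|x) is inverted
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [(x, int(x <= 0)) for x in xs]

def sample_virtual_drift(n):
    # Virtual drift (covariate shift): p(y|x) is unchanged, but p(x)
    # moves because a new region of the world is being explored
    xs = [rng.gauss(2.0, 1.0) for _ in range(n)]
    return [(x, int(x > 0)) for x in xs]
```

A classifier trained on `sample_before` keeps making correct predictions under the virtual drift (the labeling rule is the same), but fails systematically under the real drift.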


The stability-plasticity dilemma

From a practical point of view, it is not possible to store all the examples received in a data stream; only a summary of the examples can be kept. Under the assumption of possible concept changes, it is important to ensure that this summary reflects the current target concept. In other words, the summary should not include examples belonging to an old concept, as obsolete examples can be misleading and even harmful to the learning system. A simple solution is to trust the most recently received examples.
Thus, at each time step, a window containing the n most recent examples can be used to learn the current concept. The size of the window, n, however, remains to be defined. If the concept is stable, a large window of training examples allows the system to learn the current target concept more precisely. However, if the concept is changing, the window should be small, excluding outdated data. This is known as the stability-plasticity dilemma [31, 64]. While a plastic learning system with a small window size adapts rapidly to changes, a stable learner with a large window size is more reliable in periods of stability.
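A minimal sketch of such a windowed learner follows. It is an illustration under our own assumptions (the class name, the one-dimensional inputs and the nearest-neighbour voting rule are not taken from the thesis): a deque of capacity n holds the most recent examples, so old examples expire automatically as new ones arrive.

```python
from collections import deque

class WindowLearner:
    """Learns from the n most recent (x, y) examples only.

    A small n makes the learner plastic (fast adaptation to change);
    a large n makes it stable (more reliable on a fixed concept).
    """

    def __init__(self, n=100, k=5):
        self.window = deque(maxlen=n)  # oldest examples are dropped automatically
        self.k = k

    def learn(self, x, y):
        self.window.append((x, y))

    def predict(self, x):
        # Majority vote of the k stored examples closest to x (1-D k-NN)
        neighbors = sorted(self.window, key=lambda ex: abs(ex[0] - x))[:self.k]
        votes = sum(y for _, y in neighbors)
        return int(2 * votes > len(neighbors))
```

With n = 10, for instance, twenty examples of an old concept followed by twenty of a new one leave only the new concept in memory, so the learner's predictions switch to the new concept automatically.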

Adaptation and anticipation

Predicting in an environment with possibly evolving states can be handled with two non-contradictory and potentially simultaneous approaches: adaptation and anticipation.
Adaptation follows new trends, with no insight into the future. An adaptive approach would incorporate new training examples into the learning memory when the concept is stable, and would reset the learning memory when the concept changes. Anticipation's concern, on the other hand, is to understand and characterize the evolution of the environment in order to predict upcoming trends. While adaptive approaches adapt passively to changes, anticipation acts pro-actively, trying to be aware of what might happen in the near future and preparing prediction strategies in advance. Studying changes in the environment requires keeping a memory of the past. From a practical point of view, it is not possible to retain all the received training examples in memory; hence, a compact representation should be considered. For instance, in the work we will present, we suggest an anticipative approach that keeps the history of all concepts encountered during the data streaming. Given the sequence of encountered concepts Cseq = (C1, C2, . . . , Ck), a pure anticipative approach may operate as follows:
• The upcoming concept C̃k+1 is predicted by analyzing the sequence of previously encountered concepts. The predicted concept C̃k+1 is then used to predict the labels of the data stream examples when Ck changes.
• The upcoming concept C̃k+1 is assumed to be a recurring concept, i.e. C̃k+1 ∈ Cseq. Thus, when Ck changes, Cseq is scanned and the concept that best fits the recent examples is used for prediction.
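The second option, scanning the history for a recurring concept, can be sketched as follows. This is an illustration under our own assumptions, not the thesis's implementation: each stored concept is represented as a callable mapping x to a predicted label, and "fits the recent examples most" is measured as the error rate on a small buffer of recent labeled examples.

```python
def best_recurring_concept(concept_history, recent_examples):
    """Scan Cseq = (C1, ..., Ck) and return the stored concept whose
    predictions best fit the buffer of recent (x, y) examples."""
    def error_rate(concept):
        mistakes = sum(1 for x, y in recent_examples if concept(x) != y)
        return mistakes / len(recent_examples)
    # min() keeps the earliest stored concept in case of a tie
    return min(concept_history, key=error_rate)
```

When a change in Ck is detected, the selected concept is reused for prediction until it is either confirmed by new examples or replaced by a freshly learned model.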

Table of contents :

Abstract
Acknowledgements
List of Figures
List of Tables
1 Introduction 
2 The Problem 
2.1 Classical Supervised Machine Learning
2.1.1 Scenario
2.1.2 Performance criterion
2.1.3 Learning systems as optimization tools
2.1.4 The bias-variance tradeoff
2.1.5 Overfitting
2.1.6 Practical evaluation measures
2.2 Data Streaming
2.2.1 Practical challenges
2.2.2 Theoretical challenges
2.2.3 Concept change
2.2.4 Types of concept change
2.2.5 Properties of concept change
2.2.6 The stability-plasticity dilemma
2.2.7 Adaptation and anticipation
2.3 Online Machine Learning
2.3.1 Scenario
2.3.2 Practical evaluation measures
2.3.3 The theory of online learning
2.3.4 Online learning in practice
2.3.5 Online learning datasets
2.4 Summary
3 State of the Art 
3.1 Adapting to the Change
3.1.1 Explicit detection
3.1.2 Implicit adaptation
3.2 Online Classifiers
3.2.1 IB3 (1991)
3.2.2 FLORA (1996)
3.2.3 RePro (2005)
3.2.4 PreDet (2008)
3.3 Online Ensembles of Classifiers
3.3.1 DWM (2003)
3.3.2 CDC (2003)
3.3.3 KBS-stream (2005)
3.3.4 DIC (2008)
3.3.5 Adwin Bagging (2009)
3.3.6 ASHT-Bagging (2009)
3.3.7 CCP (2010)
3.3.8 Leveraging Bagging (2010)
3.3.9 DDD (2012)
3.4 Summary
4 Adaptation to Concept Changes 
4.1 Motivation
4.2 Framework
4.2.1 Experts
4.2.2 Prediction
4.2.3 Weighting functions
4.2.4 Deletion strategies
4.3 DACC
4.3.1 The committee of predictors
4.3.2 The committee evolution
4.3.3 The weighting functions
4.3.4 The final prediction
4.3.5 Processing training examples
4.3.6 Time & memory constraints
4.3.7 Computational complexity
4.3.8 Implicit diversity levels
4.3.9 The stability-plasticity dilemma
4.3.10 Effect of parameters
4.3.11 Choice of parameters
4.4 DACC: Comparison with Other Systems
4.4.1 DACC vs CDC
4.4.2 DACC vs DDD, EDDM, DWM
4.4.3 DACC vs other systems
4.5 Contribution
5 Anticipating Concept Changes 
5.1 Concept Predictability
5.1.1 DACCv1
5.1.2 DACCv2
5.1.3 DACCv3
5.2 Concept Recurrence
5.2.1 DACCv4
5.3 ADACC
5.3.1 Computational complexity
5.3.2 Empirical results (1)
5.3.3 Empirical results (2)
5.4 Contribution
6 Conclusion and Perspectives 
6.1 DACC
6.1.1 Methodology
6.1.2 Properties
6.1.3 Strengths, weaknesses and perspectives
6.2 ADACC
6.2.1 Methodology
6.2.2 Properties
6.2.3 Strengths, weaknesses and perspectives
6.3 Links with the Theory of Online Learning
6.4 Links with Domain Adaptation and Transfer Learning