# Monokernel LMS algorithm with online dictionary


The aim of supervised linear adaptive filters is to automatically extract information, or identify a model, from noisy measurements of an unknown linear system by relying on an error-correction criterion. The block diagram of a linear adaptive filter is shown in Figure 2.1. Distinct learning criteria result in different types of adaptive filters, each equipped with a built-in update mechanism. Commonly used adaptive filtering algorithms are summarized in Table 2.1. These algorithms have been successfully applied to four basic problems: system identification, inverse modeling, prediction, and interference canceling [Haykin 1991]. The factors characterizing the performance of an adaptive filter include steady-state performance, transient behavior, tracking ability, and robustness.
The well-known least-mean-square (LMS) algorithm, devised by Widrow and Hoff in 1959 [Widrow 1985], is an important member of the family of stochastic gradient-based algorithms. It updates the weights of a transversal filter in the direction of the negative gradient of the squared instantaneous error with respect to each weight coefficient. By avoiding matrix inversion, the LMS algorithm is simple and robust compared with other linear adaptive filtering algorithms, and is consequently considered a benchmark when comparing the performance of different algorithms. It inspired the kernel least-mean-square (KLMS) algorithm and plays an important role in this dissertation. The next subsections briefly review basic but important principles and provide a convergence analysis of the LMS algorithm.
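The LMS recursion described above can be sketched in a few lines. This is a minimal NumPy illustration, not the dissertation's own implementation; the function name, default tap count, and step size are illustrative choices:

```python
import numpy as np

def lms(u, d, L=4, mu=0.05):
    """Least-mean-square adaptive filter (illustrative sketch).

    u: input signal, d: desired response, L: number of taps,
    mu: step size. Returns the final weights and the error sequence.
    """
    w = np.zeros(L)
    e = np.zeros(len(u))
    for n in range(L - 1, len(u)):
        un = u[n - L + 1:n + 1][::-1]  # tapped-delay vector [u_n, ..., u_{n-L+1}]
        e[n] = d[n] - w @ un           # instantaneous error e_n = d_n - w^T u_n
        w = w + mu * e[n] * un         # stochastic-gradient (LMS) update
    return w, e
```

Run on input/desired pairs generated by a known FIR system, the weights drift toward that system's coefficients, provided the step size is small enough for mean-square stability (the subject of the convergence analysis that follows).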

### Wiener filter

Let the available input signal at time instant n be denoted by un = [un, un−1, . . . , un−L+1]⊤, with L the number of adjustable parameters in the model. This input signal is fed simultaneously into the system and the model. The system output dn is referred to as the desired response, or reference signal, which the linear adaptive filter uses to adjust the model parameters.
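To make this setting concrete, the sketch below stacks the tapped-delay input vectors and estimates the classical Wiener solution w_o = R⁻¹p from sample statistics, where R approximates the input autocorrelation matrix and p the cross-correlation with the desired response. The helper name and its defaults are illustrative assumptions, not notation from the text:

```python
import numpy as np

def wiener(u, d, L=4):
    """Sample-based estimate of the Wiener solution w_o = R^{-1} p."""
    N = len(u)
    # stack tapped-delay input vectors u_n = [u_n, ..., u_{n-L+1}]^T as rows
    U = np.array([u[n - L + 1:n + 1][::-1] for n in range(L - 1, N)])
    dv = d[L - 1:N]
    R = U.T @ U / len(U)   # sample autocorrelation, approximates E[u_n u_n^T]
    p = U.T @ dv / len(U)  # sample cross-correlation, approximates E[u_n d_n]
    return np.linalg.solve(R, p)
```

When the desired response is produced by an FIR system driven by the same input, this estimate recovers that system's coefficients; the LMS algorithm can be viewed as an iterative, inversion-free approximation of the same solution.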

## Preliminaries on kernel-based methods

In this section, we provide some insights into the fundamental concepts and properties characterizing kernel functions, rather than a fully rigorous presentation with proofs. Most of the material presented here can be found in greater detail in several papers and textbooks [Aronszajn 1950, Schölkopf 2000, Shawe-Taylor 2004, Slavakis 2013].

### Definition of kernel

Following the development of SVMs, kernel-based methods attracted considerable attention in the machine learning community. In order to study kernel adaptive filtering, we first introduce the necessary definitions related to kernels, together with some useful properties.

### Reproducing kernel Hilbert spaces (RKHS)

A vector space U over the real field R is an inner product space if there exists a real-valued symmetric bilinear map ⟨·, ·⟩, the inner (or dot) product, satisfying ⟨u, u⟩ ≥ 0. Furthermore, we say the inner product is strict if ⟨u, u⟩ = 0 if and only if u = 0. Although an inner product space is sometimes referred to as a Hilbert space, the formal definition additionally requires completeness and separability, and some authors also require infinite dimension.
Definition 2.3.6 (Hilbert space [Shawe-Taylor 2004]) A Hilbert space H is an inner product space with the additional properties that it is separable and complete. Completeness refers to the property that every Cauchy sequence {hn}n≥1 of elements of H converges to an element h ∈ H, where a Cauchy sequence is a sequence satisfying supm>n ‖hn − hm‖ → 0 as n → ∞.
Completeness and separability are important because they ensure that the feature space into which the data are mapped is itself a well-behaved inner product space, in which limits of Cauchy sequences remain within the space.
Definition 2.3.7 (Reproducing kernel Hilbert space [Aronszajn 1950]) Consider a linear space H of real-valued functions f defined on a set U. Suppose that in H we can define an inner product ⟨·, ·⟩H with corresponding norm ‖·‖H, and that H is complete with respect to that norm, i.e., H is a Hilbert space. We call H a reproducing kernel Hilbert space if there exists a function κ: U × U → R with the following properties:
1. For every u ∈ U, κ(•, u) belongs to H (or equivalently κ spans H, i.e. H = span{κ(•, u), u ∈ U}, where the overline denotes the closure of a set).
2. κ has the reproducing property: f(u) = ⟨f, κ(•, u)⟩H for every f ∈ H and every u ∈ U.
Moreover, κ is a positive-definite kernel, and the mapping ψ: U → H defined by ψ(u) = κ(•, u) for all u ∈ U is called the feature map of H. The RKHS uniquely determines κ. The RKHS, which can be thought of roughly as a space of functions equipped with an inner product, plays a central role in kernel-based methods. Linear processing performed in the RKHS H, after mapping the data into this higher-dimensional (possibly infinite-dimensional) feature space, has proven to be a very powerful tool for addressing problems that are nonlinear in the original space.
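The reproducing property can be checked numerically for elements of the RKHS written as finite kernel expansions, since the inner product of two expansions reduces to kernel evaluations. The following sketch uses a Gaussian kernel; the points and coefficients are arbitrary illustrative values:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between row-stacked point sets X and Y."""
    d2 = np.sum(X**2, 1)[:, None] - 2 * X @ Y.T + np.sum(Y**2, 1)[None, :]
    return np.exp(-d2 / (2 * sigma**2))

# An element of the RKHS written as a finite kernel expansion:
#   f = sum_i alpha_i k(., x_i)
X = np.array([[0.0], [1.0], [2.5]])
alpha = np.array([0.7, -0.2, 1.1])

def f(u):
    # pointwise evaluation of f at u
    return alpha @ gaussian_kernel(X, np.atleast_2d(u))[:, 0]

# Take g = k(., u), a single-term expansion. The RKHS inner product of two
# expansions reduces to kernel evaluations:
#   <f, k(., u)>_H = sum_i alpha_i k(x_i, u),
# and the reproducing property says this equals f(u).
u = np.array([[0.8]])
inner = alpha @ gaussian_kernel(X, u)[:, 0]
assert np.isclose(f(u), inner)
```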
The following popular theorem establishes that although an RKHS may be infinite-dimensional, the solution of any regularized regression problem lies in the span of n particular kernel functions.
Theorem 2.3.9 (Representer theorem [Schölkopf 2000]) Suppose we are given a nonempty set U, a positive-definite real-valued kernel κ on U × U, a training set {(ui, di)}i=1,…,n ⊂ U × R, a strictly monotonically increasing real-valued function g on [0, ∞), and an arbitrary cost function L: (U × R2)n → R ∪ {∞}. Then any f ∈ H minimizing the regularized risk L((u1, d1, f(u1)), …, (un, dn, f(un))) + g(‖f‖H) admits a representation of the form f(•) = ∑i αi κ(•, ui), with αi ∈ R, i = 1, …, n.
The significance of the Representer theorem is that it demonstrates that a whole range of learning algorithms have optimal solutions that can be expressed as an expansion in terms of kernel evaluations at the training examples.
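Kernel ridge regression is the standard concrete instance: for squared-error cost with a quadratic regularizer, the representer theorem reduces the (possibly infinite-dimensional) problem to a finite linear system in the expansion coefficients. The sketch below is illustrative; the kernel width, regularization constant, and sinc target are arbitrary assumed choices:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=0.5):
    d2 = np.sum(X**2, 1)[:, None] - 2 * X @ Y.T + np.sum(Y**2, 1)[None, :]
    return np.exp(-d2 / (2 * sigma**2))

def kernel_ridge_fit(U, d, lam=1e-3):
    """Regularized least squares in the RKHS (kernel ridge regression).

    By the representer theorem, the minimizer of
        sum_i (d_i - f(u_i))^2 + lam * ||f||_H^2
    has the form f(.) = sum_i alpha_i k(., u_i), and the coefficients
    solve the linear system (K + lam I) alpha = d.
    """
    K = gaussian_kernel(U, U)
    return np.linalg.solve(K + lam * np.eye(len(U)), d)

def kernel_ridge_predict(U, alpha, Unew):
    # f(u) = sum_i alpha_i k(u_i, u): an expansion over the training inputs
    return gaussian_kernel(Unew, U) @ alpha

U = np.linspace(-2, 2, 40)[:, None]
d = np.sinc(U[:, 0])                 # a nonlinear target function
alpha = kernel_ridge_fit(U, d)
pred = kernel_ridge_predict(U, alpha, U)
```

The solution is an n-term kernel expansion regardless of the dimension of H, which is exactly what the theorem guarantees.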
Theorem 2.3.10 (Moore [Moore 1916]) For any reproducing kernel κ over the set U, there exists a unique Hilbert space of functions on U for which κ is a reproducing kernel.
Notice that Moore's theorem establishes a one-to-one correspondence between RKHSs on a set and positive-definite functions on that set.
Proposition 2.3.11 (Kernel trick [Schölkopf 2001]) Given an algorithm which is formulated in terms of a positive-definite kernel κ˜, one can construct an alternative algorithm by replacing κ˜ with another kernel κ.
The best-known application of the kernel trick is the case where κ˜ is the common dot product in the input domain and κ a selected kernel function. Since the structure of the algorithm remains exactly the same, the nonlinearity is obtained at essentially no additional computational cost. The operation that transforms a linear algorithm into its nonlinear counterpart using a kernel is often called kernelization. The kernel trick is not limited to this classical case: κ˜ and κ can both be nonlinear kernels corresponding to different RKHSs, namely, different nonlinearities.
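Kernelization can be illustrated on the nearest-centroid classifier, whose decision rule depends on the data only through inner products: the squared feature-space distance to a class centroid expands entirely in kernel evaluations, so swapping the kernel argument swaps the nonlinearity without touching the algorithm's structure. This is a sketch with illustrative helper names, not an algorithm from the text:

```python
import numpy as np

def linear_kernel(X, Y):
    return X @ Y.T

def gaussian_kernel(X, Y, sigma=1.0):
    d2 = np.sum(X**2, 1)[:, None] - 2 * X @ Y.T + np.sum(Y**2, 1)[None, :]
    return np.exp(-d2 / (2 * sigma**2))

def centroid_dist2(u, X, kernel):
    """Squared feature-space distance between psi(u) and the centroid of
    {psi(x_i)}, written purely in terms of kernel evaluations:
      ||psi(u) - c||^2 = k(u,u) - (2/m) sum_i k(u, x_i)
                       + (1/m^2) sum_{i,j} k(x_i, x_j)
    """
    u = np.atleast_2d(u)
    return (kernel(u, u)[0, 0]
            - 2 * kernel(u, X).mean()
            + kernel(X, X).mean())

def nearest_centroid(u, X0, X1, kernel=linear_kernel):
    # classical nearest-centroid rule; the kernel is the only nonlinearity
    return int(centroid_dist2(u, X1, kernel) < centroid_dist2(u, X0, kernel))
```

With the linear kernel this is the ordinary nearest-centroid rule; passing the Gaussian kernel instead separates classes, such as two concentric rings, that no linear centroid rule can distinguish.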



Kernel-based methods are a powerful nonparametric modeling tool. Their essence is to transform data from a low-dimensional input space into a high-dimensional (or infinite-dimensional) feature space via a reproducing kernel, such that inner products can be computed efficiently through kernel evaluations. In the machine learning literature, it has been shown that most linear learning algorithms can naturally solve nonlinear problems within a high-dimensional RKHS using the kernel trick. In Figure 2.3, a two-dimensional nonlinear regression problem boils down to a three-dimensional linear regression problem after preprocessing by the nonlinear map ψ: [u1, u2]⊤ ↦ [u1², u2², √2 u1u2]⊤.
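The explicit second-degree map makes the kernel trick tangible: with a √2 scaling on the cross term, the feature-space inner product equals the squared dot product in the input space, so the kernel (u⊤v)² can replace the map entirely. A short numerical check (the test points are arbitrary):

```python
import numpy as np

def psi(u):
    """Explicit feature map for the homogeneous second-degree polynomial
    kernel in two dimensions; the sqrt(2) scaling on the cross term makes
    <psi(u), psi(v)> = (u . v)^2 hold exactly."""
    u1, u2 = u
    return np.array([u1**2, u2**2, np.sqrt(2) * u1 * u2])

u = np.array([1.0, 2.0])
v = np.array([-0.5, 3.0])
# kernel trick: inner product in feature space == kernel in input space
assert np.isclose(psi(u) @ psi(v), (u @ v) ** 2)
```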

### Examples of kernel

The kernel function, which replaces the inner product of two feature vectors corresponding to two inputs, is an attractive computational shortcut for creating complicated feature spaces. In practice, directly choosing or defining a kernel function amounts to implicitly defining a feature space in which the constructed algorithms operate. We next introduce two widely used types of basic positive-definite kernel functions.
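Assuming the two families in question are the Gaussian and polynomial kernels, the usual choices in the kernel adaptive filtering literature, they can be sketched as follows, together with the practical signature of positive definiteness: every Gram matrix they generate has nonnegative eigenvalues. Parameter defaults are illustrative:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel: k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    d2 = np.sum(X**2, 1)[:, None] - 2 * X @ Y.T + np.sum(Y**2, 1)[None, :]
    return np.exp(-d2 / (2 * sigma**2))

def polynomial_kernel(X, Y, degree=2, c=1.0):
    """Polynomial kernel: k(u, v) = (u^T v + c)^degree, with c >= 0."""
    return (X @ Y.T + c) ** degree

# Positive definiteness in practice: Gram matrices built from these
# kernels have nonnegative eigenvalues (up to numerical round-off).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
for K in (gaussian_kernel(X, X), polynomial_kernel(X, X)):
    assert np.linalg.eigvalsh(K).min() > -1e-8
```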

## Table of contents

1 Introduction
1.1 Thesis context
1.2 Motivation
1.3 Contributions
1.4 Organization of the thesis
2 Linear and kernel adaptive filtering: a general overview
2.1 Introduction
2.2.1 Overview of linear adaptive filters
2.2.2 Wiener filter
2.2.3 Least-mean-square algorithm
2.2.4 LMS convergence analysis
2.3 Preliminaries on kernel-based methods
2.3.1 Definition of kernel
2.3.2 Reproducing kernel Hilbert spaces (RKHS)
2.3.3 Examples of kernel
2.3.4 Kernel construction
2.4 The existing kernel adaptive filtering algorithms
2.4.1 Sparsification criteria
2.4.2 Kernel affine projection algorithm
2.4.3 Kernel normalized least-mean-square algorithm
2.4.4 Kernel recursive least-square algorithm
2.5 Conclusion
3 Monokernel LMS algorithm with online dictionary
3.1 Introduction
3.1.1 Monokernel LMS algorithms
3.2 Mean square error analysis
3.3 Transient behavior analysis
3.3.1 Mean weight behavior
3.3.2 Mean square error behavior
3.5 Simulation results and discussion
3.5.1 Example 1
3.5.2 Example 2
3.5.3 Discussion
3.6 KLMS algorithm with forward-backward splitting
3.6.1 Forward-backward splitting method in a nutshell
3.6.2 Application to KLMS algorithm
3.6.3 Stability in the mean
3.7 Simulation results of proposed algorithm
3.8 Conclusion
4.1 Introduction
4.2 Multikernel LMS algorithms
4.2.1 Single-input multikernel LMS algorithm
4.2.2 Multi-input multikernel LMS algorithm
4.2.3 Optimal solution
4.3 Convergence behavior analysis of MI-MKLMS algorithm
4.3.1 Preliminaries and assumptions
4.3.2 Mean weight error analysis
4.3.3 Mean squared error analysis
4.4 Simulation results and discussion
4.4.1 Example 1
4.4.2 Example 2
4.4.3 Discussion
4.5 Conclusion
5 Complex kernel adaptive filtering algorithm
5.1 Introduction
5.2 Complex monokernel adaptive filtering algorithms
5.2.1 Complexified kernel LMS algorithm
5.2.2 Pure complex kernel LMS algorithm
5.3.1 The framework
5.3.2 Augmented complex kernel least-mean-squared algorithm
5.4 Stochastic behavior analysis of ACKLMS algorithm
5.4.1 Mean weight error analysis
5.4.2 Mean-square error analysis
5.5 Simulation results and discussion
5.6 Conclusion
6 Diffusion adaptation over networks with KLMS
6.1 Introduction
6.2 The kernel least-mean-square algorithm
6.3 Diffusion adaptation with KLMS algorithm