Minimal risk resolution in the class Bs2∞ and convergence rates

Get Complete Project Material File(s) Now! »

Haar and direct evaluation at dyadic rationals

The Haar wavelet (D2N,N = 1) is not suitable for a gradient computation, empirical or theoretical ; the histogram variation (projection on the space Vj spanned by the Haar wavelet), in response to a small perturbation of the demixing matrix W is in general nonexistent or imperceptible (see in particular simulations p. 101 and following). The problem also crops up at initialization by histogram at high resolution, often used in density estimation, with a string of filtering passes down to lower resolutions to follow.
To obtain an empirical gradient, we used a wavelet at least D4 and direct calculus 1 n P i φjk(Xi) with approximation of the values Xi at dyadic rationals from the algorithm explained by Nguyen et Strang (1996). Fortunately, and contrary maybe to a commonly held idea, that calculus is not a problem, if the values taken by the wavelet are preloaded in memory at a given precision. The corresponding computing time is by no means a bottleneck of the procedure in ICA, even less because we get rid of strings of filtering passes in dimension d (see part 6).

Separating power of the contrast

We give below an excerpt of simulations appearing in part 3 ; it is an average result of 100 runs in dimension 2 with 10000 observations, a Daubechies D4, j = 3 et L = 10 (dyadic precision) for different densities ; colomn start indicates Amari distance (scaled from 0 to 100) and the wavelet contrast on entry ; colomn it is the average number of iterations in Stiefel minimization. For some densities, after pre-whitening, the position is already close to the minimum, but the contrast still detects a dependency ; the procedure is not executed if the contrast or the empirical gradient are too close to zero, and this practically always corresponds to an Amari error less than 1.

Exact gradient of the L2 contrast

In this document we studied the gradient of the projected contrast estimator. We could also try to determine the gradient and the hessien of the exact L2 contrast. This means we probably come across quantities that are used in the method of the matrix functional of Tsybakov and Samarov. We could then try to estimate these derived quantities with a non parametric wavelet based method.

Non linear mixing

The wavelet contrast is valid even if the mixing is not linear. For a random variable X = F(S) where F:U ⊂ Rd 7→ V ⊂ Rd is no more linear but invertible, the factorization measure is written in the same way , R kfF −f∗F k2 and there is an estimation of it through contrast Cj . Minimization would be carried out no longer on the Stiefel manifold but on another set depending on F.

ACI without pre-whitening

One could consider performing a minimization on the Stiefel manifold without any preliminary PCA transformation, that has some annoying side effects. For an invertible matrix A, there exists by QR factorisation, an orthogonal matrix Q such that QA = T is upper triangular. Q and T are moreover unique is the diagonal of T is strictly positive. For such a matrix T the inversion is immediate.

Sensitivity of the wavelet contrast

In this section, we compare independent and mixed D2 to D8 contrasts on a uniform whitened signal, in dimension 2 with 100000 observations, and in dimension 4 with 50000 observations. According to proposition ‘linear-choice’, for s = +∞ the best choice is j = 0, to be interpreted as the smallest of technically working j, i.e. satisfying 2j > 2N − 1, to ensure that the wavelet support is mostly contained in the observation support.
For j = 0, there is only one cell in the cube and the contrast is unable to detect any mixing effect : for Haar it is identically zero, and for the others D2N it is a constant (quasi for round-off errors) because we use circular shifting if the wavelet passes an end of the observation support. At small j such that 2 ≤ 2j ≤ 2N − 1, D2N wavelets behave more or less like the Haar wavelet, except they are more responsive to a small perturbation. We use the Amari distance as defined in Amari (1996) rescaled from 0 to 100.

Table of contents :

1. Introduction
1. Principales approches du probl`eme ACI
2. Motivation
3. R´esultats th´eoriques obtenus
Mixage lin´eaire et appartenance Besov
Contraste en projection
Estimateurs et vitesses
Propri´et´es de filtrage du gradient et du Hessien
4. R´esultats pratiques
Utilisation du contraste plug-in
Stabilit´e num´erique
Complexit´e num´erique
Usage d’estimateurs U-statistique
Haar et ´evaluation directe aux dyadiques
Pouvoir de s´eparation du contraste
5. ´ El´ements de comparaison avec d’autres m´ethodes
Kernel ICA
M´ethodes de la norme de Hilbert-Schmidt
Estimateur RADICAL
Fonctionnelle matricielle
6. Perspectives
Prolongements d’ordre pratique
Prise en compte de l’anisotropie
Contraste de type plug-in atteignant une vitesse param´etrique
Test d’ind´ependance
Adaptation
Gradient exact du contraste L2
Mixage non lin´eaire
ACI sans ACP
7. Organisation du document
Notations et conventions
2. Introduction (english translation)
1. Main approaches in the ICA problem
2. Motivation
3. Theoretical results
Linear mixing and Besov membership
Contrast in projection
Estimators and rates
Filtering properties of the gradient and the hessian
4. Practical results
Plug-in contrast usage
Numerical stability
Numerical complexity
U-statistic estimators usage
Haar and direct evaluation at dyadic rationals
Separating power of the contrast
5. Comparison with other methods
Kernel ICA
Hilbert-Schmidt norm method
RADICAL
Matrix functional
6. Perspectives
Practical extensions
Taking into account anisotropy
Plug-in type contrast reaching a parametric rate
Test of Independence
Adaptation
Exact gradient of the L2 contrast
Non linear mixing
ACI without pre-whitening
7. Organization of the document
Notations et conventions
8. Bibliographie de l’introduction
3. ICA by Wavelets : the basics
1. Notations
2. Wavelet contrast, Besov membership
Wavelet contrast
Besov membership of marginal distributions
Besov membership of the mixed density
3. Risk upper bound
Risk upper bound for ˆ Cj
Minimizing resolution in the class Bs2∞
minimizing resolution in the class Bspq
4. Computation of the estimator ˆ Cj
Sensitivity of the wavelet contrast
Contrast complexity
5. Contrast minimization
A visual example in dimension 2
6. Appendix
Bias of ˆα2j k − 2ˆαjkˆλjk + ˆλ 2j k
Decomposition of P F1(Xi1 ) . . . P Fm(Xim) rth order moment of jk
7. References for ICA by wavelets : the basics
4. ICA and estimation of a quadratic functional
Estimation of a quadratic functional
Wavelet ICA
1. Notations
2. Estimating the factorization measure
R (fA − f⋆A )2
ICA wavelet contrast
Wavelet contrast and quadratic functional
Estimators under consideration
Notational remark
Sample split
Bias variance trade-off
Minimal risk resolution in the class Bs2∞ and convergence rates
3. Risk upper bounds in estimating the wavelet contrast
Risk upper bound, d + 1 independent samples — fA, f⋆1
A , . . . , f⋆d A
Risk upper bound, 2 independent samples — fA, f⋆A
Full sample ˆ C2
j risk upper bound
Risk upper bound, full sample — fA
4. Appendix 1 – Propositions
5. Appendix 2 – Lemmas
Property set
Many sets matching indices
Two sets matching indices [Corollary and complement]
Product of r kernels of degree m
Meyer
Path of non matching dimension numbers
Daubechies wavelet concentration property
rth order moment of jk
6. References for ICA and estimation of a quadratic functional
5. Towards thresholding
1. Estimating R
(fA − f⋆A )2 with term-by-term thresholding
ICA wavelet contrast
Wavelet contrast in place of the quadratic functional R
(fA − f⋆A )2
Hard-thresholded estimator
Block threshold
2. Risk upper bound
Risk of a thresholded procedure
3. Practical issues
Computation of the estimator ˆ Cj
Choice of the threshold in practice
4. Appendix 1 – Propositions
rth moment of ˆ βjk and ˆμjk
Product of r kernels of degree m
5. Appendix 2 – Combinatorial lemmas
Property set
Many sets matching indices
Path of non matching dimension numbers
6. Appendix 3 – Thresholding related lemmas and others
wavelet contrast in βjk
Large deviation for term by term thresholding
Number of big coefficients
rth order moment of jk
Daubechies wavelet concentration property
Bernstein inequality
7. References for Towards thresholding
6. Stiefel manifold, optimization and implementation issues
1. Direct evaluation of ϕjk(x) at floating point numbers
2. Relocation by affine or linear transform
3. Wavelet contrast differential and implementation issues
ˆ Cj differential in Y
ˆ Cj differential in W
ˆ Cj second derivatives
ˆ Cj hessian in Y
ˆ Cj hessian in W
4. Filter aware formulations for the gradient and the hessian
filter aware gradient formulation
filter aware hessian formulation
5. Wavelet contrast implementation
Flat representation routines
DWT in dimension 1
DWT in dimension d
Thresholding in dimension d
Computation of ϕ values at dyadic rationals
Contrast computation
Haar projection (Daubechies D2N,N = 1)
Projection on Vj spanned by a Daubechies D2N, general case
6. Optimization on the Stiefel manifold