Get Complete Project Material File(s) Now! »

## Haar and direct evaluation at dyadic rationals

The Haar wavelet (D2N,N = 1) is not suitable for a gradient computation, empirical or theoretical ; the histogram variation (projection on the space Vj spanned by the Haar wavelet), in response to a small perturbation of the demixing matrix W is in general nonexistent or imperceptible (see in particular simulations p. 101 and following). The problem also crops up at initialization by histogram at high resolution, often used in density estimation, with a string of filtering passes down to lower resolutions to follow.

To obtain an empirical gradient, we used a wavelet at least D4 and direct calculus 1 n P i φjk(Xi) with approximation of the values Xi at dyadic rationals from the algorithm explained by Nguyen et Strang (1996). Fortunately, and contrary maybe to a commonly held idea, that calculus is not a problem, if the values taken by the wavelet are preloaded in memory at a given precision. The corresponding computing time is by no means a bottleneck of the procedure in ICA, even less because we get rid of strings of filtering passes in dimension d (see part 6).

### Separating power of the contrast

We give below an excerpt of simulations appearing in part 3 ; it is an average result of 100 runs in dimension 2 with 10000 observations, a Daubechies D4, j = 3 et L = 10 (dyadic precision) for different densities ; colomn start indicates Amari distance (scaled from 0 to 100) and the wavelet contrast on entry ; colomn it is the average number of iterations in Stiefel minimization. For some densities, after pre-whitening, the position is already close to the minimum, but the contrast still detects a dependency ; the procedure is not executed if the contrast or the empirical gradient are too close to zero, and this practically always corresponds to an Amari error less than 1.

#### Exact gradient of the L2 contrast

In this document we studied the gradient of the projected contrast estimator. We could also try to determine the gradient and the hessien of the exact L2 contrast. This means we probably come across quantities that are used in the method of the matrix functional of Tsybakov and Samarov. We could then try to estimate these derived quantities with a non parametric wavelet based method.

**Non linear mixing**

The wavelet contrast is valid even if the mixing is not linear. For a random variable X = F(S) where F:U ⊂ Rd 7→ V ⊂ Rd is no more linear but invertible, the factorization measure is written in the same way , R kfF −f∗F k2 and there is an estimation of it through contrast Cj . Minimization would be carried out no longer on the Stiefel manifold but on another set depending on F.

**ACI without pre-whitening**

One could consider performing a minimization on the Stiefel manifold without any preliminary PCA transformation, that has some annoying side effects. For an invertible matrix A, there exists by QR factorisation, an orthogonal matrix Q such that QA = T is upper triangular. Q and T are moreover unique is the diagonal of T is strictly positive. For such a matrix T the inversion is immediate.

**Sensitivity of the wavelet contrast**

In this section, we compare independent and mixed D2 to D8 contrasts on a uniform whitened signal, in dimension 2 with 100000 observations, and in dimension 4 with 50000 observations. According to proposition ‘linear-choice’, for s = +∞ the best choice is j = 0, to be interpreted as the smallest of technically working j, i.e. satisfying 2j > 2N − 1, to ensure that the wavelet support is mostly contained in the observation support.

For j = 0, there is only one cell in the cube and the contrast is unable to detect any mixing effect : for Haar it is identically zero, and for the others D2N it is a constant (quasi for round-off errors) because we use circular shifting if the wavelet passes an end of the observation support. At small j such that 2 ≤ 2j ≤ 2N − 1, D2N wavelets behave more or less like the Haar wavelet, except they are more responsive to a small perturbation. We use the Amari distance as defined in Amari (1996) rescaled from 0 to 100.

**Table of contents :**

**1. Introduction **

1. Principales approches du probl`eme ACI

2. Motivation

3. R´esultats th´eoriques obtenus

Mixage lin´eaire et appartenance Besov

Contraste en projection

Estimateurs et vitesses

Propri´et´es de filtrage du gradient et du Hessien

4. R´esultats pratiques

Utilisation du contraste plug-in

Stabilit´e num´erique

Complexit´e num´erique

Usage d’estimateurs U-statistique

Haar et ´evaluation directe aux dyadiques

Pouvoir de s´eparation du contraste

5. ´ El´ements de comparaison avec d’autres m´ethodes

Kernel ICA

M´ethodes de la norme de Hilbert-Schmidt

Estimateur RADICAL

Fonctionnelle matricielle

6. Perspectives

Prolongements d’ordre pratique

Prise en compte de l’anisotropie

Contraste de type plug-in atteignant une vitesse param´etrique

Test d’ind´ependance

Adaptation

Gradient exact du contraste L2

Mixage non lin´eaire

ACI sans ACP

7. Organisation du document

Notations et conventions

**2. Introduction (english translation) **

1. Main approaches in the ICA problem

2. Motivation

3. Theoretical results

Linear mixing and Besov membership

Contrast in projection

Estimators and rates

Filtering properties of the gradient and the hessian

4. Practical results

Plug-in contrast usage

Numerical stability

Numerical complexity

U-statistic estimators usage

Haar and direct evaluation at dyadic rationals

Separating power of the contrast

5. Comparison with other methods

Kernel ICA

Hilbert-Schmidt norm method

RADICAL

Matrix functional

6. Perspectives

Practical extensions

Taking into account anisotropy

Plug-in type contrast reaching a parametric rate

Test of Independence

Adaptation

Exact gradient of the L2 contrast

Non linear mixing

ACI without pre-whitening

7. Organization of the document

Notations et conventions

8. Bibliographie de l’introduction

**3. ICA by Wavelets : the basics **

1. Notations

2. Wavelet contrast, Besov membership

Wavelet contrast

Besov membership of marginal distributions

Besov membership of the mixed density

3. Risk upper bound

Risk upper bound for ˆ Cj

Minimizing resolution in the class Bs2∞

minimizing resolution in the class Bspq

4. Computation of the estimator ˆ Cj

Sensitivity of the wavelet contrast

Contrast complexity

5. Contrast minimization

A visual example in dimension 2

6. Appendix

Bias of ˆα2j k − 2ˆαjkˆλjk + ˆλ 2j k

Decomposition of P F1(Xi1 ) . . . P Fm(Xim) rth order moment of jk

7. References for ICA by wavelets : the basics

**4. ICA and estimation of a quadratic functional **

Estimation of a quadratic functional

Wavelet ICA

1. Notations

2. Estimating the factorization measure

R (fA − f⋆A )2

ICA wavelet contrast

Wavelet contrast and quadratic functional

Estimators under consideration

Notational remark

Sample split

Bias variance trade-off

Minimal risk resolution in the class Bs2∞ and convergence rates

3. Risk upper bounds in estimating the wavelet contrast

Risk upper bound, d + 1 independent samples — fA, f⋆1

A , . . . , f⋆d A

Risk upper bound, 2 independent samples — fA, f⋆A

Full sample ˆ C2

j risk upper bound

Risk upper bound, full sample — fA

4. Appendix 1 – Propositions

5. Appendix 2 – Lemmas

Property set

Many sets matching indices

Two sets matching indices [Corollary and complement]

Product of r kernels of degree m

Meyer

Path of non matching dimension numbers

Daubechies wavelet concentration property

rth order moment of jk

6. References for ICA and estimation of a quadratic functional

**5. Towards thresholding **

1. Estimating R

(fA − f⋆A )2 with term-by-term thresholding

ICA wavelet contrast

Wavelet contrast in place of the quadratic functional R

(fA − f⋆A )2

Hard-thresholded estimator

Block threshold

2. Risk upper bound

Risk of a thresholded procedure

3. Practical issues

Computation of the estimator ˆ Cj

Choice of the threshold in practice

4. Appendix 1 – Propositions

rth moment of ˆ βjk and ˆμjk

Product of r kernels of degree m

5. Appendix 2 – Combinatorial lemmas

Property set

Many sets matching indices

Path of non matching dimension numbers

6. Appendix 3 – Thresholding related lemmas and others

wavelet contrast in βjk

Large deviation for term by term thresholding

Number of big coefficients

rth order moment of jk

Daubechies wavelet concentration property

Bernstein inequality

7. References for Towards thresholding

**6. Stiefel manifold, optimization and implementation issues **

1. Direct evaluation of ϕjk(x) at floating point numbers

2. Relocation by affine or linear transform

3. Wavelet contrast differential and implementation issues

ˆ Cj differential in Y

ˆ Cj differential in W

ˆ Cj second derivatives

ˆ Cj hessian in Y

ˆ Cj hessian in W

4. Filter aware formulations for the gradient and the hessian

filter aware gradient formulation

filter aware hessian formulation

5. Wavelet contrast implementation

Flat representation routines

DWT in dimension 1

DWT in dimension d

Thresholding in dimension d

Computation of ϕ values at dyadic rationals

Contrast computation

Haar projection (Daubechies D2N,N = 1)

Projection on Vj spanned by a Daubechies D2N, general case

6. Optimization on the Stiefel manifold