## Fisher information and Cramér-Rao bound

In this section, we focus on the properties of the expected log-likelihood function, where the expectation is taken under the true distribution: $\mathbb{E}_{x \sim \nu}[\ell(\theta; x)]$. This function is of interest because it is the limit of the empirical log-likelihood as the number of samples grows.

Figure 1.1 – A geometric representation of the topics discussed in this section. A model is a set of distributions $\nu(\cdot; \theta)$ parametrized by parameters $\theta \in \Theta$. It can be seen as a manifold in the distribution space. The true distribution $\nu$ is a point in the distribution space, which lies on the manifold if the model holds. The model is identifiable if the mapping from $\Theta$ to the manifold is injective. The distribution of the maximum likelihood estimator is the projection of $\nu$ on the manifold. The projection is made with respect to the Kullback-Leibler divergence. The lines are not straight, to illustrate that this is not a Euclidean setting. Given a finite number of samples $(x_1, \dots, x_n)$, we do not have access to $\nu$ but rather to its empirical counterpart $\hat{\nu} = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$, and the empirical maximum likelihood estimator $\hat{\theta}$ is once more computed by projection.

We start by introducing the important notion of Fisher score. The Fisher score is

$$\psi(\theta; x) = \nabla_\theta \ell(\theta; x) = -\frac{1}{\nu(x; \theta)} \nabla_\theta \nu(x; \theta).$$

This is the negative gradient of the log-likelihood $\log \nu(x; \theta)$ with respect to the parameters: it measures the steepness of the log-likelihood function at a given sample point. For instance, if the model holds for parameter $\theta$, the log-likelihood is flat at $\theta$ on average, that is, the expected score at $\theta$ is zero.
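The zero-mean property of the Fisher score can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: the model is a unit-variance Gaussian with unknown mean, and the score is taken as the gradient of the negative log-likelihood, so that it reduces to `theta - x`. The names `score` and `theta_true` are hypothetical.

```python
import numpy as np


def score(theta, x):
    """Score of N(theta, 1): gradient in theta of -log nu(x; theta).

    Since -log nu(x; theta) = 0.5 * (x - theta)**2 + const, the gradient is theta - x.
    """
    return theta - x


rng = np.random.default_rng(0)
theta_true = 1.5  # assumed true parameter (illustrative)
x = rng.normal(loc=theta_true, scale=1.0, size=100_000)

# At the true parameter, the score averages to (approximately) zero.
mean_score_at_truth = score(theta_true, x).mean()
# Away from it, the average score is close to theta - theta_true.
mean_score_elsewhere = score(0.0, x).mean()

print("mean score at theta_true:", mean_score_at_truth)
print("mean score at theta = 0: ", mean_score_elsewhere)
```

With this convention, the average score at any other parameter value points away from the true parameter, which is what a descent method on the expected negative log-likelihood exploits.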

### Incremental EM and majorization-minimization: stochastic algorithms with descent guarantees

As mentioned previously, a drawback of SGD is that one iteration might very well increase the loss function instead of decreasing it. Therefore, it is of interest to develop stochastic algorithms (algorithms that process one sample at a time) with descent guarantees.
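A small example makes this drawback concrete. The sketch below assumes a toy quadratic loss (estimating a mean by least squares), which is not taken from the text; `full_loss` and `sgd_step` are illustrative names. Starting from the exact minimizer, a single stochastic step on one sample moves the iterate away and increases the full loss.

```python
import numpy as np


def full_loss(theta, x):
    """Full loss: average of 0.5 * (theta - x_i)**2 over all samples."""
    return 0.5 * np.mean((theta - x) ** 2)


def sgd_step(theta, x_i, step):
    """One SGD step using the gradient of the loss on a single sample."""
    return theta - step * (theta - x_i)


x = np.array([-1.0, 0.0, 1.0, 4.0])
theta = x.mean()  # minimizer of the full loss

theta_new = sgd_step(theta, x[3], step=0.5)  # step computed on the sample x = 4 only
loss_before = full_loss(theta, x)
loss_after = full_loss(theta_new, x)

print("loss before:", loss_before)
print("loss after: ", loss_after)
```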

The expectation-maximization (EM) framework offers a convenient way to do so [Neal and Hinton, 1998]. We start by illustrating it on the simple Gaussian mixture model described in Example 1.7.
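As a rough illustration of the idea, here is a sketch of incremental EM in the spirit of [Neal and Hinton, 1998]. Example 1.7 is not reproduced here, so the model is an assumption: a one-dimensional mixture of unit-variance Gaussians with equal weights and unknown means. All function names are hypothetical. Each iteration updates the responsibilities of one sample and then re-solves for the means from running sufficient statistics. The quantity that is guaranteed not to increase is the surrogate (a majorizer of the negative log-likelihood), which the code records for monitoring only.

```python
import numpy as np


def log_joint(x, means):
    """log p(x, k) for an equal-weight mixture of unit-variance Gaussians."""
    x = np.asarray(x, dtype=float)
    k = len(means)
    return -np.log(k) - 0.5 * np.log(2 * np.pi) - 0.5 * (x[..., None] - means) ** 2


def responsibilities(x, means):
    """E-step: posterior probabilities p(k | x), computed in a stable way."""
    lj = log_joint(x, means)
    lj = lj - lj.max(axis=-1, keepdims=True)
    r = np.exp(lj)
    return r / r.sum(axis=-1, keepdims=True)


def surrogate(x, r, means):
    """Majorizer of the negative log-likelihood (negative free energy)."""
    return -np.sum(r * (log_joint(x, means) - np.log(r)))


def neg_log_likelihood(x, means):
    """Negative log-likelihood of the mixture, via a log-sum-exp."""
    lj = log_joint(x, means)
    m = lj.max(axis=-1)
    return -np.sum(m + np.log(np.exp(lj - m[:, None]).sum(axis=-1)))


def incremental_em(x, means, n_passes=5, seed=0):
    rng = np.random.default_rng(seed)
    means = np.array(means, dtype=float)
    r = responsibilities(x, means)  # full E-step, used only for initialization
    s0 = r.sum(axis=0)              # running statistic: sum_i r[i, k]
    s1 = r.T @ x                    # running statistic: sum_i r[i, k] * x[i]
    history = [surrogate(x, r, means)]
    for _ in range(n_passes):
        for i in rng.permutation(len(x)):
            r_new = responsibilities(x[i], means)  # E-step on a single sample
            s0 += r_new - r[i]
            s1 += (r_new - r[i]) * x[i]
            r[i] = r_new
            means = s1 / s0                        # exact M-step from the statistics
            # O(n) evaluation, for monitoring only; not part of the algorithm.
            history.append(surrogate(x, r, means))
    return means, np.array(history)


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
means, history = incremental_em(x, means=[-0.5, 0.5])

print("estimated means:", means)
print("surrogate never increases:", bool(np.all(np.diff(history) <= 1e-8)))
```

The monotone quantity here is the surrogate rather than the negative log-likelihood itself: each single-sample E-step and each M-step minimizes the surrogate in one block of variables, so it cannot go up, and it always upper-bounds the negative log-likelihood.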

#### Independent Component Analysis

In this section, we give an introduction to the topic of Independent Component Analysis (ICA), with help from the three fields discussed in the previous section (information geometry, optimization and manifolds). It is inspired by the two reference books on the topic [Hyvärinen and Oja, 2000; Comon and Jutten, 2010]. Assume that we receive samples $x \in \mathbb{R}^p$. A general problem in machine learning and statistics is to extract structure from $x$ by finding a latent representation. ICA does exactly that, by assuming that $x$ is a linear combination of sources $s \in \mathbb{R}^p$: $x = As$.
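The generative model $x = As$ can be simulated in a few lines. The sketch below rests on illustrative assumptions: Laplace-distributed sources, a fixed well-conditioned mixing matrix `A`, and an "oracle" unmixing matrix obtained by inverting `A`. In practice `A` is unknown and an ICA algorithm has to estimate the unmixing matrix from the samples alone; the inversion here only shows what a perfect estimate would recover.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 10_000

# Independent, non-Gaussian sources: one row per source, one column per sample.
S = rng.laplace(size=(p, n))

# Hypothetical mixing matrix (strictly diagonally dominant, hence invertible).
A = np.array([
    [1.0, 0.5, 0.2],
    [0.3, 1.0, 0.4],
    [0.1, 0.6, 1.0],
])

X = A @ S  # observed samples, x = A s

# Oracle unmixing: only possible here because A is known in the simulation.
W = np.linalg.inv(A)
S_rec = W @ X

corr_sources = np.corrcoef(S)[0, 1]  # close to 0: sources are independent
corr_mixed = np.corrcoef(X)[0, 1]    # clearly nonzero: mixing couples the signals

print("correlation between two sources: ", corr_sources)
print("correlation between two mixtures:", corr_mixed)
print("sources recovered by the oracle: ", np.allclose(S_rec, S))
```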

**Table of contents:**

**1 Motivation and contribution**

1.1 Statistical principles

1.2 Optimization

1.3 A bit of Riemannian geometry

1.4 Independent Component Analysis

1.5 Contributions

1.6 Publications

**I – Faster Independent Component Analysis**

**2 Faster ICA by preconditioning with Hessian approximations**

2.1 The Picard algorithm

2.2 Extension to the orthogonal constraint

**3 Stochastic algorithms for ICA with descent guarantees**

3.1 Stochastic algorithms for ICA

**II – SMICA: Spectral Matching Independent Component Analysis for M/EEG Brain Rhythms Separation**

**4 SMICA: spectral matching ICA for M/EEG processing**

4.1 SMICA

Conclusion and Perspectives

**Bibliography**