Term A: Complexity of the model
Term A in inequality (2.2) is at first sight the dominant term in the expression $\Phi(\delta)$. The upper bound we obtain below is rather accurate, thanks to the finite-dimensional Euclidean space structure. Indeed, we have to bound a scalar product whose vectors are contained in a ball, so the largest value of the product is in fact attained at the largest coordinate of the gradient term. We recall that $\mathcal{M}$ denotes the finite set of optimal codebooks. Let $x = (x_1, \dots, x_k)$ be a vector in $(\mathbb{R}^d)^k$. We denote by $x_j^r$ the $r$-th coordinate of $x_j$, and call it the $(j,r)$-th coordinate of $x$. Moreover, denote by $e_{j,r}$ the vector whose $(j,r)$-th coordinate is $1$ and whose other coordinates are $0$. Taking into account that every $c^*$ in $\mathcal{M}$ satisfies the centroid condition, that is $P\Delta(c^*, \cdot) = 0$, we may write
$$\sup_{c^* \in \mathcal{M},\, \|c - c^*\| \le \sqrt{\delta}} \left| \left\langle c - c^*, (P_n - P)\big({-\Delta(c^*)}\big) \right\rangle \right|.$$
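The order of magnitude of this supremum can be checked numerically: by Cauchy–Schwarz, the supremum of a scalar product against a fixed vector $v$ over a ball of radius $\sqrt{\delta}$ is $\sqrt{\delta}\,\|v\|$, attained in the direction of $v$. A minimal sketch (the dimensions, the radius, and the vector `v` standing in for the gradient term are illustrative, not those of the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a "gradient" vector v in (R^d)^k, flattened,
# and a ball of radius sqrt(delta).
d, k, delta = 3, 4, 0.25
v = rng.normal(size=d * k)
r = np.sqrt(delta)

# By Cauchy-Schwarz, sup_{||u|| <= r} <u, v> = r * ||v||,
# attained at u = r * v / ||v||.
u_star = r * v / np.linalg.norm(v)
sup_value = r * np.linalg.norm(v)

# Random points of the ball never exceed this supremum.
for _ in range(1000):
    u = rng.normal(size=d * k)
    u = r * u / np.linalg.norm(u)  # point on the sphere of radius r
    assert u @ v <= sup_value + 1e-12
```

The maximizer saturates the bound exactly, which is why the ball constraint makes this term easy to control.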
Minimax lower bound
Theorem 1 in [BLL98] ensures that the minimax convergence rate over the $M$-bounded distributions of any empirically designed codebook is bounded from below by a quantity of order $1/\sqrt{n}$. A question of interest is whether this lower bound can be refined when considering only distributions satisfying some fast-convergence condition.
A partial answer is given by Corollary 2 in [Ant05], where it is proved that the minimax rate over distributions with uniformly bounded continuous densities, a unique optimal codebook (up to relabeling), and positive definite second-derivative matrices $H(c^*)$ at the optimal codebook, is still of order $1/\sqrt{n}$. According to [Ant05], a natural question is whether a uniform upper bound of the type $o(1/\sqrt{n})$ may be derived under the additional requirement that the minimum eigenvalue of the second-derivative matrices $H(c^*)$ is uniformly bounded from below.
This Subsection is devoted to obtaining a minimax lower bound on the excess risk over a set of distributions with continuous densities, a unique optimal codebook, and satisfying the margin condition of Definition 3.1, in which some parameters, such as $p_{\min}$, are fixed or uniformly lower-bounded. Since, in this case, the minimum eigenvalues of $H(c^*)$ are larger than $p_{\min}/2$, such a minimax lower bound provides an answer to the question mentioned above.
Quasi-Gaussian mixture example
The aim of this Subsection is to illustrate the results of Section 3.3 with Gaussian mixtures in dimension $d = 2$. The Gaussian mixture model is a typical and well-studied clustering example. However, we will not deal with the clustering issue itself but rather with its theoretical background. In general, a Gaussian mixture distribution $\tilde{P}$ is defined by its density
$$\tilde{f}(x) = \sum_{i=1}^{\tilde{k}} \frac{\theta_i}{2\pi\sqrt{|\Sigma_i|}}\, e^{-\frac{1}{2}(x - m_i)^t \Sigma_i^{-1} (x - m_i)},$$
where $\tilde{k}$ denotes the number of components of the mixture, and the $\theta_i$'s denote the weights of the mixture, which satisfy $\sum_{i=1}^{\tilde{k}} \theta_i = 1$. Moreover, the $m_i$'s denote the means of the mixture, so that $m_i \in \mathbb{R}^2$, and the $\Sigma_i$'s are the $2 \times 2$ variance matrices of the components.
We restrict ourselves to the case where the number of components $\tilde{k}$ is known and matches the size $k$ of the codebooks. To ease the calculation, we make the additional assumption that every component has the same diagonal variance matrix $\Sigma_i = \sigma^2 I_2$. Note that a result similar to Proposition 3.4 can be derived for distributions with different variance matrices $\Sigma_i$, at the cost of more computation. Since the support of a Gaussian random variable is not bounded, we define the "quasi-Gaussian" mixture model as follows, truncating each Gaussian component. Let the density $f$ of the distribution $P$ be defined by
$$f(x) = \sum_{i=1}^{k} \frac{\theta_i}{2\pi\sigma^2 N_i}\, e^{-\frac{\|x - m_i\|^2}{2\sigma^2}}\, \mathbf{1}_{B(0,M)}(x).$$
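As a sketch of the truncation step, one can sample from such a quasi-Gaussian mixture by rejection: draw from the untruncated mixture and keep only the points falling in $B(0, M)$. The parameters below (means, weights, $\sigma$, $M$) are illustrative, not those of the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters: k = 3 components in R^2, common variance
# sigma^2 * I_2, truncation to the ball B(0, M).
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
weights = np.array([0.5, 0.25, 0.25])
sigma, M = 1.0, 10.0

def sample_quasi_gaussian(n):
    """Rejection sampling: draw from the Gaussian mixture,
    keep only the points falling inside B(0, M)."""
    out = []
    while len(out) < n:
        i = rng.choice(len(weights), p=weights)  # pick a component
        x = rng.normal(means[i], sigma, size=2)  # draw from it
        if np.linalg.norm(x) <= M:               # truncation to B(0, M)
            out.append(x)
    return np.array(out)

sample = sample_quasi_gaussian(500)
```

The normalizing constants $N_i$ in the density account exactly for the mass that rejection discards outside the ball.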
Weighted Lasso k-means distortion and consistency
In this Section, the penalty function is
$$\hat{I}_{WL}(c) = \sum_{p=1}^{d} \hat{\sigma}_p \sqrt{\big(c_1^{(p)}\big)^2 + \dots + \big(c_k^{(p)}\big)^2}.$$
The fact that the weights $\hat{\sigma}_p$ depend on the sample makes the proofs more intricate than in the previous case. To address this issue, this penalty function is connected to a deterministic penalty function, namely
$$I_{WL}(c) = \sum_{p=1}^{d} \sigma_p \sqrt{\big(c_1^{(p)}\big)^2 + \dots + \big(c_k^{(p)}\big)^2}.$$
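For concreteness, this group-Lasso-type penalty can be computed as follows, storing a codebook $c$ as a $k \times d$ array whose rows are the codepoints; the function name and the example values are hypothetical:

```python
import numpy as np

def weighted_lasso_penalty(c, sigma_hat):
    """Penalty sum_p sigma_hat[p] * sqrt((c_1^(p))^2 + ... + (c_k^(p))^2),
    where c is a (k, d) array whose rows are the k codepoints in R^d."""
    return float(np.sum(sigma_hat * np.sqrt(np.sum(c ** 2, axis=0))))

# Hypothetical example: the second coordinate is unused by both codepoints,
# so it contributes nothing to the penalty (the variable-selection effect).
c = np.array([[1.0, 0.0],
              [2.0, 0.0]])
sigma_hat = np.array([1.0, 1.0])
penalty = weighted_lasso_penalty(c, sigma_hat)  # sqrt(1^2 + 2^2) = sqrt(5)
```

Grouping the squares over the $k$ codepoints of a single coordinate $p$ is what drives whole coordinates to zero at once.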
Denote by $T$ the quantity $\max_{p=1,\dots,d} \frac{M_p}{\sigma_p}$. The following Proposition relates $\hat{I}_{WL}$ to $I_{WL}$.

Proposition 4.4. Suppose that $\sqrt{2n} > T^2\sqrt{\log(d)}$. For every
$$y < \log(d)\left(\frac{\sqrt{2n}}{T^2\sqrt{\log(d)}} - 1\right)^2,$$
we have, with probability larger than $1 - e^{-y}$, for all $c$ in $\mathcal{C}_k$,
$$\sqrt{1 - \alpha(y)}\, I_{WL}(c) \le \hat{I}_{WL}(c) \le \sqrt{1 + \alpha(y)}\, I_{WL}(c), \tag{4.7}$$
where $\alpha(y) = \frac{T^2\sqrt{\log(d)}}{\sqrt{2n}}\left(1 + \sqrt{\frac{y}{\log(d)}}\right)$.
The proof of Proposition 4.4 is given in Section 4.5. Proposition 4.4 ensures that, provided enough sample points are available to correctly estimate the coordinate-wise variances, the data-driven penalty function $\hat{I}_{WL}(c)$ is close to the deterministic penalty function $I_{WL}(c)$. Equipped with this Proposition, results can be derived for the $k$-means procedure with penalty $I_{WL}(c)$ and then transferred to the Weighted Lasso $k$-means procedure we propose. This is the idea motivating the following results.
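Assuming the form $\alpha(y) = \frac{T^2\sqrt{\log d}}{\sqrt{2n}}\bigl(1 + \sqrt{y/\log d}\bigr)$ stated in Proposition 4.4, the admissible range of $y$ is exactly the range on which $\alpha(y) < 1$, so the bounds in (4.7) are nontrivial; $\alpha$ reaches $1$ at the endpoint. A quick numerical check (the values of `n`, `d`, `T` are arbitrary):

```python
import numpy as np

def alpha(y, n, d, T):
    """alpha(y) of Proposition 4.4 (form assumed from the statement)."""
    return (T ** 2 * np.sqrt(np.log(d)) / np.sqrt(2 * n)) \
        * (1 + np.sqrt(y / np.log(d)))

def y_max(n, d, T):
    """Largest admissible y: log(d) * (sqrt(2n) / (T^2 sqrt(log d)) - 1)^2."""
    return np.log(d) * (np.sqrt(2 * n) / (T ** 2 * np.sqrt(np.log(d))) - 1) ** 2

n, d, T = 10_000, 50, 1.5
assert np.sqrt(2 * n) > T ** 2 * np.sqrt(np.log(d))  # condition of Prop. 4.4

y = 0.5 * y_max(n, d, T)        # any y inside the admissible range
a = alpha(y, n, d, T)           # then alpha(y) < 1 ...
a_end = alpha(y_max(n, d, T), n, d, T)  # ... and alpha(y_max) = 1 exactly
```

This consistency between the range of $y$ and $\alpha(y) < 1$ is what makes the two-sided bound (4.7) informative.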
Table of contents:
1 State of the art and summary of contributions
1.1 Introduction to quantization
1.1.1 Quantization and signal compression
1.1.2 Quantization and unsupervised classification
1.2 State of the art
1.2.1 Slow rates
1.2.2 Fast rates
1.3 Summary of contributions
1.3.1 Optimal non-asymptotic rate in the regular case
1.3.2 Margin condition and influence of the dimension
1.3.3 k-means and variable selection
2 Fast rates for empirical vector quantization
2.2 The quantization problem
2.3 Main results
2.4 Examples and discussion
2.4.1 A toy example
2.4.2 Quasi-Gaussian mixture example
2.5.1 Proof of Proposition 2.1
2.5.2 Proof of Lemma 2.1
2.5.3 Proof of Theorem 2.1
2.5.4 Proof of Theorem 2.3
2.5.5 Proof of Proposition 2.4
2.5.6 Proof of Theorem 2.2
2.5.7 Proof of Proposition 2.2
2.5.8 Proof of Proposition 2.3
3 Non-asymptotic bounds for vector quantization
3.2 Notation and Definitions
3.3.1 Risk bound
3.3.2 Minimax lower bound
3.3.3 Quasi-Gaussian mixture example
3.4.1 Proof of Proposition 3.1
3.4.2 Proof of Proposition 3.2
3.4.3 Proof of Theorem 3.1
3.4.4 Proof of Proposition 3.3
3.4.5 Proof of Proposition 3.4
3.5 Technical results
3.5.1 Proof of Proposition 3.5
3.5.2 Proof of Proposition 3.8
3.5.3 Proof of Proposition 3.11
3.5.4 Proof of Proposition 3.10
3.5.5 Proof of Lemma 3.6
4 Variable selection for k-means quantization
4.3.1 Lasso k-means distortion and consistency
4.3.2 Weighted Lasso k-means distortion and consistency
4.4.2 Model and theoretical predictions
4.4.3 Numerical experiments
4.5.1 Proof of Proposition 4.1 and Proposition 4.2
4.5.2 Proof of Proposition 4.4
4.5.3 Proof of Proposition 4.3
4.5.4 Proof of Proposition 4.5
4.5.5 Proof of Theorem 4.1
4.5.6 Proof of Theorem 4.3
4.5.7 Proof of Theorem 4.2
4.5.8 Proof of Theorem 4.4
4.5.9 Proofs of Proposition 4.6, Proposition 4.7, Proposition 4.8 and
4.6 Technical results
4.6.1 Proof of Proposition 4.10
4.6.2 Proof of Proposition 4.11
4.6.3 Proof of Proposition 4.12
4.6.4 Proof of Lemma 4.2