Quasi-Gaussian mixture example
The aim of this subsection is to apply our results to Gaussian mixtures in dimension $d = 2$. The Gaussian mixture model is a typical and well-defined clustering example. However, we will not deal with the clustering issue itself, but rather with its theoretical background.
In general, a Gaussian mixture distribution $\tilde{P}$ is defined by its density
$$\tilde{f}(x) = \sum_{i=1}^{\tilde{k}} \frac{p_i}{2\pi\sqrt{|\Sigma_i|}} \, e^{-\frac{1}{2}(x-m_i)^t \Sigma_i^{-1}(x-m_i)},$$
where $\tilde{k}$ denotes the number of components of the mixture, and the $p_i$'s denote the weights of the mixture, which therefore satisfy $\sum_{i=1}^{\tilde{k}} p_i = 1$. Moreover, the $m_i$'s denote the means of the mixture, so that $m_i \in \mathbb{R}^2$, and the $\Sigma_i$'s are the $2 \times 2$ covariance matrices of the components. We restrict ourselves to the case where the number of components $\tilde{k}$ is known and matches the size $k$ of the codebooks. To ease the calculation, we make the additional assumption that every component has the same diagonal covariance matrix $\Sigma_i = \sigma^2 I_2$. Note that a result similar to Proposition 2.3 can be derived for distributions with different covariance matrices $\Sigma_i$, at the cost of more computation.
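The density above can be sketched numerically. The snippet below is a minimal illustration (the function name `mixture_density` and the particular weights and means are our own choices, not from the text) of a mixture in $\mathbb{R}^2$ whose components all share the covariance $\sigma^2 I_2$, so the normalizing constant reduces to $1/(2\pi\sigma^2)$:

```python
import numpy as np

def mixture_density(x, weights, means, sigma):
    """Density at x of a Gaussian mixture on R^2 whose components all
    share the diagonal covariance matrix sigma^2 * I_2."""
    x = np.asarray(x, dtype=float)
    norm = 1.0 / (2.0 * np.pi * sigma**2)   # (2*pi)^{-1} |Sigma|^{-1/2} for d = 2
    diffs = x - means                        # (k, 2) array of x - m_i
    expo = np.exp(-0.5 * np.sum(diffs**2, axis=1) / sigma**2)
    return float(np.sum(weights * norm * expo))

# Toy 2-component mixture: the weights sum to 1, as required.
weights = np.array([0.4, 0.6])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
sigma = 1.0
```

At $x = m_1 = (0,0)$ the second, well-separated component contributes almost nothing, so the density is close to $p_1/(2\pi\sigma^2)$.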
Proof of Proposition 2.1
Half of the equivalences of Proposition 2.1 derive from the following lemma.
Lemma 2.1. Suppose that Assumption 2.1 is satisfied, $\mathcal{M}$ is finite, and $P$ has a continuous density. Then there exist two constants $C_- > 0$ and $C_+ > 0$ such that, for all $c \in \mathcal{B}(0,1)^k$,
$$C_- \|c - c^*(c)\|^2 \;\le\; \mathrm{Var}_P\big(\gamma(c,\cdot) - \gamma(c^*(c),\cdot)\big) \;\le\; C_+ \|c - c^*(c)\|^2.$$
Lemma 2.1 ensures that, if $P$ is smooth enough, $\mathrm{Var}_P\big(\gamma(c,\cdot) - \gamma(c^*(c),\cdot)\big)$ is equivalent to the squared Euclidean distance between $c$ and $c^*(c)$, which provides a direct connection between Assumption 2.3 and Assumption 2.4. Notice that we require the density of $P$ to be smooth in order to apply Theorem 1 of [Bad77]. Weakening the conditions of that theorem could be a way to soften the smoothness requirements on $P$. It is important to note that Assumption 2.1 is also crucial to the validity of Proposition 2.1, since it allows us to turn local differentiation arguments into global properties. The other half of the equivalences of Proposition 2.1 follows from the following result, exposed in the proof of Corollary 1 in [AGG05].
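The equivalence stated in Lemma 2.1 can be observed empirically in a toy setup of our own (not the setting of the proof): for $k = 1$ and $P$ the standard Gaussian on $\mathbb{R}^2$, the optimal codebook is $c^* = \mathbb{E}X = 0$ and $\gamma(c,x) = \|x-c\|^2$, so a direct computation gives $\mathrm{Var}_P(\gamma(c,\cdot)-\gamma(c^*,\cdot)) = 4(c-c^*)^t \mathrm{Cov}(X)(c-c^*)$, which is sandwiched between constant multiples of $\|c-c^*\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200_000, 2))  # samples from P = N(0, I_2)
c_star = X.mean(axis=0)                # empirical surrogate for c* = 0

def gamma(c, x):
    # Distortion of the one-point codebook c at x.
    return np.sum((x - c) ** 2, axis=1)

def variance_ratio(c):
    # Var_P(gamma(c,.) - gamma(c*,.)) divided by ||c - c*||^2.
    var = np.var(gamma(c, X) - gamma(c_star, X))
    return var / np.sum((c - c_star) ** 2)

# With Cov(X) = I_2, both ratios should be close to 4, so here
# C_- and C_+ can both be taken near 4.
r1 = variance_ratio(np.array([0.3, 0.0]))
r2 = variance_ratio(np.array([0.1, -0.2]))
```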
Proof of Theorem 2.1
The proof strongly relies on the localization principle and its application, as exposed in [BBM08]. We start with the following definition.
Definition 2.1. Let $\Phi$ be a real-valued function. $\Phi$ is called a sub-$\alpha$ function if and only if $\Phi$ is non-decreasing and the map $x \mapsto \Phi(x)/x^\alpha$ is non-increasing.

The next theorem is an adaptation of Theorem 6.1 in [BBM08]. For the sake of clarity, its proof is given in Subsection 2.5.4.
Theorem 2.3. Let $\mathcal{F}$ be a class of bounded measurable functions such that there exist $b > 0$ and $\omega : \mathcal{F} \to \mathbb{R}^+$ satisfying
(i) $\|f\|_\infty \le b$ for all $f \in \mathcal{F}$,
(ii) $\mathrm{Var}_P(f) \le \omega(f)$ for all $f \in \mathcal{F}$.
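As a side check on Definition 2.1, the sub-$\alpha$ property is easy to verify numerically on a grid. The sketch below (our own helper, with a hypothetical name `is_sub_alpha`) confirms that $\Phi(x) = x^\beta$ is sub-$\alpha$ exactly when $\beta \le \alpha$, since $\Phi(x)/x^\alpha = x^{\beta-\alpha}$ is then non-increasing:

```python
import numpy as np

def is_sub_alpha(phi, alpha, grid):
    """Check on a positive grid that phi is non-decreasing and that
    x -> phi(x) / x**alpha is non-increasing."""
    vals = phi(grid)
    nondecreasing = np.all(np.diff(vals) >= 0)
    nonincreasing = np.all(np.diff(vals / grid**alpha) <= 1e-12)
    return bool(nondecreasing and nonincreasing)

grid = np.linspace(0.1, 10.0, 1000)
ok = is_sub_alpha(np.sqrt, 1.0, grid)            # x^(1/2) is sub-1
not_ok = is_sub_alpha(lambda x: x**2, 1.0, grid)  # x^2 is not sub-1
```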
Term A: Complexity of the model
Term A in inequality (2.2) is at first sight the dominant term in the expression $\Phi(\delta)$. The upper bound we obtain below is rather accurate, owing to the finite-dimensional Euclidean space structure. Indeed, we have to bound a scalar product over vectors contained in a ball, so the largest value of the product is in fact attained at the largest coordinates of the gradient term. We recall that $\mathcal{M}$ denotes the finite set of optimal codebooks. Let $x = (x_1, \ldots, x_k)$ be a vector in $(\mathbb{R}^d)^k$. We denote by $x_j^r$ the $r$-th coordinate of $x_j$, and call it the $(j,r)$-th coordinate of $x$. Moreover, denote by $e_{j,r}$ the vector whose $(j,r)$-th coordinate is 1 and whose other coordinates are 0.
Proof of Proposition 2.2
We consider a distribution on $\mathbb{R}^d$ supported on small balls away from one another, whose density inside each ball is a small cone, for continuity reasons. Denote by $V_i$ the Voronoi cell associated with $z_i$ in $(z_1, \ldots, z_k)$. Let $Q$ be a $k$-quantizer, and let $Q^*$ be the expected optimal quantizer, which maps $V_i$ to $z_i$ for all $i$. Finally, denote, for all $i = 1, \ldots, k$, by $R_i(Q) = \int_{V_i} \|x - Q(x)\|^2 \, dx$ the contribution of the $i$-th Voronoi cell to the risk of $Q$.
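The decomposition of the risk into per-cell contributions $R_i(Q)$ has a direct empirical analogue. The sketch below uses a simplified toy setup of our own (uniform densities on two well-separated balls, not the cone-density construction of the proof) and estimates each cell's contribution by Monte Carlo, with the codebook placed at the ball centers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated balls of radius 1 in R^2, each carrying half the mass.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
radius = 1.0
n = 100_000
idx = rng.integers(0, 2, size=n)
# Uniform sampling in a disc: radius ~ sqrt(U), angle ~ Uniform(0, 2*pi).
r = radius * np.sqrt(rng.random(n))
theta = 2 * np.pi * rng.random(n)
X = centers[idx] + np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def cell_risks(X, codebook):
    # Assign each point to its nearest code point (its Voronoi cell) and
    # return each cell's contribution R_i to the empirical risk.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return np.array([d2[labels == i, i].sum() / len(X)
                     for i in range(len(codebook))])

R = cell_risks(X, centers)
# A uniform disc of radius 1 has E||X - center||^2 = 1/2, and each cell
# receives roughly half the mass, so each R_i should be close to 0.25.
```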
Table of contents:
1 State of the art and summary of contributions
1.1 Introduction to quantization
1.1.1 Quantization and signal compression
1.1.2 Quantization and unsupervised classification
1.2 State of the art
1.2.1 Slow rates
1.2.2 Fast rates
1.3 Summary of contributions
1.3.1 Optimal non-asymptotic rate in the regular case
1.3.2 Margin condition and influence of the dimension
1.3.3 k-means and variable selection
2 Fast rates for empirical vector quantization
2.2 The quantization problem
2.3 Main results
2.4 Examples and discussion
2.4.1 A toy example
2.4.2 Quasi-Gaussian mixture example
2.5.1 Proof of Proposition 2.1
2.5.2 Proof of Lemma 2.1
2.5.3 Proof of Theorem 2.1
2.5.4 Proof of Theorem 2.3
2.5.5 Proof of Proposition 2.4
2.5.6 Proof of Theorem 2.2
2.5.7 Proof of Proposition 2.2
2.5.8 Proof of Proposition 2.3
3 Non asymptotic bounds for vector quantization
3.2 Notation and Definitions
3.3.1 Risk bound
3.3.2 Minimax lower bound
3.3.3 Quasi-Gaussian mixture example
3.4.1 Proof of Proposition 3.1
3.4.2 Proof of Proposition 3.2
3.4.3 Proof of Theorem 3.1
3.4.4 Proof of Proposition 3.3
3.4.5 Proof of Proposition 3.4
3.5 Technical results
3.5.1 Proof of Proposition 3.5
3.5.2 Proof of Proposition 3.8
3.5.3 Proof of Proposition 3.11
3.5.4 Proof of Proposition 3.10
3.5.5 Proof of Lemma 3.6
4 Variable selection for k-means quantization
4.3.1 Lasso k-means distortion and consistency
4.3.2 Weighted Lasso k-means distortion and consistency
4.4.2 Model and theoretical predictions
4.4.3 Numerical experiments
4.5.1 Proof of Proposition 4.1 and Proposition 4.2
4.5.2 Proof of Proposition 4.4
4.5.3 Proof of Proposition 4.3
4.5.4 Proof of Proposition 4.5
4.5.5 Proof of Theorem 4.1
4.5.6 Proof of Theorem 4.3
4.5.7 Proof of Theorem 4.2
4.5.8 Proof of Theorem 4.4
4.5.9 Proofs of Proposition 4.6, Proposition 4.7, Proposition 4.8 and Proposition 4.9
4.6 Technical results
4.6.1 Proof of Proposition 4.10
4.6.2 Proof of Proposition 4.11
4.6.3 Proof of Proposition 4.12
4.6.4 Proof of Lemma 4.2