A generative model for power-law distributed clustering instance

Relation between label propagation and least squares

This algorithm can be interpreted as a classification algorithm rather than a clustering algorithm, for several reasons. Each cluster must be supervised; as a consequence, the algorithm cannot discover new clusters. It is also a hard-constrained algorithm, meaning that all constraints must be satisfied: it can neither decide that some of the constraints are wrong, nor decide that, despite supervision, a cluster does not exist.
As a first contribution on label constraints, we show that this algorithm has a close connection with least squares in the full spectral embedding of the space describing the commute-time distance between nodes of the graph, that is, the pseudo-inverse of the Laplacian. Consider the following least squares problem:

$$x^\star = \arg\min_x \|x\|^2 \quad \text{such that} \quad A_l x - y_l = 0, \tag{3.1.7}$$

where $A$ is a matrix whose rows describe the nodes; in our case, the matrix $A$ corresponds to a space describing the commute-time distance between nodes of the graph. That is, if $V$ is the matrix whose columns are the eigenvectors of the Laplacian matrix $L$ of the graph and $\Lambda$ is the diagonal matrix whose entries are the corresponding eigenvalues, then $V (\Lambda^+)^{1/2}$ is a matrix whose rows are the coordinates of the nodes in the space describing their commute-time distance. Likewise, in our case, we have $A = \hat{V} \hat{\Lambda}^{-1/2}$, where $\hat{V}$ and $\hat{\Lambda}$ contain the eigenvectors and eigenvalues of $\hat{L} = L + \mathbf{1}\mathbf{1}^\top$. This does not change the fact that $A$ also describes the commute-time distance between nodes of the graph, as we only increased the eigenvalue associated with the eigenvector $\mathbf{1}$ so that it is no longer equal to zero; when computing the commute-time distance, this dimension is irrelevant, as all nodes have the same value along it. The matrix $A$ is divided into two parts, $A_l$ and $A_u$, for labelled and unlabelled nodes. Likewise, the vector $y$ is divided into two parts, $y_l$ and $y_u$, corresponding to known labels and unknown labels that we must discover. We are going to show that the unknown labels computed with Equation (3.1.4) are very close to the labels computed by solving the least squares problem (3.1.7).
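
To make the connection concrete, here is a minimal numpy sketch of both computations on a toy graph. It assumes that Equation (3.1.4), which is not reproduced in this excerpt, is the usual hard-clamped (harmonic) label-propagation solution $y_u = -L_{uu}^{-1} L_{ul}\, y_l$, and it builds the embedding $A = \hat{V}\hat{\Lambda}^{-1/2}$ from $\hat{L} = L + \mathbf{1}\mathbf{1}^\top$ as described above; all function names are ours.

```python
import numpy as np

def commute_time_embedding(L):
    """Rows of A embed the nodes so that distances in A are commute-time
    distances (up to the irrelevant all-ones direction). L_hat = L + 1 1^T
    shifts the zero eigenvalue away from zero, so L_hat is invertible."""
    n = L.shape[0]
    L_hat = L + np.ones((n, n))
    lam, V = np.linalg.eigh(L_hat)        # eigenvalues/eigenvectors of L_hat
    return V @ np.diag(lam ** -0.5)       # A = V_hat Lambda_hat^{-1/2}

def least_squares_labels(L, labelled, y_l):
    """Minimum-norm x with A_l x = y_l (Equation (3.1.7)), then read off
    the predicted labels of the unlabelled nodes as A_u x."""
    A = commute_time_embedding(L)
    unlabelled = [i for i in range(L.shape[0]) if i not in labelled]
    x = np.linalg.pinv(A[labelled]) @ y_l  # minimum-norm solution via pseudo-inverse
    return A[unlabelled] @ x

def harmonic_labels(L, labelled, y_l):
    """Hard-clamped label propagation (assumed form of Equation (3.1.4)):
    y_u = -L_uu^{-1} L_ul y_l."""
    unlabelled = [i for i in range(L.shape[0]) if i not in labelled]
    L_uu = L[np.ix_(unlabelled, unlabelled)]
    L_ul = L[np.ix_(unlabelled, labelled)]
    return -np.linalg.solve(L_uu, L_ul @ y_l)

if __name__ == "__main__":
    # Toy graph: two triangles joined by a single edge.
    W = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W
    labelled, y_l = [0, 5], np.array([+1.0, -1.0])
    print(harmonic_labels(L, labelled, y_l))       # labels from propagation
    print(least_squares_labels(L, labelled, y_l))  # labels from least squares
```

On a connected graph the principal submatrix $L_{uu}$ is positive definite, so the harmonic solution is well defined; the least squares branch uses the pseudo-inverse of $A_l$ to obtain the minimum-norm $x$.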

Accelerating spectral clustering with partial supervision

In this section, we discuss two clustering algorithms from Mavroedis [54, 55], in which partial supervision is introduced by adding soft label constraints. The author's goal is to accelerate the computation of the eigenvectors required by spectral clustering. However, his work can also be used to improve the quality of the clustering output, as label constraints act as a hint for the clustering algorithm. The method has the effect of increasing the spectral gap, that is, the difference between the third and the second eigenvalue of the graph Laplacian, and then uses the power method to compute the required eigenvectors. The computation time of the power method depends on the spectral gap: by increasing the spectral gap, the time needed for the power method to converge is decreased. There are other methods for computing eigenvectors that are more efficient than the power method, but they depend on the spectral gap in the same way. Thus, in the following, we first quickly review the power method to understand where the acceleration comes from, then we consider the two algorithms proposed by Mavroedis in Subsections 3.2.2 and 3.2.3.

The power method

The power method [21] is an iterative algorithm for computing the eigenvector associated with the largest eigenvalue of a matrix. Its computational cost depends on the cost of multiplying the matrix by a vector, and the number of steps required for convergence depends on the ratio between the two largest eigenvalues of that matrix. Consider a matrix $M$ whose ordered eigenvalues are $|\lambda_1| < \cdots < |\lambda_n|$ and whose corresponding eigenvectors are $v_1, \dots, v_n$, that is, $M v_i = \lambda_i v_i$ for all $i = 1, \dots, n$.
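
As an illustration, here is a minimal sketch of the iteration (our own code, not taken from [21]): starting from a random unit vector, repeatedly multiply by $M$ and renormalize. With the ordering above, the error decays roughly like $(|\lambda_{n-1}|/|\lambda_n|)^t$, which is why a larger gap between the two largest eigenvalues means fewer iterations.

```python
import numpy as np

def power_method(M, num_iters=1000, tol=1e-10, seed=None):
    """Approximate the eigenvector of M with the largest-magnitude eigenvalue.
    Each step costs one matrix-vector product; the convergence speed is
    governed by the ratio of the two largest eigenvalue magnitudes."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(M.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = M @ x
        y /= np.linalg.norm(y)
        # Stop when the direction no longer changes (allowing for sign flips).
        if min(np.linalg.norm(y - x), np.linalg.norm(y + x)) < tol:
            return y
        x = y
    return x
```

In spectral clustering the matrix of interest is typically a normalized similarity matrix; once the leading eigenvector is known, it can be deflated out and the power method run again to obtain the next eigenvector.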

Normalized One-against-All Supervised Spectral Clustering

The above method has some disadvantages. It does not preserve the degree matrix of the graph; consequently, algorithms that require the degrees of the graph to be preserved cannot be used, which includes the power method used to compute the semi-supervised Fiedler vector. Moreover, supervision only occurs for nodes that are in the same cluster; therefore, for more than 2 clusters, edges between nodes assigned to different clusters do not receive any supervision. We think this is a limitation of this method. In the current section, our goal is to present a method that fixes these two problems. We present noa-ssc (for normalized one-against-all supervised spectral clustering), which extends [54] to $k$ clusters without the drawbacks of [55]. We first briefly recall [54]; see Section 3.2.2 for more details. Then we explain what we cannot do and what we can do to achieve our goal. In [54], label constraints are introduced by applying a rank-1 update to the similarity matrix of the graph. That is, the supervised similarity matrix is $\widehat{W} = W + D^{1/2} v_1 v_1^\top D^{1/2}$, see Equation (3.2.4). Recall that if $\ell_1$ and $\ell_2$ correspond to the sets of supervised nodes assigned to cluster 1 and cluster 2, and $\ell = \ell_1 \cup \ell_2$, the supervision vector $v_1$ is defined from these sets.
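
The following sketch shows the shape of this rank-1 update. Since Equation (3.2.4) and the exact definition of $v_1$ are not reproduced here, the construction of $v_1$ below (a unit vector that is positive on $\ell_1$, negative on $\ell_2$ and zero elsewhere) is only an illustrative assumption; the faithful part is the update $\widehat{W} = W + D^{1/2} v_1 v_1^\top D^{1/2}$.

```python
import numpy as np

def supervised_similarity(W, ell_1, ell_2):
    """Rank-1 supervised update of the similarity matrix,
    W_hat = W + D^{1/2} v1 v1^T D^{1/2}  (cf. Equation (3.2.4)).
    The supervision vector v1 used here is only an illustrative guess:
    positive on ell_1, negative on ell_2, zero elsewhere, unit norm."""
    n = W.shape[0]
    d_sqrt = np.sqrt(W.sum(axis=1))          # diagonal of D^{1/2}
    v1 = np.zeros(n)
    v1[list(ell_1)] = +1.0 / len(ell_1)
    v1[list(ell_2)] = -1.0 / len(ell_2)
    v1 /= np.linalg.norm(v1)
    return W + np.outer(d_sqrt * v1, d_sqrt * v1)
```

The updated matrix $\widehat{W}$ then simply replaces $W$ in the spectral clustering pipeline.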

Table of contents:

1 Introduction 
2 Preliminaries: Unsupervised Graph Clustering 
2.1 Similarity graphs
2.1.1 Graph notation
2.1.2 Obtaining a graph
2.2 The Laplacian matrix of the graph
2.3 Graph cuts
2.3.1 Approximation of Ncut for 2 clusters
2.3.2 Approximation of Ncut for more than 2 clusters
2.4 The spectral embedding
2.4.1 Normalized cut from random walks
2.4.2 The commute-time distance
2.5 Evaluation of clustering
3 Label constraints 
3.1 Label Propagation
3.1.1 Relation between label propagation and least squares
3.2 Accelerating spectral clustering with partial supervision
3.2.1 The power method
3.2.2 2-cluster case
3.2.3 k-cluster case
3.3 Normalized One-against-All Supervised Spectral Clustering
3.4 Experiments with noa-ssc
3.5 Conclusion
4 Pairwise constraints 
4.1 First steps into pairwise constraints
4.1.1 Spectral Learning
4.1.2 noa-ssc: Extension to pairwise constraints
4.1.3 First experiments on pairwise constraints
4.2 Tuning the similarity matrix
4.2.1 Semi-supervised Graph Clustering: A Kernel Approach
4.2.2 Constrained Spectral Clustering through Affinity Propagation
4.3 Tuning the spectral embedding
4.3.1 Constrained Clustering via Spectral Regularization
4.3.2 Fast Gaussian Pairwise Constrained Clustering
4.4 Other approaches
4.4.1 Constrained 1-Spectral Clustering
4.4.2 Scalable Constrained Clustering: A Generalized Spectral Method
4.5 Experiments on pairwise constraints
4.5.1 UCI and Network Data sets
4.5.2 Noun Phrase Coreference Resolution
5 Partition-level constraints 
5.1 Power-law distributions
5.1.1 Some properties of power-laws
5.1.2 Identifying and measuring power-law distributions
5.1.3 Pitman-Yor Process
5.2 Power-Law Graph Cuts
5.2.1 A generative model for power-law distributed clustering instance
5.2.2 Biasing weighted kernel k-means to fit the model
5.2.3 Limits of this approach
5.3 Clustering Power-Law Distributed Data Sets in One Pass
5.3.1 Experiments with one-pass power-law clustering
5.3.2 Towards a structured prediction model
5.3.3 Optimization of single non-convex constraint QCQP
5.3.4 Conclusion on one-pass power-law clustering
6 Conclusion
