Computation in dynamic networks 


Euclidean geometries for update rules

Here, we introduce the elements that will be needed in the main proofs of our theorems, Theorems 2.5.1 and 2.6.1; their articulation towards convex update rules mostly originates in [29], to which we refer the reader for their detailed treatment.
Perron-Frobenius theory of ergodic matrices
Ergodic matrices are at the center of the study of the convergence of convex update rules.
Definition 15 (Ergodic matrices). A stochastic matrix $A$ is said to be ergodic when the geometric sequence $A^k$ is convergent and there exists a positive stochastic vector $\pi$ such that
$$\lim_{k \to \infty} A^k = \mathbf{1}\pi^\top. \tag{2.5}$$
The vector $\pi$ involved in eq. (2.5) is then unique, and called the Perron vector of the matrix $A$, which we denote by $\pi(A)$. ❦
The spectral theory of non-negative matrices is usually referred to as Perron-Frobenius theory, after two important early investigators; see for example [93] for a detailed reference. Let us recall some of its central points that will be essential to our upcoming argument: a stochastic matrix $A$ is ergodic if and only if $A^k > 0$ for some power $k$, which is in particular the case when its associated graph $G_A$ is reflexive and strongly connected. An ergodic matrix has 1 as its dominant eigenvalue – that is, 1 is a simple root of its characteristic polynomial $\chi_A(X)$, and the rest of its eigenvalues are contained in the open unit disk.
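These facts lend themselves to a quick numerical sanity check. The sketch below iterates the powers of a small ergodic stochastic matrix and recovers its Perron vector as the common row of the limit; the example matrix is our own assumption, not taken from the text.

```python
import numpy as np

# An assumed small stochastic matrix; its associated graph has all
# self-loops and is strongly connected, so the matrix is ergodic.
A = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.4, 0.0, 0.6],
])

# Powers A^k converge to the rank-one matrix 1 pi^T of eq. (2.5).
Ak = np.linalg.matrix_power(A, 100)
pi = Ak[0]  # every row of the limit equals pi^T

assert np.all(pi > 0) and np.isclose(pi.sum(), 1.0)  # pi is positive stochastic
assert np.allclose(pi @ A, pi)                       # pi^T A = pi^T
assert np.allclose(Ak, np.outer(np.ones(3), pi))     # A^k converges to 1 pi^T
```

The check that $\pi^\top A = \pi^\top$ confirms that the Perron vector is a left fixed vector of $A$, as the limit in eq. (2.5) requires.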
Here, we will principally consider ergodic matrices that arise from update rules: for a given round $t$, the matrix $A$ defined from eq. (2.1) by letting $a_{ij} = a_{ij}(t)$ if $i$ and $j$ are neighbors and $a_{ij} = 0$ otherwise is ergodic whenever the affine weights $a_{ij}(t)$ are all positive and the communication graph in round $t$ is reflexive and strongly connected. The effect of the update $x(t-1) \mapsto x(t)$ induced by an ergodic matrix is best understood under a geometric lens, which will be foundational to the complexity proof of the MaxWeight algorithm in Theorem 2.5.1.
Weighted geometries
For any positive stochastic vector $\pi \in \mathbb{R}^n$, we define the $\pi$-weighted Euclidean geometry with the inner product $\langle \cdot, \cdot \rangle_\pi$ and the norm $\| \cdot \|_\pi$:
$$\langle u, v \rangle_\pi := \sum_i \pi_i u_i v_i, \qquad \| v \|_\pi := \sqrt{\langle v, v \rangle_\pi}. \tag{2.6}$$
Here, we will consider weighted geometries defined by the Perron vectors of ergodic matrices: the action of an ergodic update matrix $A$ over a vector of estimates $x(t)$ induces a reduction in the variance defined by the geometry weighted by $\pi = \pi(A)$:
$$V_\pi(v) := \| v - \langle v, \mathbf{1} \rangle_\pi \mathbf{1} \|_\pi^2 = \| v \|_\pi^2 - \langle v, \mathbf{1} \rangle_\pi^2. \tag{2.7}$$
The weighted variances have in common with the diameter diam that they are semi-norms over the space R= whose zeroes are exactly the consensus vectors – that is, they both represent ways of measuring the “distance to consensus” of the system. As the convergence time is expressed in terms of the diameter, but the geometric approach we take here is based on weighted variances, we will rely on the following result given in [29] to connect the two.
Lemma 2.3.1. Let $\pi \in \mathbb{R}^n$ be a positive stochastic vector. For any vector $v \in \mathbb{R}^n$,
$$2\,V_\pi(v) \le (\operatorname{diam} v)^2 \le \frac{4}{\min_i \pi_i}\,V_\pi(v). \tag{2.8}$$
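The two inequalities of Lemma 2.3.1 can be probed numerically. In the sketch below, the weight vector $\pi$ and the random test vectors are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive stochastic weight vector pi (our assumption).
n = 5
pi = rng.random(n) + 0.1
pi /= pi.sum()

def variance(pi, v):
    # V_pi(v) = ||v||_pi^2 - <v, 1>_pi^2, following eq. (2.7)
    mean = np.dot(pi, v)
    return np.dot(pi, v * v) - mean ** 2

for _ in range(100):
    v = rng.standard_normal(n)
    diam2 = (v.max() - v.min()) ** 2
    V = variance(pi, v)
    assert 2 * V <= diam2 + 1e-12             # left inequality of (2.8)
    assert diam2 <= 4 * V / pi.min() + 1e-12  # right inequality of (2.8)
```

Note that the variance vanishes exactly on consensus vectors, which is the sense in which it measures a "distance to consensus".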
Reversibility  Given any matrix $A \in \mathbb{R}^{n \times n}$, we easily verify that its adjoint for the $\pi$-weighted inner product $\langle \cdot, \cdot \rangle_\pi$ is given by $\operatorname{diag}(\pi)^{-1} A^\top \operatorname{diag}(\pi)$, where $\operatorname{diag}(\pi)^{-1}$ and $\operatorname{diag}(\pi)$ denote the diagonal matrices with respectively $\pi_i^{-1}$ and $\pi_i$ as their $i$-th diagonal entries. We will say that a matrix is reversible if it is self-adjoint for some weighted inner product. By definition, a reversible matrix is subject to the spectral theorem: it is diagonalizable in an orthonormal (for the right weighted inner product) basis, and its eigenvalues are real numbers. For a stochastic reversible matrix $A$, we have in particular $\operatorname{Sp} A \subseteq [-1, 1]$.
Let us denote the $\pi$-weighted adjoint of a matrix $A$ by $A^{\dagger_\pi}$. Letting $D := \operatorname{diag}(\pi)$, we then have, for any $\pi$-self-adjoint matrix $A$,
$$A^\top \pi = D A D^{-1} \pi = D A \mathbf{1} = D \mathbf{1} = \pi,$$
from which we deduce that if the matrix $A$ is ergodic, then necessarily $\pi = \pi(A)$. As a consequence, the natural geometry with which to study an ergodic matrix $A$ is that given by the inner product $\langle \cdot, \cdot \rangle_{\pi(A)}$: if we denote its $\pi(A)$-adjoint by $A^\dagger$, then the matrix $A^\dagger A$ is reversible and ergodic, with the same Perron vector as $A$. The min-max variational characterization of the eigenvalues of the matrix $A^\dagger A$ known as the Rayleigh-Ritz theorem then gives us:
$$\sup_{\substack{\lambda \in \operatorname{Sp} A^\dagger A \\ \lambda < 1}} \lambda \;=\; \sup_{\substack{v \in \mathbb{R}^n \setminus \{0\} \\ \langle v, \mathbf{1} \rangle_{\pi(A)} = 0}} \frac{\langle A^\dagger A v, v \rangle_{\pi(A)}}{\| v \|_{\pi(A)}^2} \tag{2.9}$$
from which we deduce the following bound over the reduction in variance due to the multiplication by an ergodic matrix.
Lemma 2.3.2. For an ergodic matrix $A$,
$$\forall v:\quad V_{\pi(A)}(Av) \le (1 - \gamma_{A^\dagger A})\,V_{\pi(A)}(v). \tag{2.10}$$
If the matrix $A$ is furthermore reversible, then we have in fact
$$\forall v:\quad V_{\pi(A)}(Av) \le (1 - \gamma_A)^2\,V_{\pi(A)}(v). \tag{2.11}$$
Proof. We let $\pi := \pi(A)$. Fix a vector $v \ne 0$, and let us show that $V_\pi(Av) \le (1 - \gamma_{A^\dagger A})\,V_\pi(v)$. For any $\alpha \in \mathbb{R}$, we have $V_\pi(v + \alpha \mathbf{1}) = V_\pi(v)$ and $V_\pi(A(v + \alpha \mathbf{1})) = V_\pi(Av + \alpha \mathbf{1})$. Without loss of generality, we can then assume that $\langle v, \mathbf{1} \rangle_\pi = 0$, and it suffices to show that $\| Av \|_\pi^2 \le (1 - \gamma_{A^\dagger A})\,\| v \|_\pi^2$.
We have
$$\| Av \|_\pi^2 = \langle Av, Av \rangle_\pi = \langle A^\dagger A v, v \rangle_\pi \le \sup_{\substack{\lambda \in \operatorname{Sp} A^\dagger A \\ \lambda < 1}} \lambda \,\| v \|_\pi^2 = (1 - \gamma_{A^\dagger A})\,\| v \|_\pi^2,$$
where the inequality follows from eq. (2.9) and $\langle v, \mathbf{1} \rangle_\pi = 0$, and the last equality holds by definition of the spectral gap; this proves the general case. If the matrix $A$ is moreover reversible, then we have $A^\dagger A = A^2$, and so $1 - \gamma_{A^\dagger A} = (1 - \gamma_A)^2$, which gives eq. (2.11).
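For a reversible matrix, eq. (2.11) can be checked directly: a symmetric stochastic matrix is reversible with uniform Perron vector, and its spectral gap is computable by eigendecomposition. The example matrix below is an assumption made for the check:

```python
import numpy as np

rng = np.random.default_rng(1)

# A symmetric stochastic matrix: reversible, with uniform Perron
# vector pi = (1/n, ..., 1/n).  The matrix itself is an assumed example.
A = np.array([
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
])
n = A.shape[0]
pi = np.full(n, 1.0 / n)

# Spectral gap: one minus the largest modulus among eigenvalues != 1.
eigs = np.sort(np.linalg.eigvalsh(A))
gamma = 1.0 - np.abs(eigs[:-1]).max()

def variance(v):
    mean = np.dot(pi, v)
    return np.dot(pi, (v - mean) ** 2)

# The contraction of eq. (2.11), checked on random vectors.
for _ in range(100):
    v = rng.standard_normal(n)
    assert variance(A @ v) <= (1 - gamma) ** 2 * variance(v) + 1e-12
```

Taking the modulus of the eigenvalues matters here: a reversible matrix may have eigenvalues close to $-1$, and those dominate the contraction factor just as well as eigenvalues close to $1$.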
An additional feature of interest to us is that reversible ergodic matrices admit reasonable bounds over their spectral gap. We borrow the following spectral bound from [29, Corollary 7], which generalizes a previous bound given by Nedić et al. [78, Lemma 9].
Lemma 2.3.3. For a reversible ergodic matrix $A \in \mathbb{R}^{n \times n}$,
$$\gamma_A \ge \frac{\alpha(A)}{n - 1}, \tag{2.12}$$
where $\alpha(A) := \min\,(\{ \pi_i(A)\,a_{ij} \mid i, j \in [n] \} \setminus \{0\})$.
Metropolis and others  In the theory of finite Markov chains, a stochastic matrix $A$ has an interpretation as the transition matrix of a Markov chain $X(t)$ for which $\mathbb{P}[X(t+1) = j \mid X(t) = i] = a_{ij}$. If the matrix $A$ is reversible and ergodic, then the Markov chain that it describes is reversible and ergodic as well, for the traditional meaning given to those terms in the theory of Markov chains; the Perron vector $\pi = \pi(A)$ then gives the unique stationary distribution of the corresponding Markov chain.
In their seminal work [70], Metropolis et al. introduced the method of Monte Carlo sampling for the fast approximation of parameters that are hard to compute exactly. Their algorithm, later generalized by Hastings [53], consists at its core of turning a reversible ergodic matrix $A$ with arbitrary Perron vector $\pi$ into another matrix $A'$ with a chosen Perron vector $\pi'$ by:
$$a'_{ij} = \begin{cases} \min\left( a_{ij},\ \dfrac{\pi'_j}{\pi'_i}\,a_{ji} \right) & j \ne i \\[6pt] 1 - \sum_{k \ne i} a'_{ik} & j = i \end{cases} \tag{2.13}$$
in a method now known as the Metropolis-Hastings algorithm.6 Here, we shall be interested mostly in the case where $\pi'$ is the constant vector $(1/n, \ldots, 1/n)^\top$, in which case the resulting matrix $A'$ is symmetric.
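As a concrete illustration, the sketch below applies the transformation of eq. (2.13) to a small symmetric stochastic matrix (whose Perron vector is uniform) to give it a chosen target Perron vector; both the input matrix and the target distribution are our own assumed examples:

```python
import numpy as np

def metropolis_hastings(A, pi_target):
    """Transformation of eq. (2.13): reshape a reversible ergodic
    stochastic matrix A into one with Perron vector pi_target."""
    n = A.shape[0]
    B = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            if j != i:
                B[i, j] = min(A[i, j], pi_target[j] * A[j, i] / pi_target[i])
        B[i, i] = 1.0 - B[i].sum()  # leftover mass goes on the diagonal
    return B

# A symmetric stochastic matrix (uniform Perron vector) and an
# assumed target distribution.
A = np.array([
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
])
pi_target = np.array([0.5, 0.3, 0.2])
B = metropolis_hastings(A, pi_target)

assert np.allclose(B.sum(axis=1), 1.0)        # B is stochastic
assert np.all(B >= 0)
assert np.allclose(pi_target @ B, pi_target)  # pi' is a left fixed vector
```

Since off-diagonal entries only ever shrink, the diagonal correction stays non-negative whenever $A$ is stochastic.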

The EqualNeighbor and Metropolis update rules

EqualNeighbor  The prototypical example of an update rule is the EqualNeighbor rule, where the next estimate of an agent is the unweighted average of those of its incoming neighbors:
$$x_i(t) = \frac{1}{d_i(t)} \sum_{j \in \mathrm{In}_i(t)} x_j(t-1) \tag{2.14}$$
This rule directly translates into an implementing local algorithm: an agent broadcasts its latest estimate in every round, and picks as its new estimate the average of its incoming values. Since an agent's degree is at most $n$, the EqualNeighbor rule is uniformly convex with parameter $\alpha = 1/n$, and thus solves consensus over the class G.
However, the convergence time is poor over the class G, where [84, Proposition 12] gives a lower bound of $T(\varepsilon; n) \ge 2^{\Omega(n)} \log(1/\varepsilon)$. The convergence time improves when the network displays additional structure: over the sub-class of G of fixed graphs, Olshevsky and Tsitsiklis [83, Corollary 5.2] use a result from Landau and Odlyzko to show a tight bound in $O(n^3 \log(n/\varepsilon))$. This bound extends to the sub-class of G of dynamic graphs for which each vertex has a fixed degree [40, Theorem 1.6].
Local update rules such as eq. (2.1) admit an equivalent matrix form. For the EqualNeighbor rule, as an example, eq. (2.14) is equivalent to the global rule $x(t) = W(G(t))\,x(t-1)$, with the EqualNeighbor matrix $W(G)$ given for any graph $G$ by
$$[W(G)]_{ij} = \begin{cases} \frac{1}{d_i(G)} & j \in \mathrm{In}_i(G) \\ 0 & j \notin \mathrm{In}_i(G) \end{cases} \tag{2.15}$$
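The EqualNeighbor matrix is straightforward to build from an adjacency matrix; in the sketch below, the reflexive example graph and the row-per-agent encoding are assumptions of ours:

```python
import numpy as np

def equal_neighbor_matrix(adj):
    """EqualNeighbor matrix of eq. (2.15): adj[i][j] is 1 when j is an
    incoming neighbor of i (self-loops included), and row i is the
    uniform average over the d_i incoming neighbors."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)  # d_i(G)
    return adj / degrees[:, None]

# An assumed reflexive example graph on 3 vertices.
adj = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
W = equal_neighbor_matrix(adj)

assert np.allclose(W.sum(axis=1), 1.0)     # W(G) is stochastic
assert np.allclose(W[1], [1/3, 1/3, 1/3])  # vertex 1 has three in-neighbors
```

By construction, the matrix is stochastic for any graph without isolated vertices, and its associated graph is the input graph itself.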
We note that this matrix is stochastic for any graph $G$, and has $G$ for associated graph.

6: Or, as perhaps unsurprisingly nobody calls it, the Metropolis-Rosenbluth-Rosenbluth-Teller-Teller-Hastings algorithm.
Since the Metropolis-* algorithm is not an "algorithm" in the sense we use in this thesis, and since Metropolis and Hastings are not wanting in fame, we shall call the Metropolis-Hastings symmetrization the transformation $A \mapsto M(A)$ given by
$$[M(A)]_{ij} = \begin{cases} \min(a_{ij}, a_{ji}) & j \ne i \\ 1 - \sum_{k \ne i} \min(a_{ik}, a_{ki}) & j = i \end{cases} \tag{2.16}$$
By construction, the matrix $M(A)$ is symmetric and leaves consensus vectors invariant. Outside the main diagonal, we have $[M(A)]_{ij} \le a_{ij}$, and thus on the main diagonal we have $[M(A)]_{ii} \ge a_{ii}$; the matrix $M(A)$ is therefore doubly stochastic whenever the matrix $A$ is stochastic.
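The symmetrization of eq. (2.16) and the double stochasticity claim can be sketched as follows; the input matrix is an arbitrary stochastic example of ours:

```python
import numpy as np

def metropolis_symmetrization(A):
    """M(A) of eq. (2.16): off-diagonal entries become min(a_ij, a_ji),
    and the leftover mass is moved onto the diagonal."""
    M = np.minimum(A, A.T)
    np.fill_diagonal(M, 0.0)
    np.fill_diagonal(M, 1.0 - M.sum(axis=1))  # row sums computed with zero diagonal
    return M

# An arbitrary stochastic (non-symmetric) example matrix.
A = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.4, 0.0, 0.6],
])
M = metropolis_symmetrization(A)

assert np.allclose(M, M.T)              # symmetric
assert np.allclose(M.sum(axis=1), 1.0)  # rows sum to 1
assert np.allclose(M.sum(axis=0), 1.0)  # columns sum to 1: doubly stochastic
assert np.all(M >= 0)
```

Symmetry gives column stochasticity for free once the rows sum to 1, which is exactly the argument made above.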
Xiao et al. [100] propose to approach the average consensus problem with the Metropolis update rule:
$$x_i(t) = x_i(t-1) + \sum_{j \in N_i(t)} \frac{x_j(t-1) - x_i(t-1)}{\max(d_i(t),\,d_j(t))}; \tag{2.17}$$
when viewed at the scale of the system, this rule corresponds to symmetrizing the EqualNeighbor rule with the Metropolis-Hastings symmetrization round-wise, hence its name. Proposition 2.2.1 ensures asymptotic consensus of the Metropolis rule over the entire class G as it did for the EqualNeighbor rule, and since the EqualNeighbor matrices are stochastic, the Metropolis matrices are doubly stochastic; the Metropolis rule in fact results in average consensus, with a convergence time in $O(n^2 \log(n/\varepsilon))$ over the class G.
Unfortunately, no local algorithm is able to implement the Metropolis rule over the class G: the rule is local only in the weak sense that an agent's next estimate $x_i(t)$ depends on information present within distance 2 of agent $i$ in round $t$, which is not local enough when the network is subject to change.
Indeed, since agent $j$ only knows its round $t$ degree $d_j(t)$ at the end of round $t$, it has to wait until round $t+1$ to share this information with its neighbors. Any distributed implementation of this rule would require the communication links to evolve at a slow and regular pace. As an example, we may assume that the links only change at rounds $t$ for which $t \equiv r \pmod{k}$ – e.g., at even rounds. Such conditions are all the more limiting in that they additionally require all agents to be loosely synchronized, as they have to agree on $k$ and on the current round number – at least modulo $k$.
The situation is even worse when the network is subject to unpredictable changes, as we need to warn all agents, ahead of time, about any upcoming topology change. In effect, this amounts to having a global synchronizing signal precede every change in the communication topology. For a topology changing in round $t_0$, this differs little from starting an entirely new execution with $(x_1(t_0 - 1), \ldots, x_n(t_0 - 1))$ as new input.
To paraphrase, provided the dynamic communication graph is suffi-ciently stable, one “can” implement the Metropolis rule over dynamic networks, but the execution is fully distributed only as long as no change occurs.


Degree tracking for stabilizing weights

Consider the update rule given by
$$x_i(t) = x_i(t-1) + \frac{1}{q_i} \sum_{j \in N_i(t)} \left( x_j(t-1) - x_i(t-1) \right), \tag{2.18}$$
which we call the FixedWeight rule for the parameters $q_1, \ldots, q_n > 0$. When $d_i(t) \le q_i$, it acts as a sort of sluggish version of the EqualNeighbor rule, where agent $i$ gives more importance to its own estimate $x_i(t-1)$ than to those of its neighbors. Over the sub-class $\mathcal{C} \subseteq$ G of dynamic graphs for which $d_i(t) \le q_i$ holds for all vertices at all times, the FixedWeight rule with parameters $q_1, \ldots, q_n$ is shown in [40, Theorem 1.6] to solve consensus with a convergence time in $T_{\mathcal{C}}(\varepsilon; n) = O\left( \left( \sum_i q_i \right) n \log(n/\varepsilon) \right)$. In particular, using $q_1 = q_2 = \cdots = q_n = n$ yields a bound in $O(n^3 \log(n/\varepsilon))$, comparable to the EqualNeighbor rule over static networks.
Unfortunately, the uniform programming of the agents in our model precludes a distributed implementation of this rule, which would require communicating its parameter $q_i$ individually to each agent $i$. Moreover, even if we had the capacity to do so, or if we batch-programmed all agents with the same parameter $q$, our ability to pick good values for these parameters is limited by what is known, ahead of time, about the structure of the communication graph: parameters that are too small risk breaking the degree condition $d_i(t) \le q_i$ and causing the system to diverge, while parameters that are too large make convergence unacceptably slow if the network has a small degree.
Instead of relying on exogenous parameters, incompatible with a distributed approach, we propose making the agents learn by themselves which values of $q_1, \ldots, q_n$ work for the current execution. We call MaxWeight the rule obtained by replacing the fixed parameter $q_i$ with $d'_i(t) := \max\{ d_i(1), \ldots, d_i(t) \}$ at each step of eq. (2.18).
As each sequence $d'_i(t)$ stabilizes over its limit $d'_i := \max_t d_i(t)$, the MaxWeight rule eventually behaves like the FixedWeight rule with parameters $d'_1, \ldots, d'_n$. However, the MaxWeight rule can be implemented over the class G with no additional assumption, and we can show the following about the resulting MaxWeight algorithm, given in Algorithm 3.
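A minimal simulation of the degree-tracking idea, under our own assumptions about the data layout (one symmetric 0/1 adjacency matrix per round, self-loops included):

```python
import numpy as np

def max_weight_run(x0, graphs):
    """MaxWeight sketch: the FixedWeight update of eq. (2.18) with q_i
    replaced by the running maximum degree d'_i(t)."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    d_max = np.zeros(n)
    for adj in graphs:
        d_max = np.maximum(d_max, adj.sum(axis=1))  # degree tracking
        new_x = x.copy()
        for i in range(n):
            for j in range(n):
                if j != i and adj[i, j]:
                    new_x[i] += (x[j] - x[i]) / d_max[i]
        x = new_x
    return x

# On a fixed connected graph the tracked degrees stabilize after the
# first round, and the estimates are driven to consensus (the example
# graph and inputs are ours).
adj = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])
x = max_weight_run([0.0, 6.0, 12.0], [adj] * 200)

assert x.max() - x.min() < 1e-6  # the estimates reach consensus
```

Since $d'_i(t) \ge d_i(t)$ at every round by construction, each update is a convex combination of current estimates, so the estimates always stay within the range of the inputs.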

Table of contents :

1 Computation in dynamic networks 
1.1 Introduction
1.2 Relations, graphs, dynamic graphs
1.3 Distributed model
1.4 Big-step operational semantics
1.5 Computational tasks
2 A little learning goes a long way 
2.1 Introduction
2.2 Affine update rules
2.3 Euclidean geometries for update rules
2.4 The EqualNeighbor and Metropolis update rules
2.5 Degree tracking for stabilizing weights
2.6 An affine algorithm
3 Randomization and Quantization 
3.1 Introduction
3.2 Preliminaries
3.3 Randomized algorithm
3.4 Quantization
3.5 Decision
3.6 Simulations
Parting thoughts

