# Improved compact formulations for sparse graphs


## A more general model: Graph partitioning problem under set constraints (GPP-SC)

We now introduce a more general model, denoted GPP-SC (where SC stands for "set constraints"), which encompasses GPKC and GPCC as special cases. GPP-SC can be defined mathematically as follows. Given an undirected connected graph $G = (V, E)$ with $V = \{1, \dots, n\}$ and $|E| = m$, where a length $l_e \in \mathbb{Z}_+$ is associated with each edge $e \in E$, find a partition of $V$ into disjoint sets (or clusters) such that:

- every cluster $C \subseteq V$ satisfies a constraint of the form $\mathcal{G}(y^C) \le 0$, where $y^C$ is the incidence vector of $C$ in $\{0,1\}^n$ and $\mathcal{G} : \{0,1\}^n \to \mathbb{R}$ is a given monotone nondecreasing pseudo-Boolean function (note that $\mathcal{G}$ can also be viewed as a set function $\mathcal{P}(V) \to \mathbb{R}$ which associates the real value $\mathcal{G}(y^C)$ with each subset $C \subseteq V$);
- the sum of the lengths of the edges having end-nodes in different clusters is minimized.

Note that in the above definition, the number $k$ of clusters is not imposed and is part of the output of the optimization process. This class of problems is fairly general. For instance, the task allocation problem can be modeled as a special case of GPP-SC where each cluster $C$ is required to satisfy a node weight constraint of the form $\sum_{v \in C} w_v \le W$, where the $w_v$, $v \in V$, are given nonnegative node weights and $W$ is a given upper limit on the total node weight of a cluster. This corresponds to the linear constraint $\mathcal{G}(y^C) \le 0$ where $\mathcal{G}$ is the nondecreasing pseudo-Boolean function defined by $\mathcal{G}(y) = \sum_{v \in V} w_v y_v - W$.
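The knapsack special case above can be made concrete with a small sketch: checking the cluster constraint $\mathcal{G}(y^C) \le 0$ for each cluster of a candidate partition, and evaluating the objective (total length of cut edges). All names and the toy instance are illustrative; the thesis specifies only the abstract model, not an implementation.

```python
# Sketch of the GPP-SC knapsack special case: feasibility check of a
# partition plus evaluation of its cut value. Names and data are illustrative.

def knapsack_G(cluster, weights, W):
    """Linear pseudo-Boolean function G(y_C) = sum_{v in C} w_v - W.
    The cluster constraint G(y_C) <= 0 holds iff the cluster weight is at most W."""
    return sum(weights[v] for v in cluster) - W

def cut_value(partition, edges):
    """Sum of lengths of edges whose end-nodes lie in different clusters."""
    cluster_of = {v: idx for idx, cluster in enumerate(partition) for v in cluster}
    return sum(l for (u, v, l) in edges if cluster_of[u] != cluster_of[v])

# Toy instance: 4 nodes with unit weights, capacity W = 2.
weights = {1: 1, 2: 1, 3: 1, 4: 1}
edges = [(1, 2, 5), (2, 3, 1), (3, 4, 5), (1, 4, 1)]
partition = [{1, 2}, {3, 4}]

assert all(knapsack_G(c, weights, W=2) <= 0 for c in partition)  # feasible
print(cut_value(partition, edges))  # edges (2,3) and (1,4) are cut -> 2
```

Note that $\mathcal{G}$ here is monotone nondecreasing in $y$, as the model requires: adding a node to a cluster can only increase its total weight.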

### Discussion of the graph partitioning formulations

In this section, we briefly discuss the graph partitioning formulations presented above, pointing out the advantages and drawbacks of each formulation relative to the others.
The Node-Node model has a linear objective function and is not symmetric. However, it involves a huge number of triangle inequalities ($O(n^3)$).
The Node-Cluster model is highly symmetric; moreover, its objective function is quadratic, so linearization techniques are required before an LP solver can be applied. The authors of [11] show that the Node-Node model outperforms the Node-Cluster model in terms of the quality of the continuous relaxation. The mixed Node-Node/Node-Cluster model also has a linear objective function. However, like the Node-Cluster model, it is highly symmetric, and it uses more variables than either the Node-Node or the Node-Cluster model.
The semi-definite formulation contains an additional semi-definite constraint as compared with the Node-Node model. Moreover, semi-definite programming is still not competitive with linear programming in terms of computation times. The authors of [46] report that the semi-definite formulation outperforms the Node-Node model when the number of clusters is small; when the number of clusters is larger, the Node-Node model is preferable. Throughout the present work, we focus exclusively on Node-Node formulations.

#### Extension to the case when the number of clusters is bounded from above

In this section, we show that part of the above results can be extended to the case when the number of clusters is bounded from above by a given constant. Note that a limitation on the number of clusters somewhat conflicts with the monotone nondecreasing property of the cluster constraints in GPP-SC, since the number of clusters increases as the number of nodes per cluster decreases. Therefore Theorem 3.2.3 does not appear to be applicable in this case. Nevertheless, we prove in the following that the result of Corollary 3.2.4 still applies. To achieve this, and exploiting some analogies with the analysis in [1], we introduce a vector $z \in \{0,1\}^{|V|}$ defined as follows: $z_i = 1$ if and only if $i$ is the smallest node index in the cluster that contains $i$; in this case node $i$ is called a representative. The vector $z$ can then be computed from $x$, for all $i \in V$.
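The definition of the representative variables can be sketched as follows. Since the exact dependence of $z$ on the thesis's $x$ variables is not reproduced here, the sketch assumes only a generic "same cluster" indicator; this convention is an assumption, not the thesis's formula.

```python
# Sketch of the representative variables: z_i = 1 iff i is the smallest node
# index in its cluster. We derive z from a same_cluster(i, j) predicate
# (True iff i and j lie in the same cluster); this interface is an
# illustrative assumption, not the thesis's exact definition from x.

def representatives(n, same_cluster):
    z = [0] * (n + 1)  # nodes are 1-indexed; z[0] is unused
    for i in range(1, n + 1):
        # i is a representative iff no smaller-indexed node shares its cluster
        z[i] = int(all(not same_cluster(j, i) for j in range(1, i)))
    return z

# Toy partition {1, 3} and {2, 4}: the representatives are nodes 1 and 2.
cluster_of = {1: 0, 3: 0, 2: 1, 4: 1}
z = representatives(4, lambda i, j: cluster_of[i] == cluster_of[j])
print(z[1:])  # [1, 1, 0, 0]
```

A useful consequence is that $\sum_{i \in V} z_i$ equals the number of clusters, which is what makes such variables suitable for enforcing an upper bound on that number.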

#### Comparing efficiency with and without an upper bound on the number of clusters

Computational experiments were also carried out to compare the efficiency of solving GPKC with and without an upper bound on the number of clusters. To highlight the differences between them, the experiments were done in the following way. For any GPKC instance $i$, we first solve (RIPKC); the number of clusters in the corresponding optimal solution is denoted $k^*_i$. We then solve (kRIPKC) (i.e., (kRIP) applied to GPKC instances) for the same instance $i$, adding the upper bound $k_i$ on the number of clusters. Obviously, if $k_i \ge k^*_i$, the solutions of (RIPKC) and (kRIPKC) are identical. Computational results are shown in Table 3.2 for $k_i = k^*_i - 1$ and $k_i = k^*_i - 2$. For each value of $n, m$, five instances are solved. The third and fourth columns of the table report the CPU time to obtain the optimal solution using (RIPKC) and the number of clusters in that optimal solution; the fifth and sixth columns report the CPU time to obtain the optimal solution using ((k-1)RIPKC) and ((k-2)RIPKC) respectively. The acronym "NF" indicates an infeasible instance, and the computation time needed to prove infeasibility is shown in parentheses.
As can be seen from Table 3.2, while all instances can be solved with (RIPKC), only part of the instances can be solved with (kRIPKC) within the time limit of 12000 seconds. Moreover, for the instances that can be solved with both models, (kRIPKC) is 3 to 6 times slower than (RIPKC). A possible explanation is as follows.
From the results in Table 3.2, it can be seen that for all instances considered, the number of clusters $k^*$ in the optimal solution to (RIPKC) is close to its minimum possible value. Indeed, for $k = k^* - 1$, about 50% of the instances become infeasible, and for $k = k^* - 2$, about 90% of the instances turn out to be infeasible. The observed increase in computation times is thus due, at least in part, to the fact that, when decreasing $k$, the instances come close to the boundary between feasibility and infeasibility.


#### Eliminating constraints featuring largest slacks

The cutting-plane method is frequently used to solve large linear problems featuring a huge number of constraints. Experiments show that, using a cutting-plane algorithm on a linear formulation, solvers handle far fewer constraints than with the complete formulation. In practice, the violated constraints are identified at each iteration and some of them (or all of them) are added to the formulation for the next iteration.
By doing so, we expect that only the necessary constraints are used to solve the problem. However, experiments show that many unnecessary constraints are also included in the course of the computations. We suggest here a method to remove as many unnecessary constraints as possible at each iteration, thereby limiting the number of constraints.
We focus our improvement on the knapsack constraints. The main idea is to ensure that the number of knapsack constraints never exceeds $km$ for some parameter $k$, typically chosen in the range $[3, 5]$. This is achieved by applying the following simple rule: if, at some stage of the procedure, $p$ knapsack constraints are already present in the current formulation and $q$ violated inequalities are generated, then if $p + q \ge km$, the $p + q - km$ constraints having the largest slacks are deleted.
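The deletion rule above can be sketched as a small pool-management routine. The data representation (a constraint as a pair of its current slack and an opaque handle) is illustrative; any real implementation would hold solver-specific constraint objects.

```python
# Sketch of the slack-based deletion rule: when the constraint pool would
# exceed k*m entries after adding the newly generated violated cuts, the
# p + q - k*m constraints with the largest slacks are dropped.
# The (slack, handle) representation is illustrative.

def prune_pool(pool, new_cuts, k, m):
    """pool, new_cuts: lists of (slack, handle) pairs. Keeps at most k*m."""
    merged = pool + new_cuts       # p + q constraints in total
    excess = len(merged) - k * m   # p + q - k*m, if positive
    if excess > 0:
        merged.sort(key=lambda c: c[0])         # smallest slack first
        merged = merged[:len(merged) - excess]  # drop the largest slacks
    return merged

# Example with k*m = 3*2 = 6: a pool of 5 plus 3 new cuts -> 2 deletions.
pool = [(0.0, "c1"), (0.1, "c2"), (2.5, "c3"), (0.3, "c4"), (4.0, "c5")]
new_cuts = [(0.0, "v1"), (0.0, "v2"), (0.0, "v3")]
kept = prune_pool(pool, new_cuts, k=3, m=2)
print(sorted(h for _, h in kept))  # "c3" and "c5" (largest slacks) are removed
```

Constraints with large slacks are the ones farthest from being tight at the current point, so they are the least likely to be binding in subsequent iterations; deleting them first is what keeps the working formulation small.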

#### Truncated shortest path computations

We provided a polynomial separation method for LCP based on computing all-pairs shortest paths in $G$ with respect to the weights $x$. Although all-pairs shortest paths can be computed in polynomial time, repeating this computation at each iteration of the separation procedure may cause large solution times for the model. The computational results in Section 4.4 show that shortest path computations account for about two thirds of the total computation time. Reducing the computation time of the shortest path algorithm is thus a necessity.
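One natural way to truncate such computations is to run Dijkstra's algorithm with a cut-off: nodes whose tentative distance exceeds a bound are never expanded, since paths longer than the bound cannot yield a violated inequality. The sketch below illustrates this principle only; the thesis's exact truncation scheme may differ.

```python
import heapq

# Illustrative truncated Dijkstra: distances are propagated only while they
# stay within `bound`, pruning the search early. This is a sketch of the
# general truncation idea, not the thesis's specific procedure.

def truncated_dijkstra(adj, source, bound):
    """adj: {u: [(v, weight), ...]} with nonnegative weights.
    Returns the distances from source that do not exceed bound."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= bound and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy triangle: the direct edge 1-3 (weight 9) is pruned, the detour via 2 kept.
adj = {1: [(2, 2), (3, 9)], 2: [(1, 2), (3, 3)], 3: [(1, 9), (2, 3)]}
print(truncated_dijkstra(adj, 1, bound=6))  # {1: 0, 2: 2, 3: 5}
```

With a tighter bound the search explores even less of the graph, e.g. `bound=4` never reaches node 3 at all, which is exactly the saving one hopes for when the separation routine only needs sufficiently short paths.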

1 Introduction
1.1 Research context
1.1.1 Graph partitioning problem under knapsack constraints (GPKC)
1.1.2 Graph partitioning under capacity constraints (GPCC)
1.2 Graph partitioning problem under set constraints
1.3 Motivation and Contributions
2 Literature review
2.1 Fundamental concepts
2.1.1 Graph theory
2.1.2 Graph partitioning and multicuts
2.1.3 Problems, algorithms and complexity
2.2 Overview of models and solution methods
2.2.1 Node-Node formulations
2.2.2 Node-Cluster formulation
2.2.3 Approaches mixing Node-Node and Node-Cluster models
2.2.4 Semi-definite formulation
2.2.5 Discussion of the graph partitioning formulations
3 Improved compact formulations for sparse graphs
3.1 Basic 0/1 programming model for GPP-SC
3.2 Improved 0/1 programming model for GPP-SC
3.3 Extension
3.4 Computational experiments
3.4.1 Experimental results for GPKC
3.4.2 Experimental results for GPCC
3.5 Conclusions
4 Cutting plane approach for large size graph
4.1 Preliminary: Node-Node formulation for GPKC
4.2 An m-variable formulation for GPKC and its solution via cutting planes
4.2.1 An m-variable formulation for GPKC
4.2.2 Solving the separation subproblem via shortest path computations
4.2.3 Efficient implementation of the cutting plane algorithm
4.3 Efficient computation of upper bounds: heuristic solutions
4.3.1 Building feasible partitions: upper rounding procedure (UR)
4.3.2 Building feasible partitions: progressive aggregation procedure (PA)
4.4 Numerical experiments
4.4.1 Experiments for the cycle model using the cutting plane algorithms
4.4.2 Experiments for the cycle model using the efficient implementation of the cutting plane algorithm
4.4.3 GPKC upper bound computation
4.4.4 Convergence profile of the cutting plane algorithm
4.5 Conclusion
5 Stochastic graph partitioning
5.1 Stochastic programming
5.1.1 Optimization under uncertainty, an overview
5.1.2 Chance constrained programming
5.1.3 Convexity studies
5.1.4 Stochastic graph partitioning
5.1.5 Stochastic graph partitioning under knapsack constraint formulation
5.2 Partitioning process networks
5.3 Second order cone formulation