Work stealing strategies for multi-core IVM-based B&B

Get Complete Project Material File(s) Now! »

Operations based on decimal system

This subsection explains the used methods, in our work, to perform the division operation in the factorial system. Unlike other operations, the division is hardly feasible directly in the factorial system. Another way to do this operation is to convert the dividend and the divisor to the decimal system, to perform the division in the decimal system, and thereafter to convert the result to the factorial system. So this subsection explains how to convert a factorial number to a decimal number, and how to convert a decimal number to a factorial number.
When converting a decimal number into its factorial representation, digits are produced from right to left. This conversion consists in repeatedly dividing the number by the radixes 1, 2, 3, etc. After each division, the remainder should be considered as the digit. The division operation continues with the integer quotient until this quotient becomes 0. Let assume a decimal number D = 349 to convert into a factorial number. This conversion is done using successive Euclidean division as shown in Equation (2.10).
349=1 349+0
349=2 174+1
174=3 58+0
58=4 14+2
14=5 2+4
2=6 0+2
Euclidean division is the operation of division of two integers, which produces a quo-tient and a remainder. Each line of Equation (2.10) represents an Euclidean division Qi = i Qi+1 + Ri such as: Qi+1 is the quotient of the division Qii Ri is the reminder of this division Q1 = A is the decimal number to convert Qn+1 is always equal to zero and is the last quotient R1 is always equal to zero and is the first remainder.

Permutation to factorial number

When encoding and decoding a permutation, it is important that the used code has to be compact and the coding and decoding operations have to be fast. With a set of n elements, it is possible to get n! permutations, and from 0 to (n! 1), there are n numbers. So the idea is to associate for each of these permutations one and only one number. In other words, the objective is to encode each permutation with one and only one number. There are two main methods, almost equivalent, which are used to code a permutation as a factorial number. The first is based on the Lehmer code (explainted in this section) and the second on the inversion table (explained in Appendix B). Lehmer code, introduced by Derrick Lehmer [Lehmer 1960], allows to encode a per-mutation as a factorial number. The example of Table 2.3 shows how to transform a permutation = 0 1::: 8 in order to obtain its Lehmer code L = (L0L1:::L8)!. This exam-ple assumes that the permutation is equal to 527038614. In the figures of this table, the elements of the permutation are represented by black circles, positions by blue triangles topped with a bow, and the obtained Lehmer code by a red rectangle. Each of the nine cells of the table describes the obtained value Li of a Lehmer code position. As shown in the first cell of the table, to obtain the first code L0, it is necessary to count the number of inversions where the element 0 = 5 appears on the left side of a pair. In this example and as shown in Equation (2.26), there are 5 inversions of this type. This means that L0 is equal to 5. Inversions(5; 41032) = f(5; 2); (5; 7); (5; 0); (5; 3); (5; 4)g (2.26).
In the same way, L1 is equal to 2 since 1 is equal to 2 and there are 2 pairs of inversions where 2 appears at the left side of the pair. It is possible also to deduce the other values Li as shown in Equation (2.27). Therefore, the Lehmer code of the permutation = 527038614 is equal to L = (525013200)!. Li = jInversions( i; )j (2.27).
The figure of the first cell shows that the number of inversions associated to 0 can not exceed 8 since there are only 8 elements at the right of 0. In the same way, the number of inversions associated to 1 can not exceed 7. Those associated to 2 can not exceed 6, etc. So the general rule stipulates that the associated number of inversions for i can not exceed 8 i. Since the value Li is always smaller than 8 i, the obtained Lehmer code is a factorial code.
Algorithm 6 describes the used method for obtaining the Lehmer code of any permu-tation. This algorithm has a complexity O(n log(n)) and contains two loops. The outer loop allows the algorithm to point the code i to compute, and the inner loop counts the number of inversions where the element i appears at the left side of an inversion of a permutation .

Factorial number to permutation

This algorithm does not operate on the vector of positions but on the vector of elements. In this algorithm, the Lehmer code L0 is decoded first to find 0, then L1 to find 1, …, until decoding L8 to find 8. In other words, decoding Li allows to find the element i of the permutation . At the beginning, L0 is equal to 5. The decoding is thus performed by taking the element that is located at position 5 of the vector of elements, put this element at position 0 of the permutation , and shift with one position to the left all the elements which are at the right of the position 5 of the vector of elements. Decoding a code Li consists therefore to take the element which is located at position i of the vector of elements, put this element at the position i of the permutation , and shift with one position to the left all the elements which are at the right of the position i of the vector of elements. Decoding the Lehmer code L = (525013200)! provides therefore the permutation = 527038614.

Conventional serial B&B

Before introducing our new approach based on IVM, this section reminds to the reader how a conventional B&B works using LL data structure. The section is divided into three subsections. The first subsection provides a general overview about the B&B algorithm, the second subsection explains the role of the pool based on LL data structure, and the third subsection details the four operators of this algorithm.

Algorithm description

Several exact resolution methods used in combinatorial optimization are Branch-and-Bound (B&B)-like algorithms. These methods are mainly divided into three basic vari-ants: simple B&B, Branch-and-Cut (B&C), and Branch-and-Price (B&P). There are other B&B variants less known such as Branch-and-Peg [Goldengorin 2004], Branch-and-Win [Pastor 2004], and Branch-and-Cut-and-Solve [Climer 2006]. This list is certainly not ex-haustive. It is also possible to consider divide-and-conquer algorithm as a B&B algorithm. It is enough to remove the pruning operator, explained below, from the B&B to get a divide-and-conquer algorithm. Some authors consider B&C, B&P, and the other variants as different algorithms from B&B. These authors use B&X to refer to algorithms like B&B, B&C, B&P, etc. In what follows, B&B algorithm refers to simple B&B or any other variant of this algorithm. This algorithm is based on an implicit enumeration of all the solutions of the problem being solved. The space of potential solutions (search space) is explored by dynamically building a tree where: The root node represents the initial problem to solve.
The leaf nodes are the possible solutions of this initial problem.
And the internal nodes are subspaces of the total search space. Internal nodes can be also considered as subproblems of the initial problem. The size of the subspaces is smaller and smaller as one gets closer and closer to the leaves. The construction of such a tree and its exploration are performed using four operators: branching, bounding, selection and pruning. The algorithm proceeds in several iterations. The best solution found is saved and can be improved from an iteration to another. Subproblems generated and not yet processed are kept in a pool. In the beginning, this pool contains the initial problem. The LL version of the Branch-and-Bound algorithm is shown in Algorithm 8. The pool of Figure 3.1 is represented as a tree in order to visualize the problem/subproblem relationships between nodes, and as a matrix to facilitate the comparison with the IVM-based approach described in Section 3.3. In our work, this pool is implemented using a deque (head-tail linked-list), referred to as LL, as shown in Figure 3.2. In Figure 3.1, for instance, the node 24/13 means that job 2 is scheduled at the first position, job 4 at the second position, and jobs 1 and 3 are not yet scheduled. In this figure, strike-through nodes represent subproblems which are added into LL and selected from it. At each B&B iteration, the algorithm points to a node of the B&B pool. In the example of Figure 3.1, the algorithm is currently pointing to the solution 2314/. Therefore, Figure 3.2 represents the state of the pool just before removing 2314/. Before selecting this solution, LL contains five nodes, namely 3/124, 4/123, 24/13, 234/1 and 2314/. Before having LL in this state, some operations are applied. At the beginning of the B&B, none of the four jobs is scheduled (i.e. /1234). The node /1234 is branched/decomposed into four nodes which are 1/234, 2/134, 3/124 and 4/123. In each of these nodes, one job is scheduled and the three other jobs are not yet scheduled. This example assumes that the first node 1/234 is processed or pruned, and the algorithm branches the second node 2/134. The decomposition of this node generates three nodes, namely 21/34, 23/14 and 24/13. The example also assumes that the first node 21/34 is processed or pruned. Therefore, the algorithm decomposes the second node 23/14, and obtains two new nodes which are 231/4 and 234/1. The node 231/4 represents a simple subproblem and accepts only one solution 2314/.

IVM-based serial B&B

The use of LL to store a pool of subproblems described in Subsection 3.2.2 has a number of disadvantages. The insertion of new subproblems in LL can be very costly, since it implies finding the right position in LL for each subproblem, which involves the comparison of each subproblem to the ones already present in LL until its place is found. Considering that LL can grow and get very large, the branching operator can take a significant amount of time due to the insertions in LL. Another problem to consider is the memory usage of LL, which can grow rapidly. The use of a Depth First Search approach to explore the tree can make this problem less significant than in a Breadth First Search, but LL can still grow rapidly for problems of a significant size. It is therefore necessary to create a new data structure which exhibits a better behavior in the management of the subproblems of a permutation problem. The IVM (Integer Vector Matrix) data structure allows to store and manage subproblems in a more efficient way than LL. It uses a constant amount of memory which makes its behavior much more predictable than LL. Its branching operator is also less costly than the one described in Subsection 3.2.2. It is however less generic than the LL data structure when used in the B&B algorithm. It can only be used for permutation problems and only allows for a Depth First Search exploration of the tree of subproblems.

Table of contents :

1 Introduction
2 Permutation-based optimization problems
2.1 Introduction
2.2 Permutation optimization problems
2.2.1 Traveling-salesman problem
2.2.2 Quadratic assignment problem
2.2.3 Job-Shop problem
2.2.4 Permutation Flow-Shop problem
2.3 Factorial number system
2.3.1 Definition of number system
2.3.2 Definition of factorial number system
2.3.3 Operations based on decimal system
2.3.4 Operations based on factorial system
2.4 Handling permutations with factorial numbers
2.4.1 Basic concepts
2.4.2 Permutation to factorial number
2.4.3 Factorial number to permutation
2.5 Conclusion
3 Serial IVM-based B&B
3.1 Introduction
3.2 Conventional serial B&B
3.2.1 Algorithm description
3.2.2 LL data structure
3.2.3 B&B operators
3.3 IVM-based serial B&B
3.3.1 IVM data structure
3.3.2 IVM advanced techniques
3.3.3 Revisited B&B operators
3.4 Experimentation
3.4.1 Experimental settings
3.4.2 Memory evaluation
3.4.3 CPU Time evaluation
3.5 Conclusion
4 Multi-core IVM-based B&B
4.1 Introduction
4.2 Parallel models for B&B algorithms
4.2.1 Multi-parametric parallel model
4.2.2 Parallel evaluation of bounds model
4.2.3 Parallel evaluation of a bound model
4.2.4 Parallel tree exploration model
4.3 Work stealing strategies for multi-core IVM-based B&B
4.3.1 WS-based B&B implementation
4.3.2 Coalesced work units
4.3.3 Dividing one factoradic interval into two intervals
4.3.4 Victim selection and granularity policies
4.4 Experimentation
4.4.1 Experimental settings
4.4.2 Strategy and granularity policies evaluation
4.4.3 Memory evaluation
4.4.4 CPU time evaluation
4.5 Conclusion
5 Many-core IVM-based B&B
5.1 Introduction
5.2 Coprocessor-accelerated B&B: the general design
5.3 GPU-based implementation of B&B
5.3.1 Parallelization on GPU
5.3.2 Parallelization of B&B for GPU
5.4 MIC-based implementation of B&B
5.4.1 Parallelization on Intel Xeon Phi
5.4.2 Parallelization of B&B for Intel Xeon Phi
5.5 Experimentation
5.5.1 Hardware and software testbed and parameter setting
5.5.2 Experimental results
5.6 Conclusion
6 Conclusions and perspectives
Bibliography