Get Complete Project Material File(s) Now! »

## An overview of mathematical programming

**Introduction**

A decision problem P encodes a formal YES/NO question, parametrized by a set of input and output symbols which can be assigned values. The assignment of values to the input symbols is an instance, and decision problems are assumed to have an infinite number of instances. The assignment of values to the output symbols is a solution of P .

Mathematical optimization is a field of mathematics [65] related to optimization problems, whereby a solution is selected from a feasible (i.e., admissible) set of out-put assignments. The interest in mathematical optimization comes from the fact that countless practical applications, arising from diverse fields such as engineering, finance, economics, chemistry, computer science, etc., can be cast as optimization problems.

MP is a formal language for describing optimization problems and a framework for solving them. Each formal MP sentence, called formulation, is recursively built from a set of entities: index sets and parameters (encoding the problem input), de-cision variables (encoding the solution for a specific instance), objective function (for assessing variable assignments), and constraints of various types (defining the avail-able decisions). In the following, we will be adopting the abbreviation “MP” both for “mathematical programming” (the language/framework) and “mathematical program” (an MP formulation), whenever the context allows for unambiguous use of the term.

This chapter is devoted to providing some basic notions about MP. Aside from the fact that Ph.D. theses are required to contain an introduction to whatever subject they are about, we are further motivated by three observations. Firstly, we see both the ACP and the DGP as optimization problems: the former finds the configuration offering the best algorithmic performance, the latter searches for a realization of a graph in a given Euclidean space. Further, the methodologies proposed by the authors of this Ph. D. thesis to address the ACP and the DGP adopt MP algorithms to try to solve them; in this chapter, we discuss some of these algorithms. In particular, our ACP approaches involve training ML predictors, which requires solving optimization problems. Secondly, we study the application of the ACP to solution algorithms for MP. From the point of view of MP as a formal language, we use the MP syntax (the formulations) in order to represent the ACP; and we apply the ACP to the algorithms that provide the semantics of MP (i.e., solvers, deciding the assignment of optimal values to the decision variables).

### The components of a mathematical program

The MP formulation of an optimization problem is as follows:

min f(x)

x D (2.1)

∀i ∈ I gi(x) ≤ 0

∀j ∈ E hj(x) = 0 ,

whereby |I| = m and |E| = m′, f : Rd −→ R , gi : Rd −→ Rd and hj : Rd −→ Rd. The formulation in Eq. (2.1) is the standard form of an MP. Its domain is the closed set

D = dom f ∩

dom gi ∩

dom hj .

Throughout this chapter, we assume that we are always optimizing over closed sets, and will always refer to D as the domain of an MP formulation.

The components of the problem in Eq. (2.1) are [115]:

• decision variables: a set of quantities x, representing decisions to be taken and encoding an output solution;

• constraints: a set of statements defining the domain and feasible values of the variables. The constraints of an MP formulation are of two types: explicit and implicit. Explicit constraints are written by means of arithmetical expressions of variables, parameters, equality/inequality symbols and/or optimization oper-ators (e.g., “2×1 − 3×42 = 0”). Implicit constraints are statements expressed as membership in certain classes of sets (e.g., “x ∈ Z”);

• objective function: a couple, containing an optimization operator (i.e., min or max) and an arithmetical expression f(x) of variables, parameters and/or optimization operators. An optimization operator defines the optimization direction: it specifies whether an expression must be minimized or maximized. The purpose of the objective function is to allow the assessment of different value assignments to variables, in order to determine an optimal one;

• parameters: a set of given input coefficients, univocally identifying a problem instance. In Eq. (2.1), the parameters are: a function f, the finite index sets of the inequality and equality constraints I, E the constraint functions gi, for all i ∈ I and hj, for all j ∈ E;

A point xˆ is feasible with respect to the problem in Eq. (2.1) if it satisfies all the constraints, for given input parameters. We let F = {x ∈ D | ∀i ∈ I gi(x) ≤ 0, ∀j ∈ E hj(x) = 0} (2.2) be the feasible set of Eq. (2.1), containing the points satisfying all the constraints, so that Eq. (2.1) can be rewritten, compactly, as min f(x) . x F

If F = ∅, we say that the MP in Eq. (2.1) is infeasible. Instead, we say that Eq. (2.1) is unbounded if, for all objective function values m, there always exists a point x′ ∈ F such that f(x′) < m. If an MP is not infeasible nor unbounded, and there an optimal solution, we call p∗ the optimal value of that solution. We remark that symbols denoting optimal objective function values might be assigned two special symbols, ∅ or ∞, which means that the underlying MP is, respectively, unbounded or infeasible.

For a feasible solution xˆ of Eq. (2.1) to be a local optimum, xˆ must be such that there exist ε > 0 and a neighbourhood Uε(ˆx) ∩ F where, for all x ∈ Uε(ˆx), f(ˆx) ≤ f(x). If, instead, f(ˆx) ≤ f(x) for all x ∈ F, then xˆ is also a global minimum of Eq. (2.1).

In MP, it is common to resort to reformulation techniques. A reformulation Q of a problem P is an auxiliary problem having some properties in common with P . For a detailed account of reformulations, we refer to [114]. Reformulations are customarily applied in MP to cast the original problem into one that is easier to solve. In fact, the formulation adopted for Q can capture and encode some useful structural properties of P , which can then be exploited by appropriate solution algorithms. Q may: a) preserve all or some of the (local/global) optimality properties of P , b) be obtained from P by removing some of its constraints (relaxations), or c) approximate some of the components of P (approximations, which may or may not supply optimality guarantees).

Relaxations and approximations are especially important, because they yield bounds on the optimal value of the objective function of P . These bounds can then be exploited by optimization algorithms — such as the Branch-and-Bound (B&B) [107] — to solve diverse problems in several different MP classes, such as Mixed-Integer Linear Program-ming (MILP), Nonlinear Programming (NLP) and Mixed-Integer Nonlinear Program-ming (MINLP). Another widely used reformulation is the “dual” Q of an MP P (here, the “primal”). In the words of [118], the dual Q is such that the “decision variables of P are used to index the constraints of Q, and the constraints of P are used to index the variables of Q”. Solving the dual is also conductive to the computation of the duality gap. We let pˆ the primal objective function value at a primal-feasible point xˆ, qˆ the dual objective function value at a dual-feasible point yˆ, and we define the duality gap as ∥pˆ − qˆ∥ ≥ 0 . (2.3)

Moreover, for any primal-dual feasible couple, duality provides a bound on the primal formulation, a property called weak duality. For minimization problems such as the one formulated by Eq. 2.1, weak duality corresponds to: qˆ − pˆ ≤ 0 , (2.4)

i.e., qˆ is a lower bound on any primal solution value. For maximization problems, solving the dual formulation supplies an upper bound on primal solution values. Unboundedness of the dual (sometimes represented as qˆ = +∞), implies infeasibility of the primal, and viceversa.

There are several algorithms for solving optimization problems in one or more MP classes. Solvers are software applications implementing one or more of such algorithms. They accept a problem instance and, possibly, a formulation of the problem and specific operating instructions (conditioning their behaviour) as inputs. Their outputs are: a) an optimal feasible solution (or a set thereof), which solvers attempt to compute for the given input; b) possibly, information about the solution process. Solvers provide the semantics of MP formulations, by deciding the optimal value assignment values to the decision variables.

Optimization algorithms usually start from a (possibly, feasible) point, then itera-tively explore the feasible set, searching for a point capable of improving on solution value. Many of these algorithms, especially those capable of providing optimality guar-antees, depend on the gap in Eq. (2.3) to assess the improvement at each iteration. In fact, their termination criterion is usually ∥pˆ − qˆ∥ ≤ ϵ , where ϵ > 0 is a tolerance parameter; in this case, we say that an ϵ-optimal solution has been attained. Strong duality applies when the gap in Eq. (2.3) is zero, which occurs at any globally optimal primal-dual solution.

In the following sections, we give an overview of some important classes of MP. Since an extensive treatment of them is beyond the scope of this manuscript, we will avoid any proofs and provide bibliographic references for further study.

#### Convex Programming

Solving an MP is, in general, difficult: it has been shown that MP is uncomputable (as a class) [97, 116] and that even those MPs that are decidable are still NP-hard. However for at least one MP class, Convex Programming (CP), it is always possible to guarantee the existence of a solution with certain desirable properties. Some problems in this class are still NP-hard [48, 160], but many others are tractable. Notably, although exact solution algorithms running in polynomial time only exist for two CP classes — namely, Linear Programming (LP) (presented in Sec. 2.5) and Quadratic Programming (QP, i.e., minimizing a quadratic function over linear constraints) [29, §4] — it is always possible to compute an approximate solution of a CP in polytime [148].

The MP defined in Eq. (2.1) is a CP if f and F are convex. For this to be true, it is sufficient that [150,§12.4]: a) for all j ∈ E, hj(x) := a⊤jx + bj, i.e., the equality constraints are affine (hence, convex); b) for all i ∈ I, gi(x) are convex. The main property of a CP is the equivalence of local and global optimality: any local minimum is also a global minimum. For this reason, local algorithms for convex optimization are guaranteed to work globally.

The reason why CP is important for this thesis is that many optimization problems can be formulated as CPs [29], including many arising in ML such as support vector machines, logistic regression, ridge regression, and others [33]. Moreover, some well-known DGP formulations are convex; see, e.g., Eq. (7.6) in Sec. 7.4. The interested reader will find further details in textbooks such as, say, [29] or [146].

**Lagrangian duality and optimality conditions**

We would like to know under which conditions the gap in Eq. (2.3) is zero, i.e., when strong duality applies. Lagrangian duality (see, e.g., [20, §6]) allows us to derive these conditions: it is a reformulation technique for transforming constrained optimization problems into (partially) unconstrained ones, by placing the constraints in the objective function and penalizing any violations to them.

We apply Lagrangian duality to Eq. (2.1).

The first step is to build the Lagrangian function, i.e., the map L : Rd × Rm × Rm′ −→ R L(x, λ, µ) = f(x) + λigi(x) + µjhj(x) , (2.5) i I′ j E′

where E′ ⊆ E, I′ ⊆ I, the coefficients{λi}i I′ and {µj}j E′ are the Lagrangian multi-pliers and, for all i ∈ I′, λi ≥ 0.

Then, we define the Lagrangian dual function as the concave map q : Rm × Rm′ −→ R (2.6) q(λ, µ) = inf L(x, λ, µ) . x F

Eq. (2.6) is the Lagrangian relaxation of the MP in Eq. (2.1), providing a lower bound on its optimal value p∗.

Eq. (2.8) allow us to make some statements about the optimality of Eq. (2.1). Firstly, if there exists a primal feasible point x∗ and dual feasible Lagrangian multipliers (λ∗, µ∗) at which strong duality holds, then the KKT conditions are satisfied at those points (necessity). Secondly, if there exists primal-dual feasible points x∗, λ∗, µ∗ satisfying the KKT conditions in Eq. (2.8), then x∗, λ∗, µ∗ are primal-dual optimal (sufficiency).

A remarkable result of Eq. 2.8 is that it provides sufficient conditions for global optimality (and, thereby, strong duality), if the corresponding primal formulation is a CP, or local optimality, otherwise. However, this relies on the assumption that a KKT-compliant point exists, which may not always be the case; notably, a dual optimal point may not satisfy the complementary slackness conditions of Eq. 2.8. For such a primal-dual optimal point to exist, specific constraint qualification conditions must be satisfied. Among the many applicable constraint qualification conditions, the Slater’s conditions [15, §5], for example, require that: a) for all i ∈ I, the gi are convex, b) for all j ∈ E, the hj are affine, and c) there exists a point xˆ in the interior of D, such that all gi(ˆx) < 0 and all hj(ˆx) = 0. Slater’s conditions holding at a primal feasible optimum are sufficient to ensure the existence of an optimal Lagrangian multipliers, satisfying the KKT conditions in Eq. (2.8) and achieving local/global optimality.

We remark that, if the MP at hand is nonconvex, the KKT conditions are necessary for local optimality. However, information from the gradient may not be enough to ascertain in which directions the objective function increases/decreases. Therefore, in order to identify a point as optimal, second-order conditions are required, to analyze the curvature of the Lagrangian function in Eq. (2.5) in undecided directions. These conditions “examine the terms of the second derivative in the Taylor series expansions of f, gi and hj, to see if this extra information resolves the issue of increase or decrease of f”. See, [150, §12.4] for further details).

**Table of contents :**

**1 Introduction **

1.1 List of publications

**I Generalities **

**2 An overview of mathematical programming **

2.1 Introduction

2.2 The components of a mathematical program

2.3 Convex Programming

2.4 Lagrangian duality and optimality conditions

2.5 Linear Programming

2.6 Mixed-Integer Linear Programming

2.6.1 Common solution algorithms

2.7 Mixed-Integer Nonlinear Programming

2.7.1 Common solution algorithms

2.8 Some words about unconstrained optimization

2.8.1 Line search and trust-region methods

2.9 Solvers

2.9.1 MILP solvers

2.10 Conclusions

**3 On machine learning **

3.1 Introduction

3.2 An introduction to statistical learning theory

3.3 The training and inference problems

3.4 Risk and generalization

3.5 The path to improving generalization

3.5.1 Structural risk minimization

3.5.2 Regularized risk minimization

3.5.3 Cross-validation

3.5.4 The feature space

3.6 Some regression paradigms

3.6.1 Logistic Regression

3.6.2 Decision Trees

3.6.3 Neural Networks

3.6.4 Support Vector Regression

3.7 Conclusions

**II Algorithm configuration problem **

**4 The algorithm configuration problem **

4.1 Introduction

4.2 An algorithmic schema

4.3 The construction of M

4.3.1 The samplet phase

4.3.2 The updatet phase

4.4 Computing ΨM

4.5 Classifying algorithm configuration approaches

4.6 Per-problem approaches

4.6.1 Approaches based on evolutionary algorithms

4.6.2 Approaches based on local search heuristics

4.6.3 Approaches based on experimental design

4.7 Per-instance approaches

4.7.1 Approaches learning to predict the optimal configuration

4.7.2 Approaches learning to approximate algorithmic performances

4.7.3 Approaches building PA,|Π′|

4.7.4 Approaches building PA,C

4.7.5 Approaches building PA,1

4.8 Our take on the algorithm configuration problem

4.9 Conclusions

**5 Tuning algorithms: optimizing over learned predictions**

5.1 Introduction

5.2 Motivation

5.3 The Performance-as-Output approach to solver configuration

5.3.1 The construction of M

5.3.2 The recommender

5.4 Implementation with different Machine Learning paradigms

5.4.1 The approach with SVR

5.4.2 The approach with DTs

5.5 Experimental results

5.5.1 The Hydro Unit Commitment problem

5.5.2 Building the training set

5.5.3 Learning ¯pA: experimental setup and results

5.5.4 The CSSP: experimental setup and results

5.5.5 CSSP solution completion

5.5.6 Assessing the performance of our recommender

5.6 Conclusions

**6 Tuning algorithms: an alternative stance **

6.1 Introduction

6.2 Motivation

6.3 The construction of M

6.3.1 The PaO variant

6.3.2 The PaI variant

6.4 The recommender

6.4.1 The PaO variant

6.4.2 The PaI variant

6.5 Computational experiments

6.5.1 The algorithmic framework

6.5.2 Results

6.6 Conclusions

**I****II Graph embedding and Distance Geometry **

**7 Cycle-based formulations in Distance Geometry **

7.1 Introduction

7.2 Placing the DGP in the thesis

7.3 Generalities

7.4 Some existing MP formulations

7.5 A new formulation based on cycles

7.6 The cycle vector space and its bases

7.6.1 Constraints over cycle bases

7.6.2 How to find directed cycle bases

7.7 The Eulerian cycle relaxation

7.8 Computational experiments

7.8.1 Results

7.9 Conclusions

**8 Future developments **

8.1 The PaO approach with Neural Networks

8.2 The multi-phase PaO approach

**8.2.1 Implementation **

**9 Conclusions**