
**CHAPTER 4 Theory of gradient-only optimization**

In this chapter we consider some theoretical aspects of gradient-only optimization for the unconstrained optimization of objective functions containing non-physical step or jump discontinuities. The (discontinuous) gradients are, however, assumed to be accurate and everywhere uniquely defined. This kind of discontinuity indeed arises when the optimization problem is based on the solutions of systems of partial differential equations and variable discretization techniques are used (remeshing in spatial domains or variable time stepping in temporal domains). These discontinuities, which may cause local minima, are artifacts of the numerical strategies used and should not influence the solution to the optimization problem. We demonstrate that it is indeed possible to ignore these local minima due to discontinuities if only gradient information is used. Various gradient-only algorithmic options are discussed. The implication is that variable discretization strategies, so important in the numerical solution of partial differential equations, can be combined with efficient local optimization algorithms.

This chapter is organized as follows: We give an introduction to discontinuous objective functions in Section 4.1. We then define an optimization problem, and a solution to that problem, that is solely based on the gradient of a function in Section 4.2. In Section 4.3 we introduce the gradient-only optimization problem, and in Section 4.4 we offer proofs of convergence for the descent sequences defined in the previous section. We give practical considerations regarding gradient-only optimization algorithms in Section 4.5, and present a brief comparative discussion of classical mathematical programming and gradient-only optimization in Section 4.6. In Section 4.7 we present a shape optimization problem of practical importance, and a number of analytical test functions. Concluding remarks then follow.

**Introduction**

In this study we consider theoretical aspects regarding gradient-only approaches to avoid spurious (local) minima in unconstrained optimization. Here, gradient-only optimization algorithms refer to optimization strategies that consider only first-order information of a scalar (cost) objective function in computing update directions and update step lengths.

Many problems in engineering and the applied sciences are described by (partial) differential equations ((P)DEs), e.g. Newton's second law, Poisson's equation, Maxwell's electromagnetic equations, the Black-Scholes equation, the Lotka-Volterra equations and Einstein's field equations. Analytical solutions to these are seldom available, and in many cases (approximate) numerical solutions need to be computed. These numerical solutions are often obtained by employing discretization methods, e.g. finite difference, finite element and finite volume methods.

(P)DEs also often describe the physics of an optimization problem. These optimization problems are usually approximated numerically, resulting in an approximate optimization problem, which is then solved using numerical optimization techniques. During optimization, the domain over which the (P)DEs are solved may remain constant, but the discretization may be required to change to ensure convergence or efficiency of the solution, e.g. when integrating over a fixed time domain using variable time steps. Alternatively, the design variables may describe the domain over which the (P)DEs are solved. A change in the design variables then changes the solution domain, which in turn requires the discretization to change, e.g. in shape optimization.

In order to effect these discretization changes, we distinguish between two classes of strategies. First, constant discretization strategies continuously adjust a reference discretization when the solution domain changes (and hence generate a fixed discretization if the solution domain remains fixed). Secondly, variable discretization strategies generate new independent discretizations irrespective of whether or not the solution domain changes. For example, temporal (P)DEs may be solved using fixed or variable time steps. For spatial PDEs, the equivalents are fixed meshes and mesh movement strategies on the one hand, and remeshing on the other. Fixed time steps and mesh movement strategies may however imply serious difficulties, e.g. impaired convergence rates and highly distorted grids and meshes, which may even result in failure of the computational procedures used. The variable discretization strategies are preferable by far.

One consequence of using variable discretization while solving an optimization problem is that the resulting objective functions contain discontinuities.
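The effect can be illustrated with a toy example (a hypothetical sketch of our own, not a problem from this chapter): an objective evaluated from a composite trapezoidal integral whose number of intervals is chosen adaptively from the design variable, so that the discretization, and hence the discretization error, jumps at discrete values of the variable.

```python
import math

def objective(x, target=2.0, h_max=0.05):
    # Composite trapezoidal approximation of the integral of t**2 over
    # [0, x], with the number of intervals chosen adaptively:
    # n = ceil(x / h_max). As x grows, n jumps by one at discrete points,
    # giving a slightly different discretization error on either side of
    # each jump -- a non-physical step discontinuity in the objective.
    n = max(1, math.ceil(x / h_max))
    h = x / n
    integral = 0.5 * h * sum(
        (i * h) ** 2 + ((i + 1) * h) ** 2 for i in range(n)
    )
    return (integral - target) ** 2

# The exact integral is x**3 / 3; the discretized objective is only
# piecewise smooth in x, with jumps wherever ceil(x / h_max) changes.
left = objective(0.09999)   # evaluated with n = 2 intervals
right = objective(0.10001)  # evaluated with n = 3 intervals
```

Between jumps the objective varies smoothly; across a jump in the interval count it changes by a small but finite amount, exactly the kind of artifact discussed above.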

Accordingly, the optimization of piece-wise smooth step discontinuous functions usually requires highly specialized optimization strategies and, possibly, heuristic approaches. In contrast, constant discretization strategies result in smooth, continuous objective functions that present no significant challenges to optimization algorithms.

It therefore appears that the analyst has two options, namely i) combining an efficient local optimization algorithm with non-ideal constant discretization, or ii) combining a less efficient global optimization algorithm with ideal non-constant discretization.

Hence, variable time step methods and remeshing techniques are normally avoided in optimization, due to the very fact that the required global optimization algorithms are prohibitively expensive. An important spatial example is structural shape optimization, in which fixed-mesh or mesh movement strategies are almost always used; the very motivation being that remeshing strategies cannot be used efficiently, due to the non-physical local minima they induce during optimization, e.g. see References [1, 9, 32, 39].

To avoid confusion, we emphasize the difference between physical discontinuities that occur in the solution domain of PDEs, such as shear banding in plasticity and shock waves in supersonic flow, and non-physical discontinuities. The non-physical discontinuities we refer to occur in the objective function of an optimization problem, as opposed to discontinuities in the solution of a PDE.

As we aim to develop a theoretical framework for gradient-only optimization, we will in this chapter restrict ourselves to various classes of objective functions in our discussions and analysis. In addition, we restrict ourselves to unconstrained optimization, and note that some practical constrained optimization problems can be successfully reformulated as unconstrained optimization problems using a penalty formulation. Consider the following unconstrained minimization problem: find the minimizer $x^*$ of a real-valued function $f : X \subseteq \mathbb{R}^{n} \rightarrow \mathbb{R}$, such that

$$ f(x^*) \leq f(x), \quad \forall \; x \in X, \qquad (4.1) $$

with $X$ the convex set of all possible solutions. If the function $f$ is strictly convex, coercive and at least twice continuously differentiable, i.e. $f \in C^{2}$, the minimizer $x^* \in X$ is characterized by the optimality criterion $\nabla f(x^*) = 0$, with the Hessian matrix $H(x^*)$ positive semi-definite at $x^*$. Here, $\nabla$ represents the gradient operator. For this programming problem, many well-known minimization algorithms are available, e.g. steepest descent, (preconditioned) conjugate gradient methods and quasi-Newton methods like BFGS. However, if $f$ is discontinuous, i.e. $f \notin C^{0}$, the minimizer $x^*$ of the mathematical programming problem (4.1) may not satisfy the optimality criterion given above. Indeed, the optimality criterion may not even be defined in the classical sense, although it may be defined using generalized gradients (the subdifferential) $\partial f(x)$ [12, 13, 17], which requires $0 \in \partial f(x^*)$.

To the best of our knowledge, only a few gradient-only optimization algorithms have been developed, see References [4, 43, 44, 49, 53, 54, 64]. These include the widely known gradient-only Newton's method [64], which locates diminishing gradients. A notable contribution on the optimization of functions that are not everywhere uniquely differentiable (non-differentiable functions) is the subgradient methods [49]. When an objective function is continuously differentiable, subgradient methods reduce to steepest descent algorithms with a priori determined step lengths, i.e. the line searches do not depend on any information computed during optimization. All of these algorithms are used to minimize objective functions, and require that the gradient $\nabla f(x^*) = 0$, or that the subdifferential^{1} $\partial f(x^*)$ contains $0$, depending on the differentiability of $f(x)$ at $x^*$. Accordingly, the well-known efficient optimization algorithms mentioned above may be unable to find $x_g$, since they are concerned with obtaining a minimizer $x^*$ for an objective function [17].

In turn, if it is known or assumed that $x_g$ coincides with a minimizer $x^*$ of $f(x)$, then these problems can be approached from a classical mathematical programming perspective, but then have to be solved using global optimization algorithms. This is due to the numerically induced step discontinuities that manifest as local minima in the function domain. We however show that if accurate associated gradient information that is everywhere defined is available, the resulting discontinuous problems may still be optimized efficiently, since gradient-only optimization ignores these numerically induced step discontinuities. Gradient-only optimization therefore transforms a problem plagued with numerically induced local minima into a problem free of them. Recall that a third option now becomes available to the analyst: combine an efficient local optimization algorithm with ideal non-constant discretization.
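The idea can be sketched in one dimension (a minimal, hypothetical illustration; the function, interval and bisection-style search below are our own, not the algorithms developed later in this chapter): a step discontinuity creates a spurious local minimum in the function values, but a search that consults only derivative signs passes straight over it.

```python
def f(x):
    # A piecewise smooth objective with a non-physical step discontinuity:
    # the downward jump at x = 1 makes x = 1 a spurious local minimum of
    # the function values, while the smooth part is minimized at x = 0.
    return x * x + (0.5 if x < 1.0 else 0.0)

def df(x):
    # The associated gradient, 2*x, is everywhere uniquely defined and is
    # unaffected by the step jump in f.
    return 2.0 * x

def gradient_only_bisection(dfunc, a, b, tol=1e-8):
    # Locate a sign change of the derivative by bisection, using only
    # derivative information -- function values (and hence the step
    # discontinuity) are never consulted.
    assert dfunc(a) < 0.0 < dfunc(b)
    while b - a > tol:
        m = 0.5 * (a + b)
        if dfunc(m) < 0.0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

x_g = gradient_only_bisection(df, -2.0, 3.0)
# x_g is (numerically) 0: the derivative sign change is found even though
# f itself jumps at x = 1 inside the search interval.
```

A function-value-based descent from the right would stall at the jump near $x = 1$; the derivative-sign search cannot even see it.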

Let us first present two illustrative examples of non-physical step discontinuities, to set the tone for this chapter. The first is rather trivial, the second not quite.

**Univariate example problem: Newton's cooling law**

Consider Newton's law of cooling, which states that the rate of heat loss of a body is proportional to the difference in temperature between the body and its surroundings, given by the linear first order DE $\frac{dT(t)}{dt} = -\lambda \, (T(t) - T_{env})$, with solution

$$ T(t) = T_{env} + (T_{init} - T_{env}) \, e^{-\lambda t}. $$

Here $\lambda$ is a positive proportionality constant, $T(t)$ the temperature of the body at time $t$, $T_{init}$ the initial temperature of the body and $T_{env}$ the temperature of the surroundings of the body.
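To preview how such a problem can acquire non-physical step discontinuities, consider a forward Euler integration of the cooling law in which the number of time steps follows a stability-motivated rule depending on the design variable (the step-selection rule, target temperature and constants below are illustrative assumptions of our own, not the formulation developed in this chapter):

```python
import math

def simulate(lam, t_end=1.0, T_init=100.0, T_env=25.0):
    # Forward Euler integration of dT/dt = -lam * (T - T_env). The number
    # of time steps follows a stability-motivated rule that depends on the
    # design variable lam, so it jumps at discrete values of lam -- a
    # variable time stepping strategy.
    n = max(1, math.ceil(10.0 * lam * t_end))
    h = t_end / n
    T = T_init
    for _ in range(n):
        T += -lam * (T - T_env) * h
    return T

def objective(lam, T_target=50.0):
    # Squared mismatch with a target end temperature. Because the step
    # count n jumps with lam, the discretization error (and hence the
    # objective) has step discontinuities in lam, even though the exact
    # objective based on T(t_end) = T_env + (T_init - T_env)*exp(-lam*t_end)
    # is smooth.
    return (simulate(lam) - T_target) ** 2
```

Near $\lambda = 0.3$ the step count changes from 3 to 4 and the computed objective jumps, while the underlying analytical objective is perfectly smooth there.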
