Extending the Lower Bounds to All Functions Having Not Excessively Many Global Optima


Drift Analysis for Processes with No Drift

In this section we consider stochastic processes with a very small drift compared to their random fluctuations. Such processes are often analyzed via mathematical tools for martingales, e.g., the Azuma–Hoeffding inequality (Theorem 1.10.30 in [30]). However, all such tools aim at giving an upper bound on the probability that a martingale deviates from its expectation by some particular value. In this work we aim at the opposite result: we want to know how much time such a process needs to deviate from its expectation by some value.
We propose to use drift analysis to estimate this time t. To do so, we apply a transformation to the process. This idea has already been used to prove various drift theorems (e.g., the variable and the multiplicative drift theorems) via the additive drift theorem. However, there it was applied to processes that already have a drift, and the transformation only rescaled the state space so that the drift became approximately the same in all possible states. In contrast with these previous works, we use a transformation that lets us obtain a drift from the variance of the original process. To illustrate the main idea of this transformation, consider a stochastic process {X_t}_{t∈N} such that

X_{t+1} = X_t + 1 with probability 1/2, and
X_{t+1} = X_t − 1 with probability 1/2.

This process has no drift, since E[X_{t+1} | X_t] = (1/2)(X_t − 1) + (1/2)(X_t + 1) = X_t. However, consider the function h(x) = x(n − ln(x) + 1) and the transformed process Y_t = h(X_t) (temporarily ignoring that it is not defined for all X_t). The function h(x) is monotonically increasing and concave on the interval (0, e^n); therefore, if X_t is in this interval, then by Jensen's inequality we have

E[h(X_{t+1}) | X_t] = (1/2)h(X_t − 1) + (1/2)h(X_t + 1) < h(X_t).
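To make this effect tangible, the following small Python sketch (our illustration; the value n = 100 and the sampled states are arbitrary choices, not from the thesis) computes the exact one-step drift of Y_t = h(X_t) and shows that it is strictly negative for every state x > 1:

```python
import math

def h(x, n):
    # The transformation from the text: h(x) = x (n - ln(x) + 1).
    # It is increasing and strictly concave on (0, e^n).
    return x * (n - math.log(x) + 1)

def one_step_drift(x, n):
    # Exact one-step drift of Y_t = h(X_t) at X_t = x:
    # E[h(X_{t+1}) | X_t = x] - h(x) = h(x-1)/2 + h(x+1)/2 - h(x).
    return 0.5 * h(x - 1, n) + 0.5 * h(x + 1, n) - h(x, n)

n = 100  # illustrative problem size, an arbitrary choice
for x in [2, 10, 50, 1000]:
    # Strictly negative for every x > 1, as guaranteed by Jensen's inequality.
    print(f"x = {x:5d}: drift = {one_step_drift(x, n):.6f}")
```

Thus the strict concavity of h turns the variance of the driftless walk into a genuine negative drift of the transformed process, to which standard additive drift arguments can then be applied.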

Analysis of the (µ + λ) EA

In this section we prove that for arbitrary values of µ and λ (which can be functions of the problem size n; for the lower bound we assume that µ is at most polynomial in n), the expected number of iterations the (µ + λ) EA takes to find the optimum of the ONEMAX function is

E[T] = Θ( (n log n)/λ + n/(λ/µ) + n log+ log+(λ/µ) / log+(λ/µ) ),

where log+ x := max{1, log x} for all x > 0. This result subsumes the previous results for the (1 + λ) EA and the (µ + 1) EA obtained in [87, 49, 138].
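As a quick sanity aid, the following Python helper (purely illustrative: the Θ suppresses constant factors, so only the growth behavior of the returned value is meaningful) evaluates the expression inside the Θ:

```python
import math

def log_plus(x):
    # log+ x := max{1, log x} for all x > 0, as defined above.
    return max(1.0, math.log(x))

def mu_plus_lambda_bound(n, mu, lam):
    # The expression inside Theta(.) for E[T] of the (mu + lambda) EA on OneMax:
    #   n log n / lambda + n / (lambda/mu) + n log+ log+(lambda/mu) / log+(lambda/mu)
    r = lam / mu
    return (n * math.log(n) / lam
            + n / r
            + n * log_plus(log_plus(r)) / log_plus(r))

# With mu = 1 the second term is of lower order and the expression matches
# the known (1 + lambda) EA bound; with lam = 1 it matches the (mu + 1) EA
# bound Theta(n log n + mu n).
print(mu_plus_lambda_bound(10**4, 1, 100))
```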
This runtime guarantee shows, e.g., that using a true parent population of size at most max{log n, λ} does not reduce the asymptotic runtime compared to µ = 1. Such information can be useful, since it is known that larger parent population sizes can increase the robustness to noise, see, e.g., [74]. With our methods, we can also analyze a related algorithm, the (λ 1:1 + λ) EA studied by He and Yao [80] and by Chen, He, Sun, Chen, and Yao [13], in which each parent creates exactly one offspring. We prove a tight runtime bound of Θ((n log n)/λ + n) iterations, which also shows that the fairness in the parent selection does not change the asymptotic runtime on this problem.
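For concreteness, here is a compact Python sketch of both algorithms (our reconstruction from the standard definitions; the initialization and the tie-breaking in the selection follow common conventions and are not prescribed by the text above):

```python
import random

def onemax(x):
    # ONEMAX: the number of one-bits in the bit string.
    return sum(x)

def mutate(x, n):
    # Standard bit mutation: flip each bit independently with probability 1/n.
    return [b ^ int(random.random() < 1 / n) for b in x]

def mu_plus_lambda_ea(n, mu, lam, fair=False, max_iters=10**6):
    """(mu + lambda) EA on OneMax; fair=True gives the (lambda 1:1 + lambda) EA,
    where each parent creates exactly one offspring (requires mu == lam)."""
    if fair:
        assert mu == lam
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    for t in range(1, max_iters + 1):
        # Parent selection: fair (one offspring per parent) or uniform at random.
        parents = pop if fair else [random.choice(pop) for _ in range(lam)]
        offspring = [mutate(p, n) for p in parents]
        # Elitist plus-selection: keep the mu best of parents and offspring.
        pop = sorted(pop + offspring, key=onemax, reverse=True)[:mu]
        if onemax(pop[0]) == n:
            return t  # number of iterations until the optimum is found
    return max_iters
```

For example, mu_plus_lambda_ea(100, 8, 8, fair=True) runs the fair variant.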

Upper Bounds

In this section, we prove separately two upper bounds for the runtime of the (µ + λ) EA on the ONEMAX problem, the first one being valid for all values of µ and λ and the second one giving an improvement for the case that λ is large compared to µ, more precisely, that λ/µ ≥ e^e. Although we do not propose novel analysis methods in this section, these upper bounds are important to show the tightness of our lower bound, which we obtain via the complete trees technique, and to better understand the dynamics of the population of the (µ + λ) EA.
Unless specified otherwise, we denote the current best fitness in the population by i and the number of best individuals in the population by j.

Increase of the Number of the Best Individuals

In this subsection we analyze how the number of individuals on the current-best fitness level increases over time, and we derive from this two estimates for the time taken for a fitness improvement. We note that it is often much easier to generate an additional individual with current-best fitness by copying an existing one than to generate an individual with strictly better fitness by flipping the right bits. Consequently, in a typical run of the (µ + λ) EA, the number of best individuals first increases to a certain level, and only then does a strict improvement become likely. Since the increase of the number of individuals on the current-best fitness level via producing copies of such best individuals is independent of the fitness function, we formulate our results for the optimization of an arbitrary pseudo-Boolean function and hope that they might find applications in other runtime analyses as well. So let f : {0, 1}^n → R be an arbitrary fitness function which we optimize using the (µ + λ) EA.
Assume that the (µ + λ) EA starts in an arbitrary state where the best individuals have fitness i and there are j1 such individuals in the population. Due to elitism, the algorithm can neither decrease the best fitness i nor decrease the number j1 of best individuals before it increases the best fitness. Following [129, Lemma 2], we call an individual fit if it has fitness i or better. For j2 ∈ N, we define τ_{j1,j2}(i) to be the first time (number of iterations) at which the population of the (µ + λ) EA contains at least j2 fit individuals. We note that this random variable τ_{j1,j2}(i) may depend on the particular initial state of the (µ + λ) EA, but since our results are independent of this initial state (apart from i and j1), we suppress the initial state in our notation. The time τ_{1,µ}(i), that is, the specific case that j1 = 1 and j2 = µ, is also called the takeover time of a new best individual. For this takeover time, Sudholt [129, Lemma 2] proved an upper bound.
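The copying dynamics described above are easy to simulate. The following Python sketch is a simplified model we add for illustration only: it ignores strict fitness improvements and counts an offspring as a new fit individual exactly when it selects a fit parent (probability j/µ) and the standard bit mutation flips no bit (probability (1 − 1/n)^n ≈ 1/e); under these assumptions it gives a Monte Carlo estimate of τ_{j1,j2}(i):

```python
import random

def takeover_time(n, mu, lam, j1=1, j2=None, trials=1000):
    # Monte Carlo estimate of tau_{j1,j2} in the copying-only model:
    # each of the lam offspring becomes a new fit individual with
    # probability (j/mu) * (1 - 1/n)^n, and elitist selection keeps
    # all fit individuals as long as there are at most mu of them.
    if j2 is None:
        j2 = mu  # the takeover time tau_{1,mu} by default
    p_copy = (1 - 1 / n) ** n  # probability that mutation flips no bit
    total = 0
    for _ in range(trials):
        j, t = j1, 0
        while j < j2:
            new_fit = sum(random.random() < (j / mu) * p_copy for _ in range(lam))
            j = min(mu, j + new_fit)
            t += 1
        total += t
    return total / trials

print(takeover_time(n=100, mu=20, lam=20))  # estimate of the takeover time
```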


Table of contents:

Introduction
Chapter 1. Introduction
1.1 Notation
1.2 Considered EAs and GAs
1.2.1 The (μ + λ) EA
1.2.2 The (μ, λ) EA
1.2.3 The (1 + (λ, λ)) GA
1.3 Mutation and Crossover Operators
1.3.1 Unbiased Operators
1.3.2 Fast Mutation Operator and Power-law Distribution
1.4 Model Functions
1.4.1 Black-box Complexity
1.4.2 ONEMAX
1.4.3 LEADINGONES
1.4.4 Jump Functions
1.4.5 Plateau Functions
1.5 State of the Art
1.5.1 Results for the (μ + λ) EA on ONEMAX
1.5.2 Results for the (μ, λ) EA on ONEMAX
1.5.3 Results for the (1 + (λ, λ)) GA on ONEMAX
1.5.4 Results for the LEADINGONES
1.5.5 Results for Plateaus
1.5.6 Results for Jump Functions
1.5.7 Summary for the Current State of the Art
1.6 Existing Tools
1.6.1 Markov Chains
1.6.2 Drift Analysis
1.6.3 Fitness Levels
1.6.4 Family Trees
1.6.5 Summary of the Existing Methods and Motivation of This Work
1.7 Contribution of This Work
1.8 Useful Tools
Chapter 2. Analysis of Mutation-based Algorithms
2.1 Proposed Analysis Methods
2.1.1 Complete Trees
2.1.2 Drift Analysis for Processes with No Drift
2.2 Analysis of the (μ + λ) EA
2.2.1 Upper Bounds
2.2.2 Lower Bounds
2.2.3 Extending the Lower Bounds to All Functions Having Not Excessively Many Global Optima
2.2.4 Analysis of the (λ 1:1 + λ) EA
2.3 Analysis of the (μ, λ) EA
2.3.1 Lower Bounds for λ ≤ (1 − ε)μe with ε = ω(1/√n)
2.3.2 The Runtime when λ ≥ μe
2.3.3 Polynomial Runtime on the Threshold for the Large Population Sizes
2.4 Conclusion of Chapter 2
Chapter 3. Analysis of EAs on Plateaus
3.1 Tools from Linear Algebra
3.2 Proposed Analysis Method for Plateaus
3.2.1 Two Markov Chains
3.2.2 The Spectrum of the Transient Matrix
3.3 Runtime Analysis
3.4 Corollaries
3.4.1 Randomized Local Search and Variants
3.4.2 Standard (1 + 1) EA
3.4.3 Fast (1 + 1) EA
3.4.4 Hyper-Heuristics
3.4.5 Comparison for Concrete Values
3.5 Parallel Runs and Population-Based Algorithms
3.6 Conclusion of Chapter 3
Chapter 4. Analysis of Crossover-based Algorithms
4.1 Heavy-tailed (1 + (λ, λ)) GA
4.2 Runtime Analysis of the Heavy-Tailed (1 + (λ, λ)) GA on ONEMAX
4.2.1 Runtime Analysis
4.2.2 Experiments
4.3 The Runtime Analysis of the (1 + (λ, λ)) GA on LEADINGONES
4.3.1 Additive Drift with Tail Bounds
4.3.2 Lower Bound
4.3.3 Lower Bound for Adaptive λ
4.3.4 Upper Bound
4.3.5 Upper Bound for Adaptive λ
4.4 The Runtime Analysis of the (1 + (λ, λ)) GA on JUMP
4.4.1 Upper Bounds
4.4.2 Lower Bound
4.5 The Runtime Analysis of the Heavy-Tailed (1 + (λ, λ)) GA on JUMP
4.5.1 The Heavy-Tailed (1 + (λ, λ)) GA with Multiple Parameter Choices
4.5.2 Runtime Analysis
4.6 Conclusion of Chapter 4
Conclusion
References
