FOGA '21: Proceedings of the 16th ACM/SIGEVO Conference on Foundations of Genetic Algorithms

Full Citation in the ACM Digital Library

Asymptotic convergence rates for averaging strategies

Parallel black box optimization consists in estimating the optimum of a function using λ parallel evaluations of f. Averaging the μ best individuals among the λ evaluations is known to provide better estimates of the optimum of a function than just picking up the best. In continuous domains, this averaging is typically just based on (possibly weighted) arithmetic means. Previous theoretical results were based on quadratic objective functions. In this paper, we extend the results to a wide class of functions, containing three times continuously differentiable functions with unique optimum. We prove formal rate of convergences and show they are indeed better than pure random search asymptotically in λ. We validate our theoretical findings with experiments on some standard black box functions.

On crossing fitness valleys with majority-vote crossover and estimation-of-distribution algorithms

The benefits of using crossover in crossing fitness gaps have been studied extensively in evolutionary computation. Recent runtime results show that majority-vote crossover is particularly efficient at optimizing the well-known Jump benchmark function that includes a fitness gap next to the global optimum. Also estimation-of-distribution algorithms (EDAs), which use an implicit crossover, are much more efficient on Jump than typical mutation-based algorithms. However, the allowed gap size for polynomial runtimes with EDAs is at most logarithmic in the problem dimension n.

In this paper, we investigate variants of the Jump function where the gap is shifted and appears in the middle of the typical search trajectory. Such gaps can still be overcome efficiently in time O (n log n) by majority-vote crossover and an estimation-of-distribution algorithm, even for gap sizes almost [EQUATION]. However, if the global optimum is located in the gap instead of the usual all-ones string, majority-vote crossover would nevertheless approach the all-ones string and be highly inefficient. In sharp contrast, an EDA can still find such a shifted optimum efficiently. Thanks to a general property called fair sampling, the EDA will with high probability sample from almost every fitness level of the function, including levels in the gap, and sample the global optimum even though the overall search trajectory points towards the all-ones string. Finally, we derive limits on the gap size allowing efficient runtimes for the EDA.

Focused jump-and-repair constraint handling for fixed-parameter tractable graph problems

Repair operators are often used for constraint handling in constrained combinatorial optimization. We investigate the (1+1) EA equipped with a tailored jump-and-repair operation that can be used to probabilistically repair infeasible offspring in graph problems. Instead of evolving candidate solutions to the entire graph, we expand the genotype to allow the (1+1) EA to develop in parallel a feasible solution together with a growing subset of the instance (an induced subgraph). With this approach, we prove that the EA is able to probabilistically simulate an iterative compression process used in classical fixed-parameter algorithmics to obtain a randomized FPT performance guarantee on two NP-hard graph problems. For k-VertexCover, we prove that the (1+1) EA using focused jump-and-repair can find a k-cover (if one exists) in O(2k n2 log n) iterations in expectation. This leads to an exponential (in k) improvement over the best-known parameterized bound for evolutionary algorithms on VertexCover. For the k-FeedbackVertexSet problem in tournaments, we prove that the EA finds a feasible feedback set in O(2kk!n2 log n) iterations in expectation. This is the first parameterized result for an evolutionary algorithm on this problem. We discuss how to generalize the framework to other parameterized graph problems closed under induced subgraphs and report experimental results that illustrate the behavior of the algorithm on a concrete instance class.

Automatic adaptation of hypermutation rates for multimodal optimisation

Previous work has shown that in Artificial Immune Systems (AIS) the best static mutation rates to escape local optima with the ageing operator are far from the optimal ones to do so via large hyper-mutations and vice-versa. In this paper we propose an AIS that automatically adapts the mutation rate during the run to make good use of both operators. We perform rigorous time complexity analyses for standard multimodal benchmark functions with significant characteristics and prove that our proposed algorithm can learn to adapt the mutation rate appropriately such that both ageing and hypermutation are effective when they are most useful for escaping local optima. In particular, the algorithm provably adapts the mutation rate such that it is efficient for the problems where either operator has been proven to be effective in the literature.

Self-adjusting offspring population sizes outperform fixed parameters on the cliff function

In the discrete domain, self-adjusting parameters of evolutionary algorithms (EAs) has emerged as a fruitful research area with many runtime analyses showing that self-adjusting parameters can out-perform the best fixed parameters. Most existing runtime analyses focus on elitist EAs on simple problems, for which moderate performance gains were shown. Here we consider a much more challenging scenario: the multimodal function Cliff, defined as an example where a (1, λ) EA is effective, and for which the best known upper runtime bound for standard EAs is O(n25).

We prove that a (1, λ) EA self-adjusting the offspring population size λ using success-based rules optimises Cliff in O(n) expected generations and O(n log n) expected evaluations. Along the way, we prove tight upper and lower bounds on the runtime for fixed λ (up to a logarithmic factor) and identify the runtime for the best fixed λ as nη for η ≈ 3.9767 (up to sub-polynomial factors). Hence, the self-adjusting (1, λ) EA outperforms the best fixed parameter by a factor of at least n2.9767 (up to sub-polynomial factors).

Non-local optimization: imposing structure on optimization problems by relaxation

In stochastic optimization, particularly in evolutionary computation and reinforcement learning, the optimization of a function f : Ω → R is often addressed through optimizing a so-called relaxation θ ϵ Θ → Eθ(f) of f, where Θ resembles the parameters of a family of probability measures on Ω. We investigate the structure of such relaxations by means of measure theory and Fourier analysis, enabling us to shed light on the success of many associated stochastic optimization methods. The main structural traits we derive and that allow fast and reliable optimization of relaxations are the consistency of optimal values of f, Lipschitzness of gradients, and convexity. We emphasize settings where f itself is not differentiable or convex, e.g., in the presence of (stochastic) disturbance.

On the potential of normalized TSP features for automated algorithm selection

Classic automated algorithm selection (AS) for (combinatorial) optimization problems heavily relies on so-called instance features, i.e., numerical characteristics of the problem at hand ideally extracted with computationally low-demanding routines. For the traveling salesperson problem (TSP) a plethora of features have been suggested. Most of these features are, if at all, only normalized imprecisely raising the issue of feature values being strongly affected by the instance size. Such artifacts may have detrimental effects on algorithm selection models. We propose a normalization for two feature groups which stood out in multiple AS studies on the TSP: (a) features based on a minimum spanning tree (MST) and (b) a k-nearest neighbor graph (NNG) transformation of the input instance. To this end we theoretically derive minimum and maximum values for properties of MSTs and k-NNGs of Euclidean graphs. We analyze the differences in feature space between normalized versions of these features and their unnormalized counterparts. Our empirical investigations on various TSP benchmark sets point out that the feature scaling succeeds in eliminating the effect of the instance size. Eventually, a proof-of-concept AS-study shows promising results: models trained with normalized features tend to outperform those trained with the respective vanilla features.

Do additional optima speed up evolutionary algorithms?

Most runtime analyses of randomised search heuristics focus on the expected number of function evaluations to find a unique global optimum. We ask a fundamental question: if additional search points are declared optimal, or declared as desirable target points, do these additional optima speed up evolutionary algorithms? More formally, we analyse the expected hitting time of a target set OPT ∪ S where S is a set of non-optimal search points and OPT is the set of optima and compare it to the expected hitting time of OPT.

We show that the answer to our question depends on the number and placement of search points in S. For all black-box algorithms and all fitness functions we show that, if additional optima are placed randomly, even an exponential number of optima has a negligible effect on the expected optimisation time. Considering Hamming balls around all global optima gives an easier target for some algorithms and functions and can shift the phase transition with respect to offspring population sizes in the (1,λ) EA on One-Max. Finally, on functions where search trajectories typically join in a single search point, turning one search point into an optimum drastically reduces the expected optimisation time.

Computing diverse sets of high quality TSP tours by EAX-based evolutionary diversity optimisation

Evolutionary algorithms based on edge assembly crossover (EAX) constitute some of the best performing incomplete solvers for the well-known traveling salesperson problem (TSP). Often, it is desirable to compute not just a single solution for a given problem, but a diverse set of high quality solutions from which a decision maker can choose one for implementation. Currently, there are only a few approaches for computing a diverse solution set for the TSP. Furthermore, almost all of them assume that the optimal solution is known. In this paper, we introduce evolutionary diversity optimisation (EDO) approaches for the TSP that find a diverse set of tours when the optimal tour is known or unknown. We show how to adopt EAX to not only find a high-quality solution but also to maximise the diversity of the population. The resulting EAX-based EDO approach, termed EAX-EDO is capable of obtaining diverse high-quality tours when the optimal solution for the TSP is known or unknown. A comparison to existing approaches shows that they are clearly outperformed by EAX-EDO.

The effect of non-symmetric fitness: the analysis of crossover-based algorithms on RealJump functions

Theory of evolutionary computation has brought plenty of useful recommendations to the practitioners on how to deal with local optima. Many of these results were obtained through the runtime analysis of evolutionary algorithms (EAs for brevity) on Jump benchmark function, which has a local optimum which is very hard to leave for most EAs. The performed analyses resulted into multiple new algorithms which showed a good performance on Jump function, including the (1 + (λ, λ)) GA (with dynamic parameters choices or with non-standard static parameters), the (μ + 1) GA with various diversity mechanisms, and the hybrid GA.

However, some of the proposed algorithms exploit properties of Jump which seem not to be likely to belong to practical problems, e.g. the invariance of Jump to any change of the order of the arguments and the clear signal from the inferior individuals towards the optimum. To figure out which of the developed algorithms do not rely too much on these properties we perform their runtime analyses on the so-called RealJump function, which has its optimum slightly shifted from the center of the fitness valley. We show that moving the optimum increases the runtime of all considered algorithms, but the additional factor for the (μ + 1) GA is greater than the one for the (1 + (λ, λ)) GA. The hybrid GA fails to find the shifted optimum with high probability.