The story goes that while working at Bell Labs in the 1950s, the mathematician and computer scientist Richard Hamming would ask colleagues, "What's the most important problem in your field?" ... and then follow up with, "So, why aren't you working on it?" Both questions have many possible answers, even for just one person at one time, but they are certainly provocative, tough and uncomfortable. In this talk, I will reflect on my personal answers at various times, on answers for evolutionary computation (EC) and evolutionary multiobjective optimization (EMO) more broadly, and on answers for fields adjacent to EC/EMO as well as for industrial research & innovation. My particular answers (or anyone's) are almost certainly not as important as the effort of grappling with the questions.
Drift analysis is a great tool for proving that optimization algorithms work the way we think they do, and for analyzing them, potentially in great detail. In this talk, I will discuss drift analysis for evolution strategies. These algorithms exhibit linear convergence on a wide range of problems, which corresponds to a linear decrease of the logarithmic distance of the best-so-far sample from the optimum, giving rise to simple additive drift. That behavior is enabled by online adaptation of the step size, which decays at the same rate as the distance to the optimum. Moreover, modern evolution strategies like CMA-ES adapt not only the step size but also the full covariance matrix of their sampling distribution. This mechanism enables convergence at a problem-independent rate that depends only on the dimension of the search space. The primary challenge in proving the convergence of CMA-ES lies in establishing the stability of the adaptation process, which was recently achieved by analyzing the invariant Markov chain that describes the parameter adaptation process. Yet, a drift-based analysis is still desirable because it can yield much more fine-grained results. For instance, it can provide details about the transient adaptation phase, which often takes up the lion's share of the time for solving the problem. To achieve this, we need a potential function that appropriately penalizes unsuitable parameter configurations, or more precisely, configurations the algorithm tends to move away from. Designing a potential function that captures the dynamics of covariance matrix adaptation is an ongoing challenge. I will present our recent research efforts towards this goal and emphasize why relatively simple additive drift offers a powerful framework for achieving it.
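For reference, the additive drift theorem invoked here (in its classical form due to He and Yao) can be stated as follows: if a potential function Φ ≥ 0 on the states of the stochastic process satisfies, for some δ > 0,

    E[Φ(X_t) − Φ(X_{t+1}) | Φ(X_t) > 0] ≥ δ,

then the expected hitting time T of the set {Φ = 0} is bounded by E[T] ≤ Φ(X_0)/δ. For evolution strategies, Φ is typically built from the logarithmic distance to the optimum, augmented with penalty terms for unsuitable step sizes or covariance matrices.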
Evolutionary algorithms are inherently parallel optimization methods that have been successfully applied to many critical systems. While our understanding of their performance has steadily progressed, most existing work on distributed evolution assumes a perfect runtime environment. This assumption does not always hold in practice, and it is important to consider how robust such systems are to attack.
We consider distributed evolutionary algorithms in a setting where a malicious adversary can confound the search process by corrupting the communication of candidate solutions in the distributed architecture. We study this setting for parallel evolutionary algorithms optimizing pseudo-Boolean functions under different adversarial models. We prove asymptotic bounds in various adversarial models that relate the number and budget of the adversary to success probability, solution quality and runtime on the OneMax problem. For a (1+λ) EA running in a distributed master/worker architecture, we prove upper bounds on the slow-down incurred by adversarial attacks as a function of population size, number of compromised nodes and adversarial budget. For a (1, λ) EA, we prove a negative result that reveals a parameter regime of the adversary that prevents efficient optimization.
We also conduct experiments on OneMax and the combinatorial problem MaxCut. On the latter, we empirically find that the (1, λ) EA is more robust to adversarial corruption and, surprisingly, may even be able to leverage weak levels of corruption to outperform the adversary-free setting.
The Compact Genetic Algorithm (cGA) is an estimation-of-distribution algorithm that has received much attention in recent years, especially in the runtime analysis community. It comes with a single parameter K determining the strength of its model updates. Contrary to other estimation-of-distribution algorithms like the UMDA, the standard cGA does not have a parameter controlling its selection pressure.
Already the original authors of the cGA (Harik et al. 1999) suggested a generalization exhibiting larger selection pressure by sampling more than the usual number of two individuals per iteration. We analyze such a generalization following closely the original idea, but simplifying it by sampling λ individuals per iteration and updating the probabilistic model according to the difference of the fittest and the least fit sample. The resulting algorithm is analyzed primarily on the well-established OneMax benchmark, and a tight bound of Θ(K √n/log λ) for the number of iterations to find the optimum is obtained, with the upper bound being O(n log n/log λ) for a suitable K. The lower bound holds for all functions with a unique optimum. To prove our results, we apply strong probabilistic tools, including Slud's inequality, which has not been used in the runtime analysis of evolutionary computation before.
Since the speedup in the number of iterations is limited (in the number of function evaluations, we observe no speedup), we also present an example where the higher selection pressure is essential to find the global optimum efficiently. We study this function mostly empirically, and we also supplement further experimental insights on OneMax.
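As a rough illustration of the generalized cGA analyzed above (the concrete sketch below is ours and uses OneMax; it is not the paper's pseudocode), each iteration samples λ individuals from the current product distribution and shifts every frequency by 1/K toward the bit value of the fittest sample wherever the fittest and the least fit sample disagree:

```python
import numpy as np

def generalized_cga(n, K, lam, max_iters=100_000, seed=0):
    """Sketch of a cGA with increased selection pressure: lambda samples per
    iteration, model update driven by the fittest vs. the least fit sample.
    Frequencies are kept within [1/n, 1 - 1/n] (the usual border restriction)."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                               # frequency vector of the model
    for _ in range(max_iters):
        pop = (rng.random((lam, n)) < p).astype(int)  # lambda samples
        fits = pop.sum(axis=1)                        # OneMax fitness
        best, worst = pop[fits.argmax()], pop[fits.argmin()]
        if fits.max() == n:
            return best
        # wherever best and worst disagree, move the frequency by 1/K
        # toward the bit value of the best sample
        p = np.clip(p + (best - worst) / K, 1.0 / n, 1.0 - 1.0 / n)
    return (rng.random(n) < p).astype(int)

# example: print(generalized_cga(n=100, K=50, lam=8).sum())
```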
Academic research in dynamic optimisation uses benchmark generators to artificially simulate controlled and reproducible changing environments in order to systematically compare algorithmic performance under uncertainty. However, due to the scarcity of real-world data and the difficulty of acquiring it, benchmarks often fail to incorporate real-world features, such as problem constraints or the time-linkage property, where previously made decisions influence future events. This study introduces a Gaussian Copula-based, real-world data-driven synthetic data generation model for the Dynamic Truck and Trailer Scheduling Problem (DTTSP). The model offers a realistic, privacy-preserving DTTSP benchmark instance generator, which can be used to recreate the dynamism, constraints, heterogeneity, and time-linkage of logistics and supply chain operations. This work examines the utility, fidelity, and privacy of the proposed model in four workday case studies from a local transportation company. The conducted experiments demonstrate the systematic application of Gaussian Copulas to produce accurate, useful, and secure DTTSP benchmark instances that capture the statistical properties and variable correlations, as well as the temporal patterns, of the original annual data. Nevertheless, the utility analysis of the conditional sampling indicates that there is still room for improvement in the modelling process.
Parameter control and dynamic algorithm configuration study how to dynamically choose suitable configurations of a parametrized algorithm during the optimization process. Although this is an intensively researched topic in evolutionary computation, optimal control policies are known for only very few cases, which limits the development of automated approaches to achieve them.
With this work we propose four new benchmarks for which we derive optimal or close-to-optimal control policies. More precisely, we consider the optimization of the LeadingOnes function via RLSk, a local search algorithm allowing for a dynamic choice of the mutation strength k. The benchmarks differ in which information the algorithm can exploit to set its parameters and to select offspring. In existing running time results, the exploitable information is typically limited to the quality of the current-best solution. In this work, we consider how additional information about the current state of the algorithm can help to make better choices of parameters, and how these choices affect the performance. Namely, we allow the algorithm to use information about the current OneMax value, and we find that this enables much better parameter choices, especially in marginal states. Although those states are rarely visited by the algorithm, such policies yield a notable speed-up in terms of expected runtime. This makes the proposed benchmarks a challenging but promising testing ground for the analysis of parameter control methods in rich state spaces and of their ability to find optimal policies by capturing the performance improvements yielded by correct parameter choices.
Clustering is the process of grouping similar data points into distinct clusters. Graph theoretic techniques for data clustering attempt to find the most efficient way to convert a precomputed similarity graph into a cluster graph, which is a collection of disjoint cliques. In contrast, most existing evolutionary approaches to clustering are based on partitioning data points according to feature vectors. In this paper we present an evolutionary algorithm for tackling data clustering from the graph theoretic perspective. In particular, we adapt a genetic algorithm (the SubPopGA) that maintains a population of solved subgraphs to solve the NP-hard k-Cluster Deletion and k-Cluster Vertex Deletion problems. This adaptation is nontrivial, as it requires modifying the technique to work on induced subgraphs and designing a novel template parent mechanism for uniform crossover. We prove that the resulting SubPopGA has a fixed-parameter tractable running time on both problems. Given an arbitrary graph on n vertices and m edges, we prove that it solves k-Cluster Deletion in O(n^3 4^k + mn^2 log n) generations in expectation, and the k-Cluster Vertex Deletion problem in O(n^3 5^k + n^3 log n) generations in expectation. We also present results from a number of computational experiments that measure the running time of the SubPopGA on real-world biological data sets coming from protein-protein interaction networks.
In this work, we examine Monte Carlo tree search (MCTS) as an adaptive benchmark for competitive coevolution, both as a method for characterizing individual solutions, and as a measure of global progress throughout a coevolutionary run. We find that MCTS provides a much more practical measurement of solution quality than existing methods of comparing to randomly-sampled solutions. Additionally, we introduce a class of n-player games for which MCTS can be shown to have comparable performance for different n, allowing for the characterization of how coevolution performs in games with different numbers of players. We demonstrate these techniques for n-player Nim and an n-player variant of Othello.
Over decades of Traveling Salesperson Problem (TSP) research, powerful heuristics have been developed that efficiently solve many TSP instances. Among them, the local search optimizer LKH and the genetic algorithm EAX stand out as the two complementary state-of-the-art solvers. Yet, the links between instance structures and solver complementarity remain obscure, i.e., it is often unclear how instance structures affect solver performance and behavior.
While visualization techniques have advanced our understanding of TSP instance structures, there is still no general method to assess the behavior of TSP heuristics directly based on the structural characteristics of the instance to be solved. Existing approaches, such as search trajectory networks, are often optimized for graph-theoretic properties that may not reflect true similarities between tours. Consequently, visually comparing solvers remains difficult, especially when few identical solutions are encountered.
This paper introduces a framework for visualizing the search behavior of TSP heuristics, aiming to illuminate the still poorly understood key differences in their search behavior. To this end, we adapt the dimensionality reduction technique PaCMAP to map intermediate solutions into two-dimensional space, preserving distances that meaningfully reflect tour similarity. To support exploration, we provide an interactive dashboard that allows users to inspect individual tours and visually compare selected tour pairs. The resulting visualizations reveal fundamental differences in how LKH and EAX explore the search space, including their initialization strategies and trajectory structures. We further show that a recent EAX-inspired and performance-improved LKH variant still behaves similarly to LKH, highlighting the untapped potential for further algorithmic improvements.
Coevolutionary algorithms are a family of black-box optimisation algorithms with many applications in game theory. We study a coevolutionary algorithm on an important class of games in game theory: potential games. In these games, a real-valued function defined over the entire strategy space encapsulates the strategic choices of all players collectively. We present the first theoretical analysis of a coevolutionary algorithm on potential games, showing a runtime guarantee that holds for all exact potential games, some weighted and ordinal potential games, and certain non-potential games. Using this result, we show a polynomial runtime on singleton congestion games. Furthermore, we show that there exist games for which coevolutionary algorithms find Nash equilibria exponentially faster than best or better response dynamics, and games for which coevolutionary algorithms find better Nash equilibria as well. Finally, we conduct experimental evaluations showing that our algorithm can outperform widely used algorithms, such as better response on random instances of singleton congestion games, as well as fictitious play, counterfactual regret minimisation (CFR), and external sampling CFR on dynamic routing games.
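For concreteness, in an exact potential game with utility functions u_i there is a single function Φ on strategy profiles such that, for every player i, every pair of own strategies s_i, s'_i, and every profile s_{−i} of the other players,

    u_i(s'_i, s_{−i}) − u_i(s_i, s_{−i}) = Φ(s'_i, s_{−i}) − Φ(s_i, s_{−i}),

so the pure Nash equilibria are exactly the profiles from which no unilateral deviation increases Φ.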
The performance of surrogate model assisted algorithms for black-box optimization is impacted by two factors: algorithmic design choices on how to use the surrogate model, and the ability of the model to accurately represent the true objective function. In an effort to better understand the potential of surrogate model assisted algorithms, we propose to decouple those factors by studying the performance of the algorithms assuming perfect models. As a result, we obtain a natural performance bound that algorithms that use real models can be compared against, and that also provides an indication of the goodness of those models. We employ that performance bound in order to analytically evaluate a surrogate model assisted (1 + 1)-ES. Using the same bound, we also investigate the potential of performing incremental updates of Gaussian process surrogate models in an attempt to reduce algorithm internal computational costs and find that significant savings can be achieved at the cost of a small deterioration of model accuracy.
In this paper, we discuss the optimal distributions of solutions for maximizing the minimum crowding distance on linear Pareto fronts of two- and three-objective problems. Our discussion is twofold. One part is theoretical, analytically deriving the optimal distributions. The other is empirical, based on a newly implemented indicator-based algorithm that maximizes the minimum crowding distance. First, we show the upper bound on the minimum crowding distance, which is derived from the upper bound on the total crowding distance. Next, we theoretically derive the optimal distributions for maximizing the minimum crowding distance on two-objective linear Pareto fronts. Our analysis shows that two solutions always overlap in the optimal distributions when the population size is an even number. For an odd population size, the optimal distribution cannot be uniquely specified except for some anchor solutions (i.e., many different distributions are optimal). The theoretically obtained optimal distributions are compared with experimental results of our indicator-based algorithm. Then, we show an example of optimal distributions for a three-objective linear Pareto front where the population size is a multiple of three. Our indicator-based algorithm is also compared with the standard NSGA-II algorithm and its steady-state variant.
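Throughout, crowding distance refers to the standard NSGA-II measure: writing x⁻ and x⁺ for the two neighbors of a solution x when the population is sorted by the j-th objective, and f_j^min, f_j^max for the extreme objective values used for normalization,

    cd(x) = Σ_{j=1}^{m} (f_j(x⁺) − f_j(x⁻)) / (f_j^max − f_j^min),

with boundary solutions conventionally assigned an infinite crowding distance.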
In most multi-objective test problems, the dimensionality of the Pareto set is the same as the dimensionality of the Pareto front. That is, an m-objective test problem with n decision variables usually has an (m-1)-dimensional Pareto front and an (m-1)-dimensional Pareto set. Thanks to this property, the mapping between the Pareto front and the Pareto set is usually a one-to-one mapping. In this paper, we formulate two-objective distance minimization problems in an n-dimensional decision space using Manhattan distance. The two objectives are defined by two target points in the decision space. That is, each objective is to minimize the Manhattan distance from the solution to each target point. The main characteristic feature of our test problems is that the dimensionality of the Pareto set can be arbitrarily specified between 1 and n by the locations of the two target points. For example, the Pareto set dimensionality in a 20-dimensional decision space is an arbitrarily specified integer between 1 and 20. These test problems have multimodality since many points in the Pareto set are mapped to the same single point on the Pareto front. In this paper, we first explain the relation between the locations of the two target points and the Pareto set dimensionality. Then, through computational experiments, we show that our simple test problems pose difficult challenges for both multi-objective algorithms and multi-modal multi-objective algorithms. We also discuss the scalability of the proposed test problems with respect to the number of equivalent Pareto sets and the number of objectives.
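In symbols (notation ours), with target points a, b ∈ R^n the two objectives are

    f_1(x) = ‖x − a‖_1 = Σ_{i=1}^{n} |x_i − a_i|   and   f_2(x) = ‖x − b‖_1 = Σ_{i=1}^{n} |x_i − b_i|,

and the number of coordinates in which a and b differ determines the dimensionality of the resulting Pareto set (the axis-aligned box spanned by the two target points).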
We present a novel method for constructing a close-to-optimal potential function for drift analysis of evolution strategies and other numerical optimization algorithms. It is based on a piecewise linear ansatz for the potential function, defined by potential values on a discrete grid of states together with a corresponding interpolation scheme. We compute or approximate the algorithm dynamics on the grid and then construct a linear program from expected progress and (weighted) state transitions. The solution of the optimization problem yields the parameters of a potential function that maximizes the (additive) drift. Under mild assumptions, this procedure yields an arbitrarily close approximation of the optimal potential function at the expense of growing computational demands. As a proof of concept we apply the method to the (1+1) evolution strategy on the two-dimensional sphere function.
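A minimal sketch of the linear-programming step under simplifying assumptions (a small made-up transition matrix stands in for the computed grid dynamics, and interpolation as well as transition weighting are omitted): the potential values on the grid and the drift δ are the LP variables, δ is maximized, and every non-target grid state contributes one constraint requiring an expected potential decrease of at least δ.

```python
import numpy as np
from scipy.optimize import linprog

S = 5                                        # number of grid states (toy size)
rng = np.random.default_rng(0)
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)            # stand-in for measured transition probabilities
target = S - 1                               # "solved" state, potential fixed to 0

# variables: phi[0..S-1] and delta; linprog minimizes, so minimize -delta
c = np.zeros(S + 1)
c[-1] = -1.0

A_ub, b_ub = [], []
for s in range(S):
    if s == target:
        continue
    # drift condition  phi[s] - sum_t P[s, t] * phi[t] >= delta,
    # rewritten as  (-phi[s] + sum_t P[s, t] * phi[t]) + delta <= 0
    A_ub.append(np.append(P[s] - np.eye(S)[s], 1.0))
    b_ub.append(0.0)

A_eq = [np.append(np.eye(S)[target], 0.0)]   # phi[target] = 0
b_eq = [0.0]
bounds = [(0.0, 1.0)] * S + [(None, None)]   # normalize potentials to [0, 1]

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=bounds)
phi, delta = res.x[:S], -res.fun
print("potential values:", phi, "guaranteed additive drift:", delta)
```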
Populations play a key role in evolutionary computation for tackling complex optimization problems. Nevertheless, it is hard to understand the underlying population dynamics from a theoretical perspective, and only a limited number of theoretical results for population-based algorithms are available even for simple benchmark functions. In this paper, we study the classic (μ+1) EA on the benchmark problem BinVal, which allows for exponentially many function values. Previous methods for the analysis, based on fitness levels and multiplicative drift analysis, lead to runtime bounds for this function of size n that include an additive term of Θ(n^2). We provide new insights into how this standard algorithm optimizes BinVal, and we provide runtime bounds that are polynomial in the population size μ and do not include this additive term. In particular, we prove bounds on the expected runtime that are O(μ^5 n log(n/μ^4)) for standard bit mutation, which is O(n log n) for constant μ. Our analysis considers the population dynamics of the (μ+1) EA more closely, proving that copies created by mutation lead to low diversity in short blocks of bits across all individuals. We extend this method to mutation operators that cannot create duplicates, and prove bounds similar to those for standard bit mutation.
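For reference, BinVal is the binary-value benchmark, which interprets the bit string as a binary number, so that all 2^n search points receive distinct fitness values:

    BinVal(x) = Σ_{i=1}^{n} 2^(n−i) · x_i,   x ∈ {0, 1}^n.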
Crossover is a powerful mechanism for generating new solutions from a given population of solutions. However, crossover carries an inherent tension: on the one hand, crossover usually works best if there is enough diversity in the population; on the other hand, exploiting the benefits of crossover reduces diversity. This antagonism often makes crossover reduce its own effectiveness.
We introduce a new paradigm for utilizing crossover that reduces this antagonism, which we call diversity-preserving exploitation of crossover (DiPEC). The resulting Diversity Exploitation Genetic Algorithm (DEGA) is still able to exploit the benefits of crossover, but preserves a much higher diversity than conventional approaches.
We demonstrate the benefits by proving that the (2 + 1)-DEGA finds the optimum of LeadingOnes with O(n^(5/3) log^(2/3) n) fitness evaluations. This is remarkable since standard genetic algorithms need Θ(n^2) evaluations, and among genetic algorithms only some artificial and specifically tailored algorithms were known to break this runtime barrier. We confirm the theoretical results by simulations. Finally, we show that the approach is not overfitted to LeadingOnes by testing it empirically on other benchmarks and showing that it is also competitive in other settings. We believe that our findings justify further systematic investigations of the DiPEC paradigm.
Recently, there has been growing interest within the theoretical community in analytically studying multi-objective evolutionary algorithms. This runtime analysis-focused research can help formally understand algorithm behaviour, explain empirical observations, and provide theoretical insights to support algorithm development and exploration. However, the test problems commonly used in theoretical analysis are predominantly limited to problems with heavily "artificial" characteristics (e.g., symmetric, homogeneous objectives and linear Pareto fronts), which may not represent realistic scenarios well. In this paper, we first discuss commonly used multi-objective functions in the theory domain and systematically review their features, limitations and implications for practical use. Then, we present several new functions with more realistic features, such as heterogeneous objectives, local optimality and nonlinearity of the Pareto front, obtained by simply mixing and matching classical single-objective functions in the area (e.g., LeadingOnes, Jump and RoyalRoad). We hope these functions can enrich the existing test problem suites and strengthen the connection between theoretical and practical research.
This article proposes a novel constructive heuristic approach, Hyper-GRASP, that integrates the hypervolume indicator and GRASP principles to solve multiobjective combinatorial optimization problems. The approach constructs solutions iteratively by generating and evaluating candidate extensions of the current partial solution, guided by an optimistic bound on the hypervolume contribution of the future complete solution. The most promising extension is selected from a pool according to a parameter α, which balances greediness and exploration during the solution construction phase. Throughout the procedure, only feasible and nondominated solutions are retained. The proposed approach is tested on the multiobjective knapsack problem and the biobjective minimum spanning tree problem to assess its performance. The results show that the best-performing Hyper-GRASP variant effectively approximates the Pareto front across various instances. Moreover, it can also outperform state-of-the-art exact algorithms for the multiobjective knapsack problem as the problem size increases and across various time budgets.
It is well known that evolutionary algorithms can benefit from dynamic choices of the key parameters that control their behavior, to adjust their search strategy to the different stages of the optimization process. A prominent example where dynamic parameter choices have shown a provable super-constant speed-up is the (1 + (λ, λ)) Genetic Algorithm optimizing the OneMax function. While optimal parameter control policies result in linear expected running times, this is not possible with static parameter choices. This result has spurred a lot of interest in parameter control policies. However, many works, in particular theoretical running time analyses, focus on controlling one single parameter. Deriving policies for controlling multiple parameters remains very challenging. In this work, we reconsider the problem of the (1 + (λ, λ)) Genetic Algorithm optimizing OneMax. We decouple its four main parameters and investigate how well state-of-the-art deep reinforcement learning techniques can approximate good control policies. We show that although getting deep reinforcement learning to learn effectively is a challenging task, once it works, it is very powerful and able to find policies that outperform all previously known control policies on the same benchmark. Based on the results found through reinforcement learning, we derive a simple control policy that consistently outperforms the default theory-recommended setting by 27% and the irace-tuned policy, the strongest existing control policy on this benchmark, by 13%, for all tested problem sizes up to 40,000.
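For readers unfamiliar with the algorithm, the following sketch shows the (1 + (λ, λ)) GA on OneMax with the textbook static coupling of mutation rate p = λ/n and crossover bias c = 1/λ, and with both phases using λ offspring; the paper decouples and dynamically controls exactly these quantities, so the values below are only the default illustration, not the learned policy:

```python
import numpy as np

def onemax(x):
    return int(x.sum())

def one_plus_lambda_lambda_ga(n, lam=8, max_evals=1_000_000, seed=0):
    """Sketch of the (1 + (lambda, lambda)) GA with static default parameters."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n)
    fx, evals = onemax(x), 1
    p, c = lam / n, 1.0 / lam
    while fx < n and evals < max_evals:
        # mutation phase: all lambda offspring flip the same number ell of bits
        ell = rng.binomial(n, p)
        mutants = []
        for _ in range(lam):
            y = x.copy()
            y[rng.choice(n, size=ell, replace=False)] ^= 1
            mutants.append((onemax(y), y))
        evals += lam
        fy, y = max(mutants, key=lambda t: t[0])
        # crossover phase: biased uniform crossover between parent and best mutant
        best_fz, best_z = fy, y
        for _ in range(lam):
            z = np.where(rng.random(n) < c, y, x)
            fz = onemax(z)
            if fz > best_fz:
                best_fz, best_z = fz, z
        evals += lam
        if best_fz >= fx:
            x, fx = best_z, best_fz
    return fx, evals

# example: print(one_plus_lambda_lambda_ga(n=1000))
```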
This paper presents a first mathematical runtime analysis of PAES-25, an enhanced version of the original Pareto Archived Evolution Strategy (PAES), which originated from the study of telecommunication problems over two decades ago, with the aim of understanding the dynamics of local search in MOEAs on many-objective fitness landscapes. We derive tight expected runtime bounds of PAES-25 with one-bit mutation on m-LeadingOnesTrailingZeroes until the entire Pareto front is found: Θ(n^3) iterations if m = 2, Θ(n^3 log^2(n)) iterations if m = 4 and Θ(n · (2n/m)^(m/2) log(n/m)) iterations if m > 4, where n is the problem size and m the number of objectives. To the best of our knowledge, these are the first known tight runtime bounds for an MOEA outperforming the best known upper bound of O(n^(m+1)) for (G)SEMO on m-LOTZ when m ≥ 4. We also show that archivers, such as the Adaptive Grid Archiver (AGA), Hypervolume Archiver (HVA) or Multi-Level Grid Archiver (MGA), help to distribute the set of solutions across the Pareto front of m-LOTZ efficiently. We also show that PAES-25 with standard bit mutation optimizes the bi-objective LeadingOnesTrailingZeroes benchmark in expected O(n^4) iterations, and we discuss its limitations on other benchmarks such as OneMinMax or CountingOnesCountingZeros.
Bayesian optimisation (BO) is a surrogate-based optimisation technique that efficiently solves expensive black-box functions with small evaluation budgets. Recent studies consider trust regions to improve the scalability of BO approaches when the problem space scales to more dimensions. Motivated by this research, we explore the effectiveness of trust region-based BO algorithms for diversity optimisation in black-box problems of different dimensionalities. We propose diversity optimisation approaches extending TuRBO1, which is the first BO method that uses a trust region-based approach for scalability. We extend TuRBO1 to divTuRBO1, which finds an optimal solution while maintaining a given distance threshold relative to a reference solution set. We propose two approaches to find diverse solutions for black-box functions by combining divTuRBO1 runs in a sequential and an interleaving fashion. We conduct experimental investigations of the proposed algorithms and compare their performance with that of the baseline method, ROBOT (rank-ordered Bayesian optimisation with trust regions). We evaluate the proposed algorithms on benchmark functions with dimensions 2 to 20. Experimental investigations demonstrate that the proposed methods perform well, particularly in larger dimensions, even with a limited evaluation budget.
Statistical Linkage Learning (SLL) is a part of many state-of-the-art optimizers. The purpose of SLL is to discover variable interdependencies. It has been shown that the effectiveness of SLL-using optimizers is highly dependent on the quality of SLL-based problem decomposition. Thus, understanding what kinds of problems are hard or easy to decompose by SLL is important for practice. In this work, we analytically estimate the size of a population sufficient for obtaining a perfect decomposition in the case of concatenations of certain unitation-based functions. The experimental study confirms the accuracy of the proposed estimate. Finally, we identify those problem types that may be considered hard for SLL-using optimizers.
Many state-of-the-art Genetic Algorithms (GAs) use information about variable dependencies to construct masks for variation operators and, in turn, improve their effectiveness and efficiency. In the black-box setting, the dependency structure model is not known and must be discovered as a part of the optimization process. The precision of this model may be decisive for the effectiveness of the GAs using it. This work considers the recently identified multi-structured problems that arise when two or more problems with a different structure (i.e., different variable dependencies) are combined in a non-linear manner. Such problems are hard to solve because, usually, it is not enough to know all the dependencies to solve them effectively. To do so, one must know which dependencies are a part of which substructure, i.e., the dependencies between dependencies. Finally, an optimizer must detect which substructure is valid for the solution at hand. Statistical Linkage Learning (SLL) was proposed to decompose multi-structured problems. However, SLL may report false dependencies, which can deteriorate the search. Therefore, we propose the Conditional Direct Empirical Linkage Discovery (cDLED) technique to decompose multi-structured problems. cDLED is guaranteed to report only true dependencies. Using cDLED, we propose a mechanism for detecting which problem substructure applies to a given solution. Using these two mechanisms, we propose an optimizer that is highly competitive with other state-of-the-art GAs. We consider single-objective optimization, but our propositions can also be useful in multi- and many-objective optimization. Additionally, we propose a more general formal representation of multi-structured problems.
In complex problems, variable subsets may have a joint non-monotonic influence on the function value. Therefore, variation operators in many single-objective (SO) state-of-the-art optimizers leverage such dependent variable sets to improve effectiveness and efficiency. In multi-objective optimization (MO), we optimize multiple objective functions, each of which may have different dependencies. Thus, choosing relevant dependencies to improve a solution is challenging in MO. To overcome this difficulty, we can transform the MO problem into a set of SO problems (that may be infinite) and discover the dependencies for each SO problem separately. However, dependency discovery is expensive, even for a single problem. Performing it for each SO problem separately seems unacceptable. Moreover, the dependencies of the scalarized problem are neither necessarily a subset nor a superset of the dependencies of all objective functions, making the choice of appropriate dependencies impossible. Two variables dependent in all objective functions can be independent in their scalarization, and two variables independent in all objective functions can be dependent in their scalarization. Therefore, we propose the bi-objective non-monotonicity check (BONM). BONM is a linkage learning technique that uses only a single check to discover the weight vector ranges for which a given variable pair is dependent with respect to the non-monotonicity check. Limiting our proposition to scalarization using weight vectors may reduce its applicability to MO problems with concave Pareto fronts. Nevertheless, it enables using gray-box-dedicated operators in the black-box setting for MO problems. Finally, the proposed parameter-less optimizer that employs BONM significantly outperforms the competing state-of-the-art MO optimizers.
In many real-world problems, instances arrive in a stream which is likely to experience drift in the instance space over time. If a classical algorithm selector is trained offline, i.e., on an initial part of the instance stream, downstream performance is often negatively impacted due to drift in the instance data. To overcome this limitation of classical algorithm selectors, we propose a novel online automated algorithm selection framework that first uses instance features to detect drift, and then periodically retrains a selector if drift occurs, ensuring continuity of performance in the face of data drift. To further improve both the effectiveness and efficiency of retraining, we also propose a process to continuously gather new training samples on the fly. An empirical comparison on a bin-packing use case under three different drift scenarios shows that our framework is efficient in terms of the computational effort required to train a selector while maintaining good accuracy compared to several baselines.
This paper examines restart strategies for algorithms whose successful termination depends on a parameter λ. After each restart, λ is increased until the algorithm terminates successfully. It is assumed that there is an unknown, optimal value for λ. For the algorithm to run successfully, this value must be surpassed. The key question is whether there exists an optimal strategy for selecting λ after each restart, taking into account that the computational costs increase with λ. Potential restart strategies are classified into parameter-dependent strategy types. A loss function is introduced to quantify the wasted computational cost relative to the optimal strategy. A crucial requirement for any efficient restart strategy is that its loss, relative to the optimal λ, remains bounded. To this end, upper and lower bounds of the loss are derived. Using these bounds, it is shown that not all strategy types are bounded. However, for a particular strategy type, where λ is increased multiplicatively by a constant factor ρ, the relative loss function is bounded. Furthermore, it is demonstrated that within this strategy type, there exists an optimal value ρ = 2 that minimizes the maximum relative loss. In the asymptotic limit, this optimal choice does not depend on the unknown optimal λ. While the multiplicative strategy with ρ = 2 has already been used in implementations of evolutionary algorithms to control the population size, showing acceptable performance in applications, a formal proof of its optimality is presented and the underlying conditions are discussed in this paper for the first time.
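A minimal sketch of the multiplicative strategy type discussed above (interface and names are ours): after every unsuccessful run, λ is multiplied by the constant factor ρ, with ρ = 2 being the value shown to minimize the maximum relative loss.

```python
def restart_until_success(run, lam0=1.0, rho=2.0, max_restarts=60):
    """Multiplicative restart strategy: `run(lam)` is assumed to return a pair
    (success, cost); after each failure lambda is multiplied by rho."""
    lam, total_cost = lam0, 0.0
    for _ in range(max_restarts):
        success, cost = run(lam)
        total_cost += cost
        if success:
            return lam, total_cost
        lam *= rho
    raise RuntimeError("no successful run within the restart budget")

# toy usage: a run succeeds once lambda exceeds an unknown critical value,
# and its cost grows linearly with lambda
critical = 37.0
lam, cost = restart_until_success(lambda lam: (lam >= critical, lam))
print(lam, cost)   # lam = 64.0, total cost = 1 + 2 + ... + 64 = 127.0
```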
Even with the recent theoretical advancements that dramatically reduced the complexity of Integer Programming (IP), heuristics remain the dominant problem-solvers for this difficult category. This study seeks to establish the foundations for treating IP with Evolution Strategies (ESs), a class of randomized search heuristics inherently designed for continuous spaces. We particularly focus on ESs for their intrinsic mixed-integer capabilities, well-developed self-adaptation mechanisms, and high efficacy in handling unbounded search spaces. ESs already excel in treating IP in practice, but accomplish it via discretization and by applying sophisticated patches to their continuous operators, while persistently using the ℓ2-norm as their operational pillar. We lay foundations for discrete search by adopting the ℓ1-norm, accounting for the suitable step-size, and exploring mutation distributions for unbounded integer decision variables. We narrow down to the Truncated Normal (TN) and Double Geometric (DG) distributions, explore their theoretical properties, including entropy functions, and propose a procedure to generate scalable correlated mutations. Our investigations are accompanied by extensive numerical simulations, which consistently support the claim that the DG distribution is better suited for unbounded integer search. We link our theoretical perspective to empirical evidence indicating that an ES with correlated DG mutations outperforms other strategies over non-separable quadratic IP. We conclude that substituting the default TN distribution with the DG is theoretically justified and practically beneficial. We also advocate for the use of the ℓ1-norm over the ℓ2-norm based on the empirical indications presented, although this remains to be formally proven and/or rigorously benchmarked.
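As an illustration of the two mutation distributions for unbounded integers compared above (parameter names and the simple rounding used for the normal variant are our own simplifications): the Double Geometric variate is obtained here as the difference of two i.i.d. geometric random variables, which yields a symmetric distribution on the integers.

```python
import numpy as np

def double_geometric(rng, p, size):
    """Two-sided (double) geometric step: difference of two i.i.d. geometric
    variables with success probability p; smaller p means heavier tails."""
    return rng.geometric(p, size=size) - rng.geometric(p, size=size)

def rounded_normal(rng, sigma, size):
    """Integer-rounded normal step, a crude stand-in for the TN mutation."""
    return np.rint(rng.normal(0.0, sigma, size=size)).astype(int)

rng = np.random.default_rng(0)
x = np.zeros(10, dtype=int)                     # unbounded integer decision vector
print("DG mutant:", x + double_geometric(rng, p=0.3, size=x.size))
print("TN-like mutant:", x + rounded_normal(rng, sigma=2.0, size=x.size))
```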
One key challenge in optimization is the selection of a suitable set of benchmark problems. A common goal is to find functions which are representative of a class of real-world optimization problems in order to ensure that findings on the benchmarks will translate to relevant problem domains. While some problem characteristics are well covered by popular benchmarking suites, others are often overlooked. One example of such a problem characteristic is permutation invariance, where the search space consists of a set of symmetrical search regions. This type of problem occurs, e.g., when a set of solutions has to be found but the ordering within this set does not matter. The data clustering problem, often seen in machine learning contexts, is a clear example of such an optimization landscape, and has thus been proposed as a base from which optimization benchmarks can be created. In addition to the symmetry aspect, these clustering problems also contain potential regions of neutrality, which can provide an additional challenge to optimization algorithms.
In this paper, we present a standardized benchmark suite for the evaluation of continuous black-box optimization algorithms, based on data clustering problems. To gain insight into the diversity of the benchmark set, both internally and in comparison to existing suites, we perform a benchmarking study of a set of modular CMA-ES configurations, as well as an analysis using exploratory landscape analysis. Our benchmark set is open-source and integrated with the IOHprofiler benchmarking framework to encourage its use in future research.