GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference

Biological evolution as a form of learning

Living organisms function according to protein circuits. Darwin's theory of evolution suggests that these circuits have evolved through variation guided by natural selection. However, it is currently unknown what variation mechanisms can give rise to protein circuits of the complexity found in biology, within realistic population sizes and realistic numbers of generations.

We suggest that computational learning theory offers the framework for investigating this question of how circuits can come into being via a Darwinian process without a designer. We formulate evolution as a form of learning from examples. The targets of the learning process are the protein expression functions that come closest to the best behavior in the specific environment. The learning process is constrained so that the feedback from experience is Darwinian. We formulate a notion of evolvability that distinguishes function classes that are evolvable with polynomially bounded resources from those that are not. The dilemma is that if the function class that describes the expression levels of proteins in terms of each other is too restrictive, then it will not support biology, while if it is too expressive, then no evolution algorithm will exist to navigate it. We shall review current work in this area.

Evolutionary computation's niche for solving multi-criterion optimization problems

Evolutionary computation (EC) algorithms involve a careful, collaborative, and iterative update of a population of solutions to reach near a desired target. In a single-objective search and optimization problem, the respective optimal solution is often the single target. In a multi-criterion optimization problem, the target is a set of Pareto-optimal solutions. Although the EC field started with solving single-objective problems, EC researchers soon realized that these algorithms were ideal for finding a well-diversified set of multiple Pareto-optimal solutions simultaneously for multi-criterion optimization problems, thereby carving out a clear niche for EC algorithms compared to their point-based classical counterparts. In this keynote talk, we provide a brief chronology of events in the evolutionary multi-criterion optimization (EMO) field over almost three decades, the key challenges it has faced, and the key events and publications that have pushed the field forward. We shall also provide a brief account of current activities and a few pertinent future areas of research and applications.

Removing randomness from evolutionary algorithms

It is natural to think of Evolutionary Algorithms as highly stochastic search methods. This can also make Evolutionary Algorithms, and particularly recombination, quite difficult to analyze. One way to reduce randomness involves the quadratization of functions, which is commonly used by modern optimization methods and also has applications in quantum computing. After a function is made quadratic, random mutation is obsolete and unnecessary; the location of improving moves can be calculated deterministically, on average in O(1) time. Seemingly impossible problems, such as the Needle-in-a-Haystack, become trivial to solve in quadratic form. One can also provably tunnel, or jump, between local optima and quasi-local optima in O(n) time using deterministic genetic recombination.

The talk also explores how removing randomness from Evolutionary Algorithms might provide new insights into natural evolution. Finally, a form of evolutionary algorithm is proposed where premature convergence is impossible and the evolutionary potential of the population remains open-ended.
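
To make the quadratization idea from the first paragraph concrete, here is a minimal sketch (an illustration, not the talk's method): for a quadratic pseudo-Boolean function, the gain of every one-bit flip has a closed form, so improving moves can be located deterministically and hill climbing needs no random mutation. The random maximisation instance and the stopping tolerance are assumptions made purely for the example.

```python
import numpy as np

def flip_gains(x, a, B):
    """Gain of flipping each bit of x for f(x) = a.x + x' B x, with B upper-triangular, zero diagonal."""
    S = B + B.T                        # symmetrise the pairwise coefficients
    return (1 - 2 * x) * (a + S @ x)   # delta_k = (1 - 2 x_k) * (a_k + sum_j b_kj x_j)

rng = np.random.default_rng(0)
n = 20
a = rng.normal(size=n)
B = np.triu(rng.normal(size=(n, n)), k=1)
x = rng.integers(0, 2, size=n)

# Deterministic hill climbing: always take the best improving flip; no random mutation needed.
while True:
    g = flip_gains(x, a, B)
    k = int(np.argmax(g))
    if g[k] <= 1e-12:
        break
    x[k] ^= 1
```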

SESSION: Ant colony optimization and swarm intelligence

A weighted population update rule for PACO applied to the single machine total weighted tardiness problem

In this paper a new population update rule for population-based ant colony optimization (PACO) is proposed. PACO is a well-known alternative to the standard ant colony optimization algorithm. The new update rule allows different parts of the solutions to be weighted. PACO with the new update rule is evaluated on the example of the single machine total weighted tardiness problem (SMTWTP). This is an NP-hard optimization problem where the aim is to schedule jobs on a single machine such that their total weighted tardiness is minimized. PACO with the new population update rule is evaluated on several benchmark instances from the OR-Library. Moreover, the impact of the weights of the jobs on the solutions in the population and on the convergence of the algorithm is analyzed experimentally. The results show that PACO with the new update rule achieves on average better solution quality than PACO with the standard update rule.
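
As a rough illustration of the kind of bookkeeping involved (not the paper's specific rule), the sketch below derives a (job, position) pheromone matrix from a FIFO population of permutations, with a per-job weight controlling how strongly each part of a solution contributes; all names and values are assumptions made for the example.

```python
import numpy as np
from collections import deque

def pheromone_from_population(population, n_jobs, tau_init=1.0, delta=1.0, job_weights=None):
    """Derive a (job, position) pheromone matrix from the solutions currently stored in the
    population; with job_weights, different parts of a solution contribute unequally."""
    if job_weights is None:
        job_weights = np.ones(n_jobs)
    tau = np.full((n_jobs, n_jobs), tau_init)
    for perm in population:
        for pos, job in enumerate(perm):
            tau[job, pos] += delta * job_weights[job]
    return tau

# The population keeps only the most recent accepted solutions, as in population-based ACO.
population = deque(maxlen=5)
population.append([2, 0, 1, 3])
population.append([0, 2, 3, 1])
tau = pheromone_from_population(population, n_jobs=4, job_weights=np.array([3.0, 1.0, 2.0, 1.0]))
```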

ACO with automatic parameter selection for a scheduling problem with a group cumulative constraint

We consider an RCPSP (resource-constrained project scheduling problem), the goal of which is to schedule jobs on machines in order to minimise job tardiness. This problem comes from a real industrial application, and it requires an additional constraint which is a generalisation of the classical cumulative constraint: jobs are partitioned into groups, and the number of active groups must never exceed a given capacity (where a group is active when some of its jobs have started while some others are not yet completed). We first study the complexity of this new constraint. Then, we describe an Ant Colony Optimisation algorithm to solve our problem, and we compare three different pheromone structures for it. We study the influence of parameters on the solving process, and show that it varies from one instance to another. Hence, we identify a subset of parameter settings with complementary strengths and weaknesses, and we use a per-instance algorithm selector in order to select the best setting for each new instance to solve. We experimentally compare our approach with a tabu search approach and an exact approach on a data set coming from our industrial application.

A multiobjective optimization approach for market timing

The introduction of electronic exchanges was a crucial point in history as it heralded the arrival of algorithmic trading. Designers of such systems face a number of issues, one of which is deciding when to buy or sell a given security on a financial market. Although Genetic Algorithms (GA) have been the most widely used to tackle this issue, Particle Swarm Optimization (PSO) has seen much lower adoption within the domain. In two previous works, the authors adapted PSO algorithms to tackle market timing and to address the shortcomings of previous approaches based on both GA and PSO. The majority of work done to date on market timing has tackled it as a single-objective optimization problem, which limits its suitability to live trading, as designers of such strategies will realistically pursue multiple objectives such as maximizing profits, minimizing exposure to risk, and using the shortest strategies to improve execution speed. In this paper, we adapt both a GA and a PSO to tackle market timing as a multiobjective optimization problem and provide an in-depth discussion of our results and avenues of future research.

An approach to assess swarm intelligence algorithms based on complex networks

The growing number of novel swarm-based meta-heuristics has been raising debates regarding their novelty. These algorithms often claim to be inspired by different concepts from nature, but their proponents seldom demonstrate whether the novelty goes beyond the inspiration from nature. In this work, we employed the concept of interaction networks to capture the interaction patterns that take place in algorithms during the optimisation process. The analyses of these networks reveal aspects of the algorithm such as the tendency to achieve premature convergence, population diversity, and stability. Furthermore, we make use of portrait divergence, a recently proposed state-of-the-art metric, to assess structural similarities between our interaction networks. Using this approach to analyse the cat swarm optimization (CSO) algorithm, we were able to identify some of the algorithm's characteristics, assess the impact of one of CSO's parameters, and compare this algorithm to two other well-known methods (particle swarm optimization and artificial bee colony). Lastly, we discuss the relationship between the interaction networks and the performance of the algorithms assessed.

Exploratory path planning for mobile robots in dynamic environments with ant colony optimization

In the path planning task for autonomous mobile robots, robots should be able to plan a trajectory that leaves the start position and reaches the goal safely. There are several path planning approaches for mobile robots in the literature. Ant Colony Optimization algorithms have been investigated for this problem, giving promising results. In this paper, we propose the Max-Min Ant System for Dynamic Path Planning algorithm for the exploratory path planning task for autonomous mobile robots based on topological maps. A topological map is an environment representation whose focus is the main reference points of the environment and their connections. Based on this representation, the path can be composed of a sequence of state/action pairs, which facilitates navigating the path without requiring the complete map. The proposed algorithm was evaluated in static and dynamic environments, showing promising results in both. Experiments in dynamic environments show the adaptability of our proposal.

SESSION: Complex systems (artificial life/artificial immune systems/generative and developmental systems/evolutionary robotics/evolvable hardware)

Learning behaviour-performance maps with meta-evolution

The MAP-Elites quality-diversity algorithm has been successful in robotics because it can create a behaviourally diverse set of solutions that can later be used for adaptation, for instance to unanticipated damage. In MAP-Elites, the choice of the behaviour space is essential for adaptation (the recovery of performance in unseen environments), since it defines the diversity of the solutions. Current practice is to hand-code a set of behavioural features; however, given the large space of possible behaviour-performance maps, the designer does not know a priori which behavioural features maximise a map's adaptation potential. We introduce a new meta-evolution algorithm that discovers the behavioural features that maximise future adaptations. The proposed method applies the Covariance Matrix Adaptation Evolution Strategy to evolve a population of behaviour-performance maps so as to maximise a meta-fitness function that rewards adaptation. The method stores solutions found by MAP-Elites in a database, which allows new behaviour-performance maps to be constructed rapidly on the fly. To evaluate this system, we study the gait of the RHex robot as it adapts to a range of damages sustained on its legs. When compared to MAP-Elites with user-defined behaviour spaces, we demonstrate that the meta-evolution system learns high-performing gaits with and without damage injected into the robot.
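
For readers unfamiliar with the base algorithm referenced throughout this session, below is a minimal, generic MAP-Elites loop (a sketch only; it is not the meta-evolution method of this paper). The toy evaluation function, the 2-D behaviour grid, and all parameter values are illustrative assumptions.

```python
import numpy as np

def map_elites(evaluate, n_dim, n_bins=20, iterations=20000, sigma=0.1, seed=0):
    """Minimal MAP-Elites sketch: keep the best solution (elite) found in each cell of a
    discretised 2-D behaviour space; evaluate(x) returns (fitness, behaviour in [0,1]^2)."""
    rng = np.random.default_rng(seed)
    archive = {}                                   # cell index -> (fitness, solution)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:         # mutate a randomly chosen elite
            keys = list(archive)
            parent = archive[keys[rng.integers(len(keys))]][1]
            x = parent + sigma * rng.normal(size=n_dim)
        else:                                      # otherwise sample a new random solution
            x = rng.uniform(-1.0, 1.0, n_dim)
        fitness, behaviour = evaluate(x)
        cell = tuple(np.clip((np.asarray(behaviour) * n_bins).astype(int), 0, n_bins - 1))
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, x)           # new elite for this behaviour niche
    return archive

# Toy task: fitness is the negative norm, behaviour is the first two coordinates rescaled to [0, 1].
archive = map_elites(lambda x: (-np.linalg.norm(x), (np.clip(x[:2], -1, 1) + 1) / 2), n_dim=6)
```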

Diversity preservation in minimal criterion coevolution through resource limitation

Minimal Criterion Coevolution (MCC) is a recently-introduced algorithm that demonstrates how interactions between two populations, each subject to a simple reproductive constraint, can produce an open-ended search process. Unlike conventional quality diversity (QD) algorithms, which also promote divergence, MCC does not require an explicit characterization of behavior or a comparison of performance, thereby addressing bottlenecks introduced by an intrinsically-finite behavior descriptor and by an assessment of comparative quality. Genetic speciation, a common method of diversity preservation, maintains population diversity in MCC; however, it requires an unnatural explicit comparison of genetic similarity. In nature, organisms are implicitly segregated into niches that each have a carrying capacity dictated by the amount of available resources. To show that MCC can be simpler and more natural while still working effectively, this paper introduces a method of diversity preservation through resource limitation, thereby alleviating the need to formalize and compare genetic distance. Experimental results in a maze navigation domain demonstrate that resource limitation not only maintains higher population diversity in both the maze and agent populations, but also accelerates evolution by forcing individuals to explore new niches, thereby suggesting that resource limitation is an effective, simpler, and more natural alternative for diversity preservation in MCC.

Scaling MAP-Elites to deep neuroevolution

Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have proven very useful for a broad range of applications including enabling real robots to recover quickly from joint damage, solving strongly deceptive maze tasks or evolving robot morphologies to discover new gaits. However, present implementations of ME and other QD algorithms seem to be limited to low-dimensional controllers with far fewer parameters than modern deep neural network models. In this paper, we propose to leverage the efficiency of Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers parameterized by large neural networks. We design and evaluate a new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage recovery in a difficult high-dimensional control task where traditional ME fails. Additionally, we show that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms in high-dimensional control tasks with strongly deceptive rewards.

Evolving ab initio trading strategies in heterogeneous environments

Securities markets are quintessential complex adaptive systems in which heterogeneous agents compete in an attempt to maximize returns. Species of trading agents are also subject to evolutionary pressure as entire classes of strategies become obsolete and new classes emerge. Using an agent-based model of interacting heterogeneous agents as a flexible environment that can endogenously model many diverse market conditions, we subject deep neural networks to evolutionary pressure to create dominant trading agents. After analyzing the performance of these agents and noting the emergence of anomalous superdiffusion through the evolutionary process, we construct a method to turn high-fitness agents into trading algorithms. We backtest these trading algorithms on real high-frequency foreign exchange data, demonstrating that elite trading algorithms are consistently profitable in a variety of market conditions---even though these algorithms had never before been exposed to real financial data. These results provide evidence to suggest that developing ab initio trading strategies by repeated simulation and evolution in a mechanistic market model may be a practical alternative to explicitly training models with past observed market data.

Novelty search makes evolvability inevitable

Evolvability is an important feature that impacts the ability of evolutionary processes to find interesting novel solutions and to deal with changing conditions of the problem to solve. The estimation of evolvability is not straightforward and is generally too expensive to be used directly as a selection pressure in the evolutionary process. Indirectly promoting evolvability as a side effect of other selection pressures that are easier and faster to compute would thus be advantageous. In an unbounded behavior space, it has already been shown that evolvable individuals naturally appear and tend to be selected, as they are more likely to invade empty behavior niches. Evolvability is thus a natural byproduct of the search in this context. However, practical agents and environments often impose limits on the reachable behavior space. How do these boundaries impact evolvability? In this context, can evolvability still be promoted without explicitly rewarding it? We show that Novelty Search implicitly creates a pressure for high evolvability even in bounded behavior spaces, and we explore the reasons for this behavior. More precisely, we show that, throughout the search, the dynamic evaluation of novelty rewards individuals that are very mobile in the behavior space, which in turn promotes evolvability.
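
As a reminder of the mechanism being analysed, here is a small sketch of the standard novelty score: the mean distance of an individual's behaviour descriptor to its k nearest neighbours in the current population and archive. The descriptor dimensionality, the value of k, and the toy data are assumptions made for the example.

```python
import numpy as np

def novelty_scores(behaviours, archive, k=15):
    """Novelty of each individual: mean distance to its k nearest neighbours among the
    current population and the novelty archive (the usual Novelty Search definition)."""
    behaviours = np.asarray(behaviours)
    pool = np.vstack([behaviours, archive]) if len(archive) else behaviours
    scores = []
    for b in behaviours:
        d = np.sort(np.linalg.norm(pool - b, axis=1))
        scores.append(d[1:k + 1].mean())      # d[0] is the individual's distance to itself
    return np.array(scores)

rng = np.random.default_rng(1)
population_bd = rng.uniform(size=(30, 2))     # behaviour descriptors of the current population
print(novelty_scores(population_bd, archive=np.empty((0, 2)), k=5))
```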

Covariance matrix adaptation for the rapid illumination of behavior space

We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains.

Discovering representations for black-box optimization

The encoding of solutions in black-box optimization is a delicate, handcrafted balance between expressiveness and domain knowledge: between exploring a wide variety of solutions, and ensuring that those solutions are useful. Our main insight is that this process can be automated by generating a dataset of high-performing solutions with a quality diversity algorithm (here, MAP-Elites), then learning a representation with a generative model (here, a Variational Autoencoder) from that dataset. Our second insight is that this representation can be used to scale quality diversity optimization to higher dimensions, but only if we carefully mix solutions generated with the learned representation and those generated with traditional variation operators. We demonstrate these capabilities by learning a low-dimensional encoding for the inverse kinematics of a thousand-joint planar arm. The results show that learned representations make it possible to solve high-dimensional problems with orders of magnitude fewer evaluations than standard MAP-Elites, and that, once solved, the produced encoding can be used for rapid optimization of novel, but similar, tasks. The presented techniques not only scale up quality diversity algorithms to high dimensions, but show that black-box optimization encodings can be automatically learned rather than hand-designed.

Evolution of distributed neural controllers for voxel-based soft robots

Voxel-based soft robots (VSRs) are aggregations of elastic, cubic blocks that have sparked the interest of Robotics and Artificial Life researchers. VSRs can move by varying the volume of individual blocks, according to control signals dictated by a controller, possibly based on inputs coming from sensors embedded in the blocks. Neural networks (NNs) have been used as centralized processing units for those sensing controllers, with weights optimized using evolutionary computation. This centralized structure breaks the intrinsic modularity of VSRs: decomposing a VSR into modules to be assembled in a different way is very hard.

In this work we propose an alternative approach that enables full modularity and is based on a distributed neural controller. Each block contains a small NN that outputs signals to adjacent blocks and controls the local volume, based on signals from adjacent blocks and on local sensor readings. We show experimentally for the locomotion task that our controller is as effective as the centralized one. Our experiments also suggest that the proposed framework indeed allows exploiting modularity: VSRs composed of pre-trained parts (body and controller) can be evolved more efficiently than starting from scratch.

Quality diversity for multi-task optimization

Quality Diversity (QD) algorithms are a recent family of optimization algorithms that search for a large set of diverse but high-performing solutions. In some specific situations, they can solve multiple tasks at once. For instance, they can find the joint positions required for a robotic arm to reach a set of points, which can also be solved by running a classic optimizer for each target point. However, they cannot solve multiple tasks when the fitness needs to be evaluated independently for each task (e.g., optimizing policies to grasp many different objects). In this paper, we propose an extension of the MAP-Elites algorithm, called Multi-task MAP-Elites, that solves multiple tasks when the fitness function depends on the task. We evaluate it on a simulated parametrized planar arm (10-dimensional search space; 5000 tasks) and on a simulated 6-legged robot with legs of different lengths (36-dimensional search space; 2000 tasks). The results show that in both cases our algorithm outperforms the optimization of each task separately with the CMA-ES algorithm.

Towards crossing the reality gap with evolved plastic neurocontrollers

A critical issue in evolutionary robotics is the transfer of controllers learned in simulation to reality. This is especially the case for small Unmanned Aerial Vehicles (UAVs), as the platforms are highly dynamic and susceptible to breakage. Previous approaches often require simulation models with a high level of accuracy, otherwise significant errors may arise when the well-designed controller is deployed on the target platform. Here we try to overcome the transfer problem from a different perspective, by designing a spiking neurocontroller which uses synaptic plasticity to cross the reality gap via online adaptation. Through a set of experiments we show that the evolved plastic spiking controller can maintain its functionality by self-adapting to model changes that take place after evolutionary training, and consequently exhibit better performance than its non-plastic counterpart.

SESSION: Digital entertainment technologies and arts

CPPN2GAN: combining compositional pattern producing networks and GANs for large-scale pattern generation

Generative Adversarial Networks (GANs) are proving to be a powerful indirect genotype-to-phenotype mapping for evolutionary search, but they have limitations. In particular, GAN output does not scale to arbitrary dimensions, and there is no obvious way of combining multiple GAN outputs into a cohesive whole, which would be useful in many areas, such as the generation of video game levels. Game levels often consist of several segments, sometimes repeated directly or with variation, organized into an engaging pattern. Such patterns can be produced with Compositional Pattern Producing Networks (CPPNs). Specifically, a CPPN can define latent vector GAN inputs as a function of geometry, which provides a way to organize level segments output by a GAN into a complete level. This new CPPN2GAN approach is validated in both Super Mario Bros. and The Legend of Zelda. Specifically, divergent search via MAP-Elites demonstrates that CPPN2GAN can better cover the space of possible levels. The layouts of the resulting levels are also more cohesive and aesthetically consistent.
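
A toy sketch of the core CPPN-to-latent mapping described above: a small network turns each level segment's normalised position into a latent vector, which a trained GAN generator (not shown here) would decode into that segment. The network shape, activations, and random weights are illustrative assumptions, not the paper's trained models.

```python
import numpy as np

def cppn_latents(n_segments, weights):
    """Toy CPPN-like mapping: each segment's normalised x-position is turned into a GAN
    latent vector; a trained generator (not shown) would decode each vector into a segment."""
    xs = np.linspace(-1.0, 1.0, n_segments).reshape(-1, 1)
    W1, W2 = weights
    hidden = np.sin(xs @ W1)          # periodic activation -> repeated-with-variation patterns
    return np.tanh(hidden @ W2)       # one latent vector per segment, each entry in [-1, 1]

rng = np.random.default_rng(0)
weights = (rng.normal(size=(1, 16)), rng.normal(size=(16, 32)))   # stand-ins for evolved CPPN weights
Z = cppn_latents(n_segments=10, weights=weights)                  # shape (10, 32): one latent per segment
```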

Interactive evolution and exploration within latent level-design space of generative adversarial networks

Generative Adversarial Networks (GANs) are an emerging form of indirect encoding. The GAN is trained to induce a latent space on training data, and a real-valued evolutionary algorithm can search that latent space. Such Latent Variable Evolution (LVE) has recently been applied to game levels. However, it is hard for objective scores to capture level features that are appealing to players. Therefore, this paper introduces a tool for interactive LVE of tile-based levels for games. The tool also allows for direct exploration of the latent dimensions, and allows users to play discovered levels. The tool works for a variety of GAN models trained for both Super Mario Bros. and The Legend of Zelda, and is easily generalizable to other games. A user study shows that both the evolution and latent space exploration features are appreciated, with a slight preference for direct exploration, but combining these features allows users to discover even better levels. User feedback also indicates how this system could eventually grow into a commercial design tool, with the addition of a few enhancements.

SESSION: Evolutionary combinatorial optimization and metaheuristics

A deep learning approach to predicting solutions in streaming optimisation domains

In the field of combinatorial optimisation, per-instance algorithm selection remains a challenging problem, particularly with respect to streaming problems such as packing or scheduling. Typical approaches involve training a model to predict the best algorithm based on features extracted from the data, which is well known to be a difficult task and even more challenging with streaming data. We propose a radical approach that bypasses algorithm selection altogether by training a deep learning model, using solutions obtained from a set of heuristic algorithms, to directly predict a solution from the instance data. To validate the concept, we conduct experiments using a packing problem in which items arrive in batches. Experiments conducted on six large datasets using batches of varying size show that the model is able to accurately predict solutions, particularly with small batch sizes, and surprisingly, in a small number of cases, produces better solutions than any of the algorithms used to train the model.

Dynamic bi-objective routing of multiple vehicles

In practice, e.g. in delivery and service scenarios, Vehicle Routing Problems (VRPs) often imply repeated decision making on dynamic customer requests. As in classical VRPs, tours have to be kept short while, at the same time, the number of serviced customers has to be maximized, resulting in a multi-objective problem. Beyond that, however, dynamic requests lead to the need for re-planning of not yet realized tour parts, while already realized tour parts are irreversible. In this paper we study this type of bi-objective dynamic VRP, including sequential decision making and concurrent realization of decisions. We adopt a recently proposed Dynamic Evolutionary Multi-Objective Algorithm (DEMOA) for a related VRP and extend it to the more realistic scenario, considered here, of multiple vehicles. We empirically show that our DEMOA is competitive with an offline, clairvoyant multi-vehicle variant of the proposed DEMOA as well as with the dynamic single-vehicle approach proposed earlier.

A robust experimental evaluation of automated multi-label classification methods

Automated Machine Learning (AutoML) has emerged to deal with the selection and configuration of algorithms for a given learning task. With the progression of AutoML, several effective methods were introduced, especially for traditional classification and regression problems. Despite AutoML's success, several issues remain open. One issue, in particular, is the limited ability of AutoML methods to deal with different types of data. Based on this scenario, this paper approaches AutoML for multi-label classification (MLC) problems. In MLC, each example can be simultaneously associated with several class labels, unlike the standard classification task, where an example is associated with just one class label. In this work, we provide a general comparison of five automated multi-label classification methods - two evolutionary methods, one Bayesian optimization method, one random search and one greedy search - on 14 datasets and three designed search spaces. Overall, we observe that the most prominent method is the one based on a canonical grammar-based genetic programming (GGP) search method, namely Auto-MEKAGGP. Auto-MEKAGGP presented the best average results in our comparison and was statistically better than all the other methods across the different search spaces and evaluation measures, except when compared to the greedy search method.

Do sophisticated evolutionary algorithms perform better than simple ones?

Evolutionary algorithms (EAs) come in all shapes and sizes. Theoretical investigations focus on simple, bare-bones EAs while applications often use more sophisticated EAs that perform well on the problem at hand. What is often unclear is whether a large degree of algorithm sophistication is necessary, and if so, how much performance is gained by adding complexity to an EA. We address this question by comparing the performance of a wide range of theory-driven EAs, from bare-bones algorithms like the (1+1) EA, a (2+1) GA and simple population-based algorithms to more sophisticated ones like the (1+(λ,λ)) GA and algorithms using fast (heavy-tailed) mutation operators, against sophisticated and highly effective EAs from specific applications. This includes a famous and highly cited Genetic Algorithm for the Multidimensional Knapsack Problem and the Parameterless Population Pyramid for Ising Spin Glasses and MaxSat. While for the Multidimensional Knapsack Problem the sophisticated algorithm performs best, surprisingly, for large Ising and MaxSat instances the simplest algorithm performs best. We also derive conclusions about the usefulness of populations, crossover and fast mutation operators. Empirical results are supported by statistical tests and contrasted against theoretical work in an attempt to link theoretical and empirical results on EAs.
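
For concreteness, here is a minimal sketch of the simplest end of the algorithm spectrum discussed above: a (1+1) EA with standard bit mutation, shown on OneMax. The benchmark and parameter choices are assumptions made only for illustration.

```python
import numpy as np

def one_plus_one_ea(fitness, n, max_evals=100_000, seed=0):
    """Bare-bones (1+1) EA: flip each bit independently with probability 1/n and keep
    the offspring whenever it is at least as good as the parent."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n)
    fx = fitness(x)
    for _ in range(max_evals):
        y = np.where(rng.random(n) < 1.0 / n, 1 - x, x)
        fy = fitness(y)
        if fy >= fx:
            x, fx = y, fy
    return x, fx

best, value = one_plus_one_ea(np.sum, n=100)   # OneMax: fitness is simply the number of ones
print(value)
```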

Solving constrained combinatorial reverse auctions using MOEAs: a comparative study

Traditional Combinatorial Reverse Auctions (CRAs), involving multiple items and single or multiple attributes, have been effectively adopted in several real-world applications. However, finding the best solution (a set of winning sellers) is difficult due to the complexity of CRAs. The use of exact algorithms is quite unsuitable in some real-life auctions due to their exponential time cost. Hence, we have opted for multi-objective evolutionary algorithms (MOEAs) to find the best compromise solutions. This paper presents a comparative study of different MOEAs for solving the Winner Determination Problem (WDP) in Multi-Attribute Combinatorial Reverse Auctions (MACRA) with several items and several attributes, establishing a model inspired by a WDP found in the literature. Six real problems of purchasing electronic products of different complexities are considered. The algorithms were assessed according to MOEA evaluation metrics and according to the best auction solution.

Journey to the center of the linear ordering problem

A number of local-search-based algorithms, such as iterated local search or variable neighborhood search, have been designed to escape from local optima. The neighborhood chosen for the local search, as well as the escape technique, plays a key role in the performance of these algorithms. Of course, a specific strategy has a different effect on distinct problems or instances. In this paper, we focus on a permutation-based combinatorial optimization problem: the linear ordering problem. We provide a theoretical landscape analysis for the adjacent swap, the swap and the insert neighborhoods. By making connections to other problems studied in combinatorics, we prove that some moves from local optima will necessarily return a worse or equal solution. The number of these non-better solutions that could be avoided by the escape techniques is considerably large with respect to the number of neighbors. This is valuable information that can be incorporated into any of the algorithms designed to escape from local optima, increasing their efficiency.
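
To make the studied neighbourhoods concrete, here is a small sketch of the linear ordering objective and a naive scan of the insert neighbourhood (recomputing the objective from scratch rather than using the incremental evaluation one would employ in practice); the random instance is purely illustrative.

```python
import numpy as np

def lop_value(C, perm):
    """Linear ordering objective: sum of C[perm[i], perm[j]] over all pairs i < j."""
    n = len(perm)
    return sum(C[perm[i], perm[j]] for i in range(n) for j in range(i + 1, n))

def best_insert_neighbour(C, perm):
    """Enumerate the insert neighbourhood (remove one element, reinsert it elsewhere)
    and return the best neighbour together with its objective value."""
    best_perm, best_val = list(perm), lop_value(C, perm)
    n = len(perm)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            cand = list(perm)
            cand.insert(j, cand.pop(i))
            val = lop_value(C, cand)
            if val > best_val:
                best_perm, best_val = cand, val
    return best_perm, best_val

rng = np.random.default_rng(0)
C = rng.integers(0, 10, size=(6, 6))
print(best_insert_neighbour(C, list(range(6))))
```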

Solving the single row facility layout problem by differential evolution

Differential evolution is an efficient evolutionary optimization paradigm that has shown a good ability to solve a variety of practical problems, including combinatorial ones. The single row facility layout problem is an NP-hard permutation problem often found in facility design, factory construction, production optimization, and other areas. Real-world problems can be cast as large single row facility layout problem instances with different high-level properties, and efficient algorithms that can solve them are needed. In this work, differential evolution is used to solve the single row facility layout problem, and the ability of three different variants of the algorithm to evolve solutions to various problem instances is studied.

Multi-layer local optima networks for the analysis of advanced local search-based algorithms

A Local Optima Network (LON) is a graph model that compresses the fitness landscape of a particular combinatorial optimization problem based on a specific neighborhood operator and a local search algorithm. Determining which landscape features affect the effectiveness of search algorithms, and how, is relevant both for predicting their performance and for improving the design process. This paper proposes the concept of multi-layer LONs as well as a methodology to explore these models with the aim of extracting metrics for fitness landscape analysis. Constructing such models and extracting and analyzing their metrics are preliminary steps towards extending the study of single-neighborhood-operator heuristics to more sophisticated ones that use multiple operators. Therefore, in the present paper we investigate a two-layer LON obtained from instances of a combinatorial problem using bit-flip and swap operators. First, we enumerate instances of the NK-landscape model and use the hill climbing heuristic to build the corresponding LONs. Then, using LON metrics, we analyze how efficient the search might be when combining both strategies. The experiments show promising results and demonstrate the ability of multi-layer LONs to provide useful information that could be used in metaheuristics based on multiple operators, such as Variable Neighborhood Search.

Just-in-time batch scheduling subject to batch size

This paper considers single-machine just-in-time scheduling of jobs that may be grouped into batches subject to a constraint on the batches' weights. A job has a weight, a due date, and earliness and tardiness penalties per unit time. A batch's processing time is determined by its jobs. Each job inherits its batch's completion time. The objective is to minimize the weighted sum of earliness and tardiness penalties of all jobs. This problem is challenging: the assignment of jobs to batches changes a batch's processing time and thus affects the structure of the entire solution and, most importantly, of its cost components. This problem is an excellent benchmark for testing linkage learning techniques, which exploit a problem's structure. We propose a matheuristic, LTGA, which integrates evolutionary algorithms with mixed-integer programming (MIP). It builds a linkage tree that extracts the dependencies among decision variables of MIP solutions. We compare its performance to a state-of-the-art matheuristic, CMSA, which combines the learning component of an ant colony system (ACS) with MIP within a construct, solve, merge, and adapt framework. It uses knowledge extracted from the ACS ants to construct a restricted MIP, solves it efficiently, and feeds it back to ACS. Computational tests indicate that LTGA outperforms MIP solvers and CMSA.

Advanced statistical analysis of empirical performance scaling

Theoretical running time complexity analysis is a widely adopted method for studying the scaling behaviour of algorithms. However, theoretical analysis remains intractable for many high-performance, heuristic algorithms. Recent advances in statistical methods for empirical running time scaling analysis have shown that many state-of-the-art algorithms can achieve significantly better scaling in practice than expected. However, current techniques have only been successfully applied to study algorithms on randomly generated instance sets, since they require instances that can be grouped into "bins", where each instance in a bin has the same size. In practice, real-world instance sets with this property are rarely available. We introduce a novel method that overcomes this limitation. We apply our method to a broad range of scenarios and demonstrate its effectiveness by revealing new insights into the scaling of several prominent algorithms; e.g., the SAT solver lingeling often appears to achieve sub-polynomial scaling on prominent bounded model checking instances, and the training times of scikit-learn's implementation of SVMs scale as a lower-degree polynomial than expected (≈ 1.51 instead of 2).

Golden parameter search: exploiting structure to quickly configure parameters in parallel

Automated algorithm configuration procedures such as SMAC, GGA++ and irace can often find parameter configurations that substantially improve the performance of state-of-the-art algorithms for difficult problems - e.g., a three-fold speedup in the running time required by EAX, a genetic algorithm, to find optimal solutions to a set of widely studied TSP instances. However, it is usually recommended to provide these methods with running time budgets of one or two days of wall-clock time as well as dozens of CPU cores. Most general-purpose algorithm configuration methods are based on powerful meta-heuristics that are designed for challenging and complex search landscapes; however, recent work has shown that many algorithms appear to have parameter configuration landscapes with a relatively simple structure. We introduce the golden parameter search (GPS) algorithm, an automatic configuration procedure designed to exploit this structure while optimizing each parameter semi-independently in parallel. We compare GPS to several state-of-the-art algorithm configurators and show that it often finds similar or better parameter configurations using a fraction of the computing time budget across a broad range of scenarios spanning TSP, SAT and MIP.
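
GPS itself optimises each parameter semi-independently and in parallel; as a rough illustration of the underlying idea of exploiting simple, roughly unimodal parameter responses, here is a plain one-dimensional golden-section search over a single numeric parameter. The quadratic cost function stands in for measured solver performance and is an assumption made for the example.

```python
import math

def golden_section_minimise(cost, lo, hi, tol=1e-3):
    """Golden-section search: minimise a (roughly unimodal) 1-D cost over [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2               # 1/phi, approximately 0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = cost(c), cost(d)
    while b - a > tol:
        if fc < fd:                                # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = cost(c)
        else:                                      # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = cost(d)
    return (a + b) / 2

# e.g. tune one solver parameter by measuring mean running time at each candidate value
best = golden_section_minimise(lambda p: (p - 0.37) ** 2, 0.0, 1.0)
print(best)
```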

Why many travelling salesman problem instances are easier than you think

While there are many inexact heuristics for generating high-quality solutions to the Travelling Salesman Problem, our understanding of why these methods are effective and efficient is still limited. This paper looks at two population-based heuristics: the EAX algorithm and the Mixing GA using partition crossover. We show that the local optima used to construct the initial population also sample edges found in the global optimum at an extremely high rate: in the majority of TSP instances, the proportion of global-optimum edges present in the initial population is more than 73%. Next, we look at how recombination operators increase the representation of edges from the global optimum in the population, or increase the number of global edges in the best solutions in the population. We also look at TSP instances that are more difficult to solve, and again we find that edge frequency information can help to explain algorithm performance. Finally, we use these results to suggest new strategies for generating high-quality solutions for Travelling Salesman Problems.

Automatic decomposition of mixed integer programs for lagrangian relaxation using a multiobjective approach

This paper presents a new method to automatically decompose general Mixed Integer Programs (MIPs). To do so, we represent the constraint matrix for a general MIP problem as a hypergraph and relax constraints by removing hyperedges from the hypergraph. A Breadth First Search algorithm is used to identify the individual partitions that now exist and the resulting decomposed problem. We propose that good decompositions have both a small number of constraints relaxed and small subproblems in terms of the number of variables they contain. We use the multiobjective Nondominated Sorting Genetic Algorithm II (NSGA-II) to create decompositions which minimize both the size of the subproblems and the number of constraints relaxed. We show through our experiments the types of decompositions our approach generates and test empirically the effectiveness of these decompositions in producing bounds when used in a Lagrangian Relaxation framework. The results demonstrate that the bounds generated by our decompositions are significantly better than those attained by solving the Linear Programming relaxation, as well as the bounds found via random and greedy constraint relaxation and decomposition generation.

Specific single- and multi-objective evolutionary algorithms for the chance-constrained knapsack problem

The chance-constrained knapsack problem is a variant of the classical knapsack problem where each item has a weight distribution instead of a deterministic weight. The objective is to maximize the total profit of the selected items under the condition that the weight of the selected items exceeds the given weight bound only with a small probability α. In this paper, we consider problem-specific single-objective and multi-objective approaches for the problem. We examine the use of heavy-tail mutations and introduce a problem-specific crossover operator to deal with the chance-constrained knapsack problem. Empirical results for single-objective evolutionary algorithms show the effectiveness of our operators compared to the use of classical operators. Moreover, we introduce a new effective multi-objective model for the chance-constrained knapsack problem. We use this model in combination with the problem-specific crossover operator in multi-objective evolutionary algorithms to solve the problem. Our experimental results show that this leads to significant performance improvements when using the approach in evolutionary multi-objective algorithms such as GSEMO and NSGA-II.
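
One common way to make such a chance constraint computable, assuming independent item weights and a normal surrogate for the total weight, is sketched below; the paper may rely on different tail bounds, so this is only an illustration, and the item data are made up.

```python
import numpy as np
from scipy.stats import norm

def chance_constraint_ok(x, mu, var, bound, alpha=0.01):
    """Check Pr[total weight > bound] <= alpha for selection x, assuming independent item
    weights, via a normal approximation of the total weight of the selected items."""
    total_mu = float(mu @ x)
    total_sd = float(np.sqrt(var @ x))
    # feasible if total_mu + z_{1-alpha} * total_sd <= bound
    return total_mu + norm.ppf(1 - alpha) * total_sd <= bound

mu = np.array([3.0, 5.0, 2.0, 4.0])     # expected item weights
var = np.array([0.5, 1.0, 0.2, 0.8])    # item weight variances
x = np.array([1, 0, 1, 1])              # selected items
print(chance_constraint_ok(x, mu, var, bound=12.0, alpha=0.05))
```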

SESSION: Evolutionary machine learning

Evolving inborn knowledge for fast adaptation in dynamic POMDP problems

Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. The evolution of online learning mechanisms, on the contrary, has the ability to incorporate learning strategies into an agent that can (i) evolve memory when required and (ii) optimize adaptation speed to specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects, such as the detection of cues that reveal implicit rewards, and the ability to evolve location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.

Evolutionary optimization of deep learning activation functions

The choice of activation function can have a large effect on the performance of a neural network. While there have been some attempts to hand-engineer novel activation functions, the Rectified Linear Unit (ReLU) remains the most commonly used in practice. This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU. A tree-based search space of candidate activation functions is defined and explored with mutation, crossover, and exhaustive search. Experiments on training wide residual networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach is effective. Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. Optimal performance is achieved when evolution is allowed to customize activation functions to a particular task; however, these novel activation functions are also shown to generalize, achieving high performance across tasks. Evolutionary optimization of activation functions is therefore a promising new dimension of metalearning in neural networks.
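
A minimal sketch of what a tree-based activation-function search space can look like: candidates as small expression trees over unary and binary primitives, with a simple point mutation. The primitive set, mutation probability, and the Swish-like example are assumptions, not the paper's exact setup.

```python
import numpy as np
import random

UNARY = {'relu': lambda x: np.maximum(x, 0.0), 'tanh': np.tanh, 'sigmoid': lambda x: 1 / (1 + np.exp(-x))}
BINARY = {'add': np.add, 'mul': np.multiply, 'max': np.maximum}

def evaluate(tree, x):
    """Evaluate an activation-function expression tree at pre-activation values x."""
    op = tree[0]
    if op == 'x':
        return x
    if op in UNARY:
        return UNARY[op](evaluate(tree[1], x))
    return BINARY[op](evaluate(tree[1], x), evaluate(tree[2], x))

def point_mutate(tree, rng=random):
    """Swap one operator for another of the same arity, anywhere in the tree."""
    if tree[0] == 'x':
        return tree
    if rng.random() < 0.3:  # mutate this node
        pool = UNARY if tree[0] in UNARY else BINARY
        return [rng.choice(sorted(pool))] + tree[1:]
    return [tree[0]] + [point_mutate(child, rng) for child in tree[1:]]

swish = ['mul', ['x'], ['sigmoid', ['x']]]   # Swish-like candidate: x * sigmoid(x)
z = np.linspace(-3, 3, 7)
print(evaluate(swish, z))
print(point_mutate(swish))
```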

Exploring the evolution of GANs through quality diversity

Generative adversarial networks (GANs) have achieved significant advances in the field of generative algorithms, presenting high-quality results, mainly in the context of images. However, GANs are hard to train, and several aspects of the model have to be designed by hand in advance to ensure training success. In this context, evolutionary algorithms such as COEGAN were proposed to address the challenges in GAN training. Nevertheless, a lack of diversity and premature optimization can be found in some of these solutions. In this paper we propose the application of a quality-diversity algorithm to the evolution of GANs. The solution is based on the Novelty Search with Local Competition (NSLC) algorithm, adapting the concepts used in COEGAN to this new proposal. We compare our proposal with the original COEGAN model and with an alternative version using a global competition approach. The experimental results show that our proposal increases the diversity of the discovered solutions and improves the performance of the models found by the algorithm. Furthermore, the global competition approach was able to consistently find better models for GANs.

Feature standardisation and coefficient optimisation for effective symbolic regression

Symbolic regression is a common application of genetic programming where model structure and corresponding parameters are evolved in unison. In the majority of work exploring symbolic regression, features are used directly without acknowledgement of their relative scale or unit. This paper extends recent work on the importance of standardisation of features when conducting symbolic regression. Specifically, z-score standardisation is applied to both the input features and the response, ensuring that evolution explores a model space with zero mean and unit variance. This paper demonstrates that standardisation allows a simpler function set to be used without increasing bias. Additionally, it is demonstrated that standardisation can significantly improve the performance of coefficient optimisation through gradient descent to produce accurate models. Through analysis of several benchmark data sets, we demonstrate that feature standardisation enables simple but effective approaches that are comparable in performance to the state of the art in symbolic regression.
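
A small sketch of the standardisation step itself: both features and response are z-scored before fitting, and predictions are mapped back to the original scale. The least-squares model below merely stands in for an evolved symbolic expression, and the toy data are assumptions.

```python
import numpy as np

def zscore_fit_predict(X_train, y_train, X_test, fit):
    """Standardise features and response to zero mean / unit variance before model fitting,
    then map predictions back to the original response scale."""
    mu_x, sd_x = X_train.mean(axis=0), X_train.std(axis=0)
    mu_y, sd_y = y_train.mean(), y_train.std()
    model = fit((X_train - mu_x) / sd_x, (y_train - mu_y) / sd_y)
    return model((X_test - mu_x) / sd_x) * sd_y + mu_y

def least_squares(X, y):
    """Toy stand-in for an evolved model: ordinary least squares with an intercept."""
    w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
    return lambda X_new: np.c_[X_new, np.ones(len(X_new))] @ w

rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(50, 2))                 # features on very different scales
y = 3 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(size=50)
print(zscore_fit_predict(X[:40], y[:40], X[40:], least_squares)[:3])
```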

Improving neuroevolutionary transfer learning of deep recurrent neural networks through network-aware adaptation

Transfer learning entails taking an artificial neural network (ANN) that is trained on a source dataset and adapting it to a new target dataset. While this has been shown to be quite powerful, its use has generally been restricted by architectural constraints. Previously, in order to reuse and adapt an ANN's internal weights and structure, the underlying topology of the ANN being transferred across tasks had to remain mostly the same, while a new output layer was attached, discarding the old output layer's weights. This work introduces network-aware adaptive structure transfer learning (N-ASTL), an advancement over prior efforts to remove this restriction. N-ASTL utilizes statistical information related to the source network's topology and weight distribution to inform how new input and output neurons are to be integrated into the existing structure. Results show improvements over the prior state of the art, including the ability to transfer to challenging real-world datasets where this was not previously possible, and improved generalization over RNNs trained without transfer.

Improving generalisation of AutoML systems with dynamic fitness evaluations

A common problem machine learning developers are faced with is overfitting, that is, fitting a pipeline so closely to the training data that performance degrades on unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but this overfitting problem can persist. In fact, it can become more of a problem as we iteratively optimise the performance of an internal cross-validation (most often k-fold). While this internal cross-validation is intended to reduce overfitting, we show that we can still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations which approximate repeated k-fold cross-validation, at little extra cost over a single k-fold, and far lower cost than typical repeated k-fold. The results show that, when time is equated, the proposed fitness function results in significant improvement over the current state-of-the-art baseline method, which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide essentially a free boost in generalisation/testing performance.
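
A minimal sketch in the spirit of the proposal (not its exact implementation): the k-fold split is re-drawn each generation, seeded by the generation index, so the search cannot lock onto one fixed split. The scikit-learn classifier stands in for an AutoML-generated pipeline and all parameters are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def dynamic_fitness(candidate, X, y, generation, k=5):
    """Fitness = mean k-fold CV score, with the fold assignment re-drawn every generation
    (seeded by the generation index), approximating repeated k-fold over the run."""
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=generation)
    return cross_val_score(candidate, X, y, cv=cv).mean()

X, y = load_iris(return_X_y=True)
candidate = LogisticRegression(max_iter=500)       # stand-in for an AutoML-generated pipeline
print([round(dynamic_fitness(candidate, X, y, generation=g), 3) for g in range(3)])
```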

Improving constrained clustering via decomposition-based multiobjective optimization with memetic elitism

Clustering has always been a topic of interest in knowledge discovery, as it can provide valuable information within the unsupervised machine learning framework. It received renewed attention when it was shown to produce better results in environments where partial information about how to solve the problem is available, leading to a new machine learning paradigm: semi-supervised machine learning. This new type of information can be given in the form of constraints, which guide the clustering process towards quality solutions. In particular, this study considers pairwise instance-level must-link and cannot-link constraints. Given the ill-posed nature of the constrained clustering problem, we approach it from the multiobjective optimization point of view. Our proposal consists of a memetic elitist evolutionary strategy that favors exploitation by applying a local search procedure to the elite of the population and transferring its results only to the external population, which is also used to generate new individuals. We show the capability of this method to produce quality results for the constrained clustering problem when considering incremental levels of constraint-based information. For the comparison with state-of-the-art methods, we include previous multiobjective approaches, single-objective genetic algorithms and classic constrained clustering methods.

Self-adaptation of XCS learning parameters based on learning theory

This paper proposes a self-adaptation technique for the parameter settings used in the XCS learning scheme. Since we adaptively set those parameters to the optimum values derived from recent XCS learning theory, our proposal does not require any trial-and-error process to find proper values. Thus, our proposal can always satisfy the optimality of the XCS learning scheme, i.e. distinguishing accurate rules from inaccurate rules with the minimum number of rule updates. Experimental results on artificial classification problems, including overlapping problems, show that XCS with our self-adaptation technique significantly outperforms the standard XCS.

GeneCAI: genetic evolution for acquiring compact AI

In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving towards more complex architectures to achieve higher inference accuracy. Model compression techniques can be leveraged to efficiently deploy these compute-intensive architectures on resource-limited mobile devices. Such methods comprise various hyperparameters that require per-layer customization to ensure high accuracy. Choosing the hyperparameters is cumbersome as the pertinent search space grows exponentially with the number of model layers. This paper introduces GeneCAI, a novel optimization method that automatically learns how to tune per-layer compression hyperparameters. We devise a bijective translation scheme that encodes compressed DNNs into the genotype space. Each genotype's optimality is measured using a multi-objective score based on the accuracy and the number of floating-point operations. We develop customized genetic operations to iteratively evolve the non-dominated solutions towards the optimal Pareto front, thus capturing the optimal trade-off between model accuracy and complexity. The GeneCAI optimization method is highly scalable and can achieve a near-linear performance boost on distributed multi-GPU platforms. Our extensive evaluations demonstrate that GeneCAI outperforms existing rule-based and reinforcement learning methods in DNN compression by finding models that lie on a better accuracy/complexity Pareto curve.

Program synthesis as latent continuous optimization: evolutionary search in neural embeddings

In optimization and machine learning, the divide between discrete and continuous problems and methods is deep and persistent. We attempt to remove this distinction by training neural network autoencoders that embed discrete candidate solutions in continuous latent spaces. This allows us to take advantage of state-of-the-art continuous optimization methods for solving discrete optimization problems, and mitigates certain challenges in discrete optimization, such as the design of bias-free search operators. In the experimental part, we consider program synthesis as a special case of combinatorial optimization. We train an autoencoder network on a large sample of programs in a problem-agnostic, unsupervised manner, and then use it with an evolutionary continuous optimization algorithm (CMA-ES) to map points from the latent space to programs. We also propose a variant in which semantically similar programs are more likely to have similar embeddings. Assessment on a range of benchmarks in two domains indicates the viability of this approach and the usefulness of involving program semantics.

Absumption and subsumption based learning classifier systems

Learning Classifier Systems (LCSs) are a group of rule-based evolutionary computation techniques, which have been frequently applied to data-mining tasks. Evidence shows that LCSs can produce models containing human-discernible patterns. However, traditional LCSs cannot efficiently discover consistent, general rules, especially in domains that have an unbalanced class distribution. The reason is that traditional search methods, e.g. crossover, mutation, and roulette wheel deletion, rely on stochasticity to find and keep optimum rules. Recently, absumption has been introduced to deterministically remove over-general rules, which is complementary to subsumption, which deterministically removes over-specific rules. It is hypothesized that utilizing just absumption and subsumption transforms the search process from stochastic to deterministic, which benefits LCSs in evolving interpretable models and removes the need to tune search parameters to the problem. Interrogatable artificial Boolean domains with varying numbers of attributes are considered as benchmarks. The new LCS, termed Absumption Subsumption Classifier System (ASCS), successfully produces interpretable models for all the complex domains tested, whereas the non-optimal rules in existing techniques obscure the patterns. ASCS's ability to handle complex search spaces is observed, e.g. for the 14-bit Majority-On problem the required 6435 different cooperating rules were discovered, enabling correct pattern visualization.

Relatedness measures to aid the transfer of building blocks among multiple tasks

Multitask Learning is a learning paradigm that deals with multiple different tasks in parallel and transfers knowledge among them. XOF, a Learning Classifier System using tree-based programs to encode building blocks (meta-features), constructs and collects features with rich discriminative information for classification tasks in an Observed List. This paper seeks to facilitate the automation of feature transfer between tasks by utilising the Observed List. We hypothesise that the best discriminative features of a classification task carry its characteristics. Therefore, the relatedness between any two tasks can be estimated by comparing their most appropriate patterns. We propose a multiple-XOF system, called mXOF, that can dynamically adapt feature transfer among XOFs. This system utilises the Observed List to estimate the task relatedness, which enables the automation of feature transfer. In terms of knowledge discovery, the resemblance estimation provides insightful relations among multiple data sets. We experimented with mXOF on various scenarios, e.g. representative Hierarchical Boolean problems, classification of distinct classes in the UCI Zoo dataset, and unrelated tasks, to validate its ability to automatically transfer knowledge and estimate task relatedness. Results show that mXOF can reasonably estimate the relatedness between multiple tasks to aid the learning performance with dynamic feature transfer.
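A minimal way to picture the relatedness estimate (the actual mXOF comparison of tree-based features is richer than this) is to treat each task's list of best discriminative features as a set and compare the sets, e.g. with Jaccard similarity; the feature names below are purely illustrative.

    # Toy sketch: estimate task relatedness from the overlap of each task's
    # top discriminative features (a stand-in for XOF's Observed List).
    def relatedness(observed_list_a, observed_list_b):
        a, b = set(observed_list_a), set(observed_list_b)
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)  # Jaccard similarity in [0, 1]

    # Hypothetical feature identifiers for two Boolean tasks
    task1_features = {"x0 AND x1", "x2 XOR x3", "NOT x4"}
    task2_features = {"x0 AND x1", "x2 XOR x3", "x5 OR x6"}
    print(relatedness(task1_features, task2_features))  # 0.5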

Neural architecture search for sparse DenseNets with dynamic compression

Neural Architecture Search (NAS) algorithms have discovered highly novel state-of-the-art Convolutional Neural Networks (CNNs) for image classification, and are beginning to improve our understanding of CNN architectures. However, within NAS research, there are limited studies focusing on the role of skip-connections and how the configuration of connections between layers can be optimised to improve CNN performance. Our work focuses on developing a new evolutionary NAS algorithm based on adjacency matrices to optimise skip-connection structures, creating more specialised and powerful skip-connection structures within a DenseNet-BC network than previously seen in the literature. Our work further demonstrates how simple adjacency matrices can be interpreted in a way which allows for a more dynamic variant of DenseNet-BC. The final algorithm, using this novel interpretation of adjacency matrices for architecture design and evolved on the CIFAR100 dataset, finds networks with improved performance relative to a baseline DenseNet-BC network on both the CIFAR10 and CIFAR100 datasets, making it, to our knowledge, the first NAS algorithm for skip-connection optimisation to do so. Finally, skip-connection structures discovered by our algorithm are analysed, and some important skip-connection patterns are highlighted.
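To make the genotype concrete, a binary adjacency matrix over the layers of a dense block can record which earlier layers feed each later layer; the sketch below shows one such encoding together with a bit-flip mutation, purely as an illustration of the representation rather than the paper's exact operators.

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_LAYERS = 6  # layers inside one hypothetical dense block

    def random_genotype():
        # Upper-triangular 0/1 matrix: entry [i, j] == 1 means layer j receives
        # a skip-connection from the earlier layer i.
        adj = rng.integers(0, 2, size=(NUM_LAYERS, NUM_LAYERS))
        return np.triu(adj, k=1)

    def mutate(adj, rate=0.1):
        child = adj.copy()
        for i in range(NUM_LAYERS):
            for j in range(i + 1, NUM_LAYERS):
                if rng.random() < rate:
                    child[i, j] ^= 1  # flip one potential skip-connection
        return child

    genotype = random_genotype()
    print(genotype)
    print(mutate(genotype))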

Lateralized learning for robustness against adversarial attacks in a visual classification system

Deep learning is an important field of machine learning. It is playing a critical role in a variety of applications ranging from self-driving cars to security and surveillance. However, deep networks have deep flaws. For example, they are highly vulnerable to adversarial attacks. One reason may be the homogeneous nature of their knowledge representation, which allows a single disruptive pattern to cause misclassification. Biological intelligence has lateral asymmetry, which allows heterogeneous, modular learning at different levels of abstraction, enabling different representations of the same object.

This work aims to incorporate lateralization and modular learning at different levels of abstraction in an evolutionary machine learning system. The results of image classification tasks show that the lateralized system efficiently learns hierarchical distributions of knowledge, demonstrating performance that is similar to (or better than) other state-of-the-art deep systems as it reasons using multiple representations. Crucially, the novel system outperformed all the state-of-the-art deep models for the classification of normal and adversarial images by 0.43%-2.56% and 2.15%-25.84%, respectively. Lateralization enabled the system to exhibit robustness beyond previous work, which advocates for the creation of data sets that enable components of objects, as well as the objects themselves, to be learned specifically or in an end-to-end manner.

XCS classifier system with experience replay

XCS constitutes the most deeply investigated classifier system today. It offers strong potential and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter domain, recent advances have been mainly driven by algorithms which model their policies based on deep neural networks, with the Deep Q-Network (DQN) being a prominent representative. Experience Replay (ER) constitutes one of the crucial factors for the DQN's successes, since it facilitates stabilized training of the neural network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage remembered raw experiences. To bridge this gap, this paper investigates the benefits of extending XCS with ER. We demonstrate that for single-step tasks ER yields strong improvements in terms of sample efficiency. On the downside, however, we reveal that ER might further aggravate well-studied issues, not yet solved for XCS, when applied to sequential decision problems demanding long action chains.
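The ER mechanism itself is easy to sketch: past interactions are stored in a bounded buffer and replayed in random mini-batches during learning. The snippet below is a generic buffer of the kind popularised by DQN, not the paper's XCS-specific integration.

    import random
    from collections import deque, namedtuple

    Experience = namedtuple("Experience", "state action reward next_state done")

    class ReplayBuffer:
        def __init__(self, capacity=10_000):
            self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first

        def store(self, *transition):
            self.buffer.append(Experience(*transition))

        def sample(self, batch_size=32):
            # Uniform random replay of remembered raw experiences
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    buffer = ReplayBuffer()
    buffer.store([0, 1], 0, 1.0, [1, 1], False)
    batch = buffer.sample(8)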

Neuroevolution of self-interpretable agents

Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has similar properties as indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision-based tasks with at least 1000x fewer parameters than existing methods. Since our agents attend only to task-critical visual hints, they are able to generalize to environments where task-irrelevant elements are modified, while conventional methods fail.

Re-purposing heterogeneous generative ensembles with evolutionary computation

Generative Adversarial Networks (GANs) are popular tools for generative modeling. The dynamics of their adversarial learning give rise to convergence pathologies during training such as mode and discriminator collapse. In machine learning, ensembles of predictors demonstrate better results than a single predictor for many tasks. In this study, we apply two evolutionary algorithms (EAs) to create ensembles that re-purpose generative models, i.e., given a set of heterogeneous generators that were optimized for one objective (e.g., minimize Fréchet Inception Distance), create ensembles of them for optimizing a different objective (e.g., maximize the diversity of the generated samples). The first method is restricted by the exact size of the ensemble and the second method only restricts the upper bound of the ensemble size. Experimental analysis on the MNIST image benchmark demonstrates that both EA-based ensemble creation methods can re-purpose the models without reducing their original functionality. The EA-based methods demonstrate significantly better performance compared to other heuristic-based methods. When comparing the two evolutionary methods, the one that only bounds the ensemble size from above performs best.

Safe crossover of neural networks through neuron alignment

One of the main and largely unexplored challenges in evolving the weights of neural networks using genetic algorithms is to find a sensible crossover operation between parent networks. Indeed, naive crossover leads to functionally damaged offspring that do not retain information from the parents. This is because neural networks are invariant to permutations of neurons, giving rise to multiple ways of representing the same solution. This is often referred to as the competing conventions problem. In this paper, we propose a two-step safe crossover (SC) operator. First, the neurons of the parents are functionally aligned by computing how well they correlate, and only then are the parents recombined. We compare two ways of measuring relationships between neurons: Pairwise Correlation (PwC) and Canonical Correlation Analysis (CCA). We test our safe crossover operators (SC-PwC and SC-CCA) on MNIST and CIFAR-10 by performing arithmetic crossover on the weights of feed-forward neural network pairs. We show that it effectively transmits information from parents to offspring and significantly improves upon naive crossover. Our method is computationally fast, can serve as a way to explore the fitness landscape more efficiently and makes safe crossover a potentially promising operator in future neuroevolution research and applications.
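The two-step idea can be sketched as follows: first find a permutation of one parent's hidden neurons that best matches the other parent's neurons by pairwise correlation of their activations (here via the Hungarian algorithm), then apply arithmetic crossover to the aligned weights. This is a simplified single-layer illustration under assumed shapes, not the authors' full SC-PwC/SC-CCA implementation.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def align_and_crossover(W1, W2, acts1, acts2, alpha=0.5):
        # W1, W2: (inputs x hidden) weight matrices of the two parents.
        # acts1, acts2: (samples x hidden) hidden activations on shared data.
        h = acts1.shape[1]
        # Pairwise correlation between every neuron of parent 1 and of parent 2
        corr = np.corrcoef(acts1.T, acts2.T)[:h, h:]
        # Permutation of parent 2's neurons that best matches parent 1's neurons
        _, col = linear_sum_assignment(-np.abs(corr))
        W2_aligned = W2[:, col]
        # Arithmetic crossover on functionally aligned weights
        return alpha * W1 + (1.0 - alpha) * W2_aligned

    rng = np.random.default_rng(1)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
    X = rng.normal(size=(100, 4))
    child = align_and_crossover(W1, W2, np.tanh(X @ W1), np.tanh(X @ W2))

In a full network the same permutation must also be applied to the next layer's incoming weights so that the aligned parent still computes the same function.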

Segmented initialization and offspring modification in evolutionary algorithms for bi-objective feature selection

In classification, feature selection mainly aims at reducing the dataset dimensionality and increasing the classification accuracy, which also results in higher computational efficiency than using the original full set of features. Population-based meta-heuristics, such as evolutionary algorithms, have been widely used to solve the bi-objective feature selection problem, which minimizes the number of selected features and the error of the classification model. However, most of them are not specifically designed for feature selection, and disregard many of its complex characteristics. In this paper, we propose a generic approach that focuses on improving the initialization effectiveness and offspring quality, in order to boost the performance of existing evolutionary algorithms for bi-objective feature selection. To be more specific, a segmented initialization mechanism is used to enhance the exploration width, while an offspring modification mechanism is proposed to ensure the exploitation depth. Combining them yields a good trade-off between diversity and convergence. In the experiments, we plug the proposed approach into three different types of multi-objective evolutionary algorithms, and test them on 18 classification datasets with two widely-used performance metrics. The empirical results demonstrate the significant contribution of the proposed approach to the optimization and classification performance.

Multi-fitness learning for behavior-driven cooperation

Evolutionary learning algorithms have been successfully applied to multiagent problems where the desired system behavior can be captured by a single fitness signal. However, the complexity of many real world applications cannot be reduced to a single number, particularly when the fitness (i) arrives after a lengthy sequence of actions, and (ii) depends on the joint-action of multiple teammates. In this paper, we introduce the multi-fitness learning paradigm to enable multiagent teams to identify which fitness matters when, in domains that require long-term, complex coordination. We demonstrate that multi-fitness learning efficiently solves a cooperative exploration task where teams of rovers must coordinate to observe various points of interest in a specific but unknown order.

SESSION: Evolutionary multiobjective optimization

Designing parallelism in surrogate-assisted multiobjective optimization based on decomposition

On the one hand, surrogate-assisted evolutionary algorithms are established as a method of choice for expensive black-box optimization problems. On the other hand, the growth in computing facilities has seen a massive increase in potential computational power, provided that users adapt their approaches to the offered parallelism. While a number of studies acknowledge the impact of parallelism for single-objective expensive optimization assisted by surrogates, extending such techniques to the multi-objective setting has not yet been properly investigated, especially within the state-of-the-art decomposition framework. We first highlight the different degrees of parallelism in existing surrogate-assisted multi-objective evolutionary algorithms based on decomposition (S-MOEA/D). We then provide a comprehensive analysis of the key steps towards a successful parallel S-MOEA/D approach. Through an extensive benchmarking effort relying on the well-established bbob-biobj test functions, we analyze the performance of the different algorithm designs with respect to the problem dimensionality and difficulty, the number of parallel cores available, and the supervised learning models considered. In particular, we show the difference in algorithm scalability based on the selected surrogate-assisted approaches, the performance impact of distributing the model training task, and the efficacy of the designed parallel-surrogate methods.

Multi-objective hyperparameter tuning and feature selection using filter ensembles

Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield better model interpretability and lower cost of data acquisition, data handling and model inference. While sparsity may have a beneficial or detrimental effect on predictive performance, a small drop in performance may be acceptable in return for a substantial gain in sparseness. We therefore treat feature selection as a multi-objective optimization task. We perform hyperparameter tuning and feature selection simultaneously because the choice of features of a model may influence what hyperparameters perform well.

We present, benchmark, and compare two different approaches for multi-objective joint hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization. The second is an evolutionary NSGA-II-based wrapper approach to feature selection which incorporates specialized sampling, mutation and recombination operators. Both methods make use of parameterized filter ensembles.

While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs additional computational overhead compared to NSGA-II, so the preferred choice depends on the cost of evaluating a model on the given data.

Computation of the improvement directions of the Pareto front and its application to MOEAs

This paper introduces the mathematical development and algorithm of the Improvement-Directions Mapping (IDM) method, which computes improvement directions to "push" the current solutions toward the true Pareto front. The main idea is to compute normal vectors to the front, as improvement directions in the objective space, which are then transformed into search directions in the variable space through a transformation tensor. The main contributions of the IDM as a local search operator versus previous approaches are the following: 1) it does not require a priori information about improvement directions or the location of the true Pareto front; 2) it uses a local quadratic approximation of the Pareto front to compute the transformation tensor, thus reducing numerical problems and avoiding abrupt changes in the search direction which could lead to erratic searches. These features allow the IDM to be implemented as a local search operator within any Multi-objective Evolutionary Algorithm (MOEA). The potential of the IDM is shown by hybridizing two well-known multi-objective algorithms: a) MOEA/D + IDM; b) NSGA-II + IDM. In the first approach, IDM "pushes" the offspring population in each iteration. A similar experiment is performed with the second approach. Furthermore, one more experiment evaluates the IDM as a refinement step applied to the last Pareto front delivered by NSGA-II.

Data structures for non-dominated sets: implementations and empirical assessment of two decades of advances

Many data structures have been developed over the last two decades for the storage and efficient update of unconstrained sets of mutually non-dominating solutions. Typically, analysis has been provided in the original works for these data structures in terms of worst/average case complexity performance. Often, however, other aspects such as rebalancing costs of underlying data structures, cache sizes, etc., can also significantly affect behaviour. Empirical performance comparison has often (but not always) been limited to run-time comparison with a basic linear list. No comprehensive comparison between the different specialised data structures proposed in the last two decades has thus far been undertaken. We take significant strides in addressing this here. Eight data structures from the literature are implemented within the same overarching open source Java framework. We additionally highlight and rectify some errors in published work --- and offer additional efficiency gains. Run-time performances are compared and contrasted, using data sequences embodying a number of different characteristics. We show that in different scenarios different data structures are preferable, and that those with the lowest big O complexity are not always the best performing. We also find that performance profiles can vary drastically with computational architecture, in a non-linear fashion.

Another difficulty of inverted triangular pareto fronts for decomposition-based multi-objective algorithms

A set of uniformly sampled weight vectors from a unit simplex has been frequently used in decomposition-based multi-objective algorithms. The number of generated weight vectors is controlled by a user-defined parameter H. In the literature, good results are often reported on test problems with triangular Pareto fronts since the shape of the Pareto fronts is consistent with the distribution of the weight vectors. However, when a problem has an inverted triangular Pareto front, well-distributed solutions over the entire Pareto front are not obtained due to the inconsistency between the Pareto front shape and the weight vector distribution. In this paper, we demonstrate that the specification of H has an unexpectedly large effect on the performance of decomposition-based multi-objective algorithms when the test problems have inverted triangular Pareto fronts. We clearly explain why their performance is sensitive to the specification of H in an unexpected manner (e.g., H = 3 is bad but H = 4 is good for three-objective problems, whereas H = 3 is good but H = 4 is bad for four-objective problems). After these discussions, we suggest a simple weight vector specification method for inverted triangular Pareto fronts.
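The sensitivity to H is easy to see from the standard simplex-lattice construction, in which the number of weight vectors for M objectives is C(H+M-1, M-1) and every component is a multiple of 1/H; the snippet below enumerates them so one can inspect, for example, how many vectors fall strictly inside the simplex for different (H, M) pairs. This is generic background on the weight-vector construction, not code from the paper.

    from math import comb

    def compositions(H, M):
        # All ways to write H as an ordered sum of M non-negative integers
        if M == 1:
            return [(H,)]
        return [(first,) + rest
                for first in range(H + 1)
                for rest in compositions(H - first, M - 1)]

    def simplex_lattice_weights(H, M):
        return [tuple(c / H for c in comp) for comp in compositions(H, M)]

    for H, M in [(3, 3), (4, 3), (3, 4), (4, 4)]:
        w = simplex_lattice_weights(H, M)
        assert len(w) == comb(H + M - 1, M - 1)
        interior = sum(all(c > 0 for c in v) for v in w)
        print(f"M={M}, H={H}: {len(w)} vectors, {interior} strictly interior")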

Effects of dominance resistant solutions on the performance of evolutionary multi-objective and many-objective algorithms

Dominance resistant solutions (DRSs) in multi-objective problems have very good values for some objectives and very bad values for other objectives. Whereas DRSs are far away from the Pareto front, they are hardly dominated by other solutions due to some very good objective values. It is well known that the existence of DRSs severely degrades the search ability of Pareto dominance-based algorithms such as NSGA-II and SPEA2. In this paper, we examine the effect of DRSs on the search ability of NSGA-II on the DTLZ test problems with many objectives. We slightly change their problem formulation to increase the size of the DRS region. Through computational experiments, we show that DRSs have a strong negative effect on the search ability of NSGA-II whereas they have almost no effect on MOEA/D with the PBI function. We also show that a slightly modified NSGA-II for decreasing the negative effect of DRSs works well on many-objective DTLZ test problems (its performance is similar to NSGA-III and MOEA/D). These results suggest that DTLZ is not an appropriate test suite for evaluating many-objective evolutionary algorithms. This issue is further addressed through computational experiments on newly formulated test problems with no distance function.

If unsure, shuffle: deductive sort is Θ(MN³), but O(MN²) in expectation over input permutations

Despite significant advances in the theory of evolutionary computation, many papers related to evolutionary algorithms still lack proper analysis and limit themselves to rather vague reflections on why making a certain design choice improves the performance. While this seems to be unavoidable when talking about the behavior of an evolutionary algorithm on a practical optimization problem, doing the same for the computational complexities of parts of evolutionary algorithms is harmful and should be avoided.

Non-dominated sorting is one of such parts, commonly used in various evolutionary multiobjective algorithms. The complexity of the problem as such is not well-understood, and many algorithms were proposed for solving it in recent years. Unfortunately, the analysis of some of them is imperfect. In this paper, we prove that, contrary to what is claimed by its authors, the algorithm known as Deductive Sort has the worst-case time complexity of Θ(MN³), where M is the number of objectives and N is the population size. However, if one shuffles the input, the worst-case expected running time drops down to O(MN²).

What is a good direction vector set for the R2-based hypervolume contribution approximation

The hypervolume contribution is an important concept in hypervolume-based evolutionary multi-objective optimization algorithms. It describes the loss of the hypervolume when a solution is removed from the current population. Since its calculation is #P-hard in the number of objectives, its approximation is necessary for many-objective optimization problems. Recently, an R2-based hypervolume contribution approximation method was proposed. This method relies on a set of direction vectors for the approximation. However, the influence of different direction vector generation methods on the approximation quality has not been studied yet. This paper aims to investigate this issue. Five direction vector generation methods are investigated, including Das and Dennis's method (DAS), unit normal vector method (UNV), JAS method, maximally sparse selection method with DAS (MSS-D), and maximally sparse selection method with UNV (MSS-U). Experimental results suggest that the approximation quality strongly depends on the direction vector generation method. The JAS and UNV methods show the best performance whereas the DAS method shows the worst performance. The reasons behind the results are also analyzed.

Multiobjective tree-structured parzen estimator for computationally expensive optimization problems

Practitioners often encounter computationally expensive multiobjective optimization problems to be solved in a variety of real-world applications. To address these problems, we propose a new surrogate-based multiobjective optimization algorithm that does not require a large evaluation budget. It is called the Multiobjective Tree-structured Parzen Estimator (MOTPE) and is an extension of the tree-structured Parzen estimator widely used to solve expensive single-objective optimization problems. Our empirical evidence reveals that MOTPE can approximate Pareto fronts of many benchmark problems better than existing methods with a limited budget. In this paper, we furthermore discuss the influence of MOTPE's configuration to understand its behavior.

Surrogate-assisted multi-objective combinatorial optimization based on decomposition and walsh basis

We consider the design and analysis of surrogate-assisted algorithms for expensive multi-objective combinatorial optimization. Focusing on pseudo-boolean functions, we leverage existing techniques based on the Walsh basis to operate under the decomposition framework of MOEA/D. We investigate two design components: the cheap generation of a promising pool of offspring, and the actual selection of one solution for expensive evaluation. We propose different variants, ranging from a filtering approach that selects the most promising solution at each iteration by using the constructed Walsh surrogates to discriminate between a pool of offspring generated by variation, to a substitution approach that selects a solution to evaluate by optimizing the Walsh surrogates in a multi-objective manner. Considering bi-objective NK landscapes as benchmark problems offering different degrees of non-linearity, we conduct a comprehensive empirical analysis including the properties of the achievable approximation sets, the anytime performance, and the impact of the order used to train the Walsh surrogates. Our empirical findings show that, although our surrogate-assisted design is effective, the optimal integration of Walsh models within a multi-objective evolutionary search process gives rise to particular questions for which different trade-off answers can be obtained.
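To fix ideas, a Walsh surrogate of order k expands a pseudo-boolean function over basis terms phi_S(x) = (-1)^(sum of x_i over i in S) for all variable subsets S with |S| <= k, and fits the coefficients by least squares. The sketch below builds such a surrogate with numpy on a toy objective; it illustrates the surrogate alone, independently of the paper's MOEA/D integration.

    import numpy as np
    from itertools import combinations

    def walsh_features(X, order):
        # Rows of X are 0/1 vectors; one column per subset S with |S| <= order.
        n = X.shape[1]
        subsets = [s for k in range(order + 1) for s in combinations(range(n), k)]
        cols = [np.prod((-1.0) ** X[:, list(s)], axis=1) if s else np.ones(len(X))
                for s in subsets]
        return np.column_stack(cols), subsets

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 8))          # sampled solutions
    y = X.sum(axis=1) + 0.5 * X[:, 0] * X[:, 1]    # toy stand-in for an expensive objective
    Phi, subsets = walsh_features(X, order=2)
    coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)

    def surrogate(x):
        phi, _ = walsh_features(np.asarray([x]), order=2)
        return float((phi @ coeffs)[0])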

Runtime analysis of evolutionary algorithms with biased mutation for the multi-objective minimum spanning tree problem

Evolutionary algorithms (EAs) are general-purpose problem solvers that usually perform an unbiased search. This is reasonable and desirable in a black-box scenario. For combinatorial optimization problems, more knowledge about the structure of optimal solutions is often available, which can be leveraged by means of biased search operators. We consider the Minimum Spanning Tree (MST) problem in a single- and multi-objective version, and introduce a biased mutation, which puts more emphasis on the selection of edges of low rank in terms of domination number. We present example graphs where the biased mutation can significantly speed up the expected runtime until (Pareto-)optimal solutions are found. On the other hand, we demonstrate that bias can lead to exponential runtime if "heavy" edges are necessarily part of an optimal solution. However, on general graphs in the single-objective setting, we show that a combined mutation operator, which decides for unbiased or biased edge selection in each step with equal probability, exhibits a polynomial upper bound - as unbiased mutation does - in the worst case and benefits from bias if the circumstances are favorable.

Surrogate assisted evolutionary algorithm for medium scale multi-objective optimisation problems

Building a surrogate model of an objective function has been shown to be effective in assisting evolutionary algorithms (EAs) to solve real-world complex optimisation problems which involve either computationally expensive numerical simulations or costly physical experiments. However, their effectiveness has mostly been demonstrated on small-scale problems with fewer than 10 decision variables. The scalability of surrogate-assisted EAs (SAEAs) has not been well studied yet. In this paper, we propose a Gaussian process surrogate model assisted EA for medium-scale expensive multi-objective optimisation problems with up to 50 decision variables. There are three distinctive features of our proposed SAEA. First, instead of using all decision variables in surrogate model building, we only use the correlated ones to build the surrogate model for each objective function. Second, rather than directly optimising the surrogate objective functions, the original multi-objective optimisation problem is transformed into a new one based on the surrogate models. Last but not least, a subset selection method is developed to choose a couple of promising candidate solutions for actual objective function evaluations and thus update the training dataset. The effectiveness of our proposed algorithm is validated on benchmark problems with 10, 20, and 50 variables, in comparison with three state-of-the-art SAEAs.

On the elicitation of indirect preferences in interactive evolutionary multiple objective optimization

We consider essential challenges related to the elicitation of indirect preference information in interactive evolutionary algorithms for multiple objective optimization. The methods in this stream use holistic judgments provided by the Decision Maker (DM) to progressively bias the evolutionary search toward his/her most preferred region in the Pareto front. We enhance such an interactive process using three targeted developments and illustrate their efficiency in the context of a decomposition-based evolutionary framework. Firstly, we present some active learning strategies for selecting solutions from the current population that should be critically compared by the DM. These strategies implement the paradigm of maximizing the potential information gain derived from the DM's answer. Secondly, we discuss the procedures for deciding when the DM should be questioned for preference information. In this way, we refer to a more general problem of distributing the DM's interactions with the method in a way that ensures sufficient evolutionary pressure. Thirdly, we couple the evolutionary schemes with different types of indirect preferences, including pairwise comparisons, preference intensities, best-of-k judgments, and complete orders of a small subset of solutions. A thorough experimental analysis indicates that the three introduced advancements have a positive impact on the DM-perceived quality of constructed solutions.

Gap finding and validation in evolutionary multi- and many-objective optimization

Over the past 30 years, evolutionary multi- and many-objective optimization (EMO/EMaO) algorithms have been extensively applied to find well-distributed Pareto-optimal (PO) solutions in a single run. However, in real-world problems, the PO front may not always be a single continuous hyper-surface; rather, several irregularities may exist, involving disjointed surfaces, holes within the surface, or patches of mixed-dimensional surfaces. When a set of trade-off solutions is obtained by EMO/EMaO algorithms, there may exist less dense or no solutions (which we refer to as 'gaps') in certain parts of the front. This can happen for at least two reasons: (i) gaps naturally exist in the PO front, or (ii) no natural gaps exist, but the chosen EMO/EMaO algorithm is not able to find any solution in the apparent gaps. To make a confident judgement, we propose a three-step procedure here. First, we suggest a computational procedure to identify gaps, if any, in the EMO/EMaO-obtained PO front. Second, we propose a computational method to identify well-distributed gap-points in the gap regions. Third, we apply a focused EMO/EMaO algorithm to search for possible representative trade-off points in the gaps. We then propose two metrics to qualitatively establish whether a gap truly exists in the obtained dataset, and if yes, whether the gap naturally exists on the true Pareto-set. The procedures are supported by results on two- to five-objective test problems and on a five-objective scheduling problem from the steel-making industry.

Transfer learning for gaussian process assisted evolutionary bi-objective optimization for objectives with different evaluation times

Despite the success of evolutionary algorithms (EAs) for solving multi-objective problems, most of them are based on the assumption that all objectives can be evaluated within the same period of time. However, in many real-world applications, such an assumption is unrealistic since different objectives must be evaluated using different computer simulations or physical experiments with various time complexities. To address this issue, a surrogate-assisted evolutionary algorithm with parameter-based transfer learning (T-SAEA) is proposed in this work. While the surrogate for the cheap objective can be updated on sufficient training data, the surrogate for the expensive one is updated by either the training data set or a transfer learning approach. To identify transferable knowledge, a filter-based feature selection algorithm is used to capture the pivotal features of each objective, and the common important features are then used as a carrier for knowledge transfer between the cheap and expensive objectives. The corresponding parameters in the surrogate models are then adaptively shared to enhance the quality of the surrogate models. The experimental results demonstrate that the proposed algorithm outperforms the compared algorithms on bi-objective optimization problems whose objectives have a large difference in computational complexity.

Constraint handling within MOEA/D through an additional scalarizing function

The Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D) has shown high performance levels when solving complicated multi-objective optimization problems. However, its adaptation for dealing with constrained multi-objective optimization problems (cMOPs) remains the subject of ongoing investigation. This paper introduces a novel selection mechanism inspired by the ε-constraint method, which builds a bi-objective problem considering the scalarizing function (used in the decomposition approach of MOEA/D) and the constraint violation degree as an objective function. During the selection step of MOEA/D, the scalarizing function is considered to choose the best solutions to the cMOP. Preliminary results obtained over a set of complicated test problems drawn from the CF test suite indicate that the proposed algorithm is highly competitive with the state-of-the-art MOEAs adopted in our comparative study.

SESSION: Evolutionary numerical optimization

Leveraging conditional linkage models in gray-box optimization with the real-valued gene-pool optimal mixing evolutionary algorithm

Often, real-world problems are of the gray-box type. It has been shown that the Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (RV-GOMEA) is in principle capable of exploiting such a Gray-Box Optimization (GBO) setting using linkage models that capture dependencies between problem variables, resulting in excellent performance and scalability on both benchmark and real-world problems that allow for partial evaluations. However, linkage models proposed for RV-GOMEA so far cannot explicitly capture overlapping dependencies. Consequently, performance degrades if such dependencies exist. In this paper, we therefore introduce various ways of using conditional linkage models in RV-GOMEA. Their use is compared to that of non-conditional models, and to VkD-CMA, whose performance is among the state of the art, on various benchmark problems with overlapping dependencies. We find that RV-GOMEA with conditional linkage models achieves the best scalability on most problems, with conditional models leading to similar or better performance than non-conditional models. We conclude that the introduction of conditional linkage models to RV-GOMEA is an important contribution, as it expands the set of problems for which optimization in a GBO setting results in substantially improved performance and scalability. In future work, conditional linkage models may prove to benefit the optimization of real-world problems.

Distributed random walks for fitness landscape analysis

Fitness landscape analysis is used to mathematically characterize optimization problems. In order to perform fitness landscape analysis on continuous-valued optimization problems, a sample of the fitness landscape needs to be taken. A common way to perform this sampling is to use random walk algorithms. This paper proposes a new random walk algorithm for continuous-valued optimization problems, called the distributed random walk algorithm. The algorithm is based on the premise that multiple short random walks of the same type will provide better coverage of the decision space and more robust fitness landscape measures than a single long random walk. The distributed random walk algorithm is simple to implement, and the computational overhead is insignificant compared to random walk algorithms in the literature. The results of the study indicate that the distributed random walk algorithm achieves both of these objectives. Furthermore, the benefits of the distributed random walk algorithm are shown to be much more significant when small step sizes are used in the random walks.

Versatile black-box optimization

Choosing the right algorithm automatically using problem descriptors is a classical component of combinatorial optimization. It is also a good tool for making evolutionary algorithms fast, robust and versatile. We present Shiwa, an algorithm good at discrete and continuous, noisy and noise-free, sequential and parallel, black-box optimization. Our algorithm is experimentally compared to competitors on YABBOB, a BBOB-comparable testbed, and on some variants of it, and then validated on several real-world testbeds.
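Shiwa is distributed as part of the Nevergrad library; the sketch below shows the kind of call one would make, assuming a Nevergrad version in which the optimizer is registered under the name "Shiwa" (if your installed version differs, check its optimizer registry). The objective is a toy placeholder.

    import nevergrad as ng

    def objective(x):
        # Toy black-box function; replace with the expensive problem of interest.
        return float(sum((xi - 0.5) ** 2 for xi in x))

    # Look the algorithm up in Nevergrad's optimizer registry (name assumed: "Shiwa")
    optimizer_cls = ng.optimizers.registry["Shiwa"]
    optimizer = optimizer_cls(parametrization=ng.p.Array(shape=(5,)), budget=200)
    recommendation = optimizer.minimize(objective)
    print(recommendation.value)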

A surrogate-assisted metaheuristic for bilevel optimization

A Bilevel Optimization Problem (BOP) involves two optimization problems in a hierarchical structure. A BOP is solved when an optimum of the upper level problem is found, subject to the optimal response of the respective lower level problem. This paper presents a metaheuristic method assisted by a kernel interpolation numerical technique to approximate optimal solutions of a BOP. Two surrogate methods approximate the upper and lower level objective functions on solutions obtained by a population-based algorithm adapted to save upper level objective function evaluations. Some theoretical properties of kernel interpolation are used to study global convergence in some particular BOPs. The empirical results of this approach are analyzed when representative test functions for bilevel optimization are solved. The overall performance provided by the proposal is competitive.

Deep generative model for non-convex constraint handling

In this study, we consider black-box minimization problems with non-convex constraints, where the constraints are significantly cheaper to evaluate than the objective. Non-convex constraints generally make it difficult to solve problems using evolutionary approaches. In this paper, we revisit a conventional technique called decoder constraint handling, which transforms a feasible non-convex domain into an easy-to-control convex set. This approach is promising because it transforms a constrained problem into an almost unconstrained one. However, its application has been considerably limited, because designing or training such a nonlinear decoder requires domain knowledge or manually prepared training data. To fully automate the decoder design, we use deep generative models. We propose a novel scheme to train a deep generative model without using manually prepared training data. For this purpose, we first train feasible solution samplers, which are deep neural networks, using the constraint functions. Subsequently, we train another deep generative model using the data generated from the trained samplers as the training data. The proposed framework is applied to tasks inspired by topology optimization problems. The empirical study demonstrates that the proposed approach can locate better solutions with fewer objective function evaluations than the existing approach.

Analyzing adaptive parameter landscapes in parameter adaptation methods for differential evolution

Since the scale factor and the crossover rate significantly influence the performance of differential evolution (DE), parameter adaptation methods (PAMs) for the two parameters have been well studied in the DE community. Although PAMs can substantially improve the effectiveness of DE, PAMs are poorly understood (e.g., the working principle of PAMs). One of the difficulties in understanding PAMs comes from the lack of clarity about the parameter space that consists of the scale factor and the crossover rate. This paper addresses this issue by analyzing adaptive parameter landscapes in PAMs for DE. First, we propose the concept of an adaptive parameter landscape, which captures a moment in a parameter adaptation process. For each iteration, each individual in the population has its own adaptive parameter landscape. Second, we propose a method of analyzing adaptive parameter landscapes using a 1-step-lookahead greedy improvement metric. Third, we examine adaptive parameter landscapes in three PAMs by using the proposed method. Results provide insightful information about PAMs in DE.

Towards dynamic algorithm selection for numerical black-box optimization: investigating BBOB as a use case

One of the most challenging problems in evolutionary computation is to select from its family of diverse solvers one that performs well on a given problem. This algorithm selection problem is complicated by the fact that different phases of the optimization process require different search behavior. While this can partly be controlled by the algorithm itself, there exist large performance differences between algorithms. It can therefore be beneficial to swap the configuration or even the entire algorithm during the run. While long deemed impractical, recent advances in Machine Learning and in exploratory landscape analysis give hope that such dynamic algorithm configuration (dynAC) can eventually be solved by automatically trained configuration schedules. With this work we aim at promoting research on dynAC by introducing a simpler variant that focuses only on switching between different algorithms, not configurations. Using the rich data from the Black Box Optimization Benchmark (BBOB) platform, we show that even single-switch dynamic Algorithm selection (dynAS) can potentially result in significant performance gains. We also discuss key challenges in dynAS, and argue that the BBOB framework can become a useful tool in overcoming these.

SESSION: Genetic algorithms

An improved GPU-accelerated heuristic technique applied to the capacitated vehicle routing problem

The capacitated vehicle routing problem (CVRP) is a well-known NP-hard combinatorial problem. Genetic algorithms (GAs) are often used for solving CVRPs. However, the computational effort required to find a feasible solution becomes problematic for very large instances. Parallel-computation technology can significantly improve the performance of CVRP solution algorithms on large problem sets, especially when using GAs. In this paper, an improved genetic algorithm is designed to be executed entirely on an NVIDIA GPU, taking advantage of the special CUDA GPU architecture to solve the CVRP. By distributing array elements over the GPU grid and using GPU kernel functions, the proposed algorithm successfully provides high-quality solutions within reasonable computational times, and near-optimal solutions for smaller benchmark problems. Within this framework, we address the execution speed problem in CVRPs by developing an algorithm that runs entirely on an NVIDIA GPU, investigate how to incorporate local search algorithms with the GA, and develop comparisons between the algorithm's performance on the CPU and the GPU. The utility of improved local search algorithms in the overall performance of the GA is also reported.

AutoLR: an evolutionary approach to learning rate policies

The choice of a proper learning rate is paramount for good Artificial Neural Network training and performance. In the past, one had to rely on experience and trial-and-error to find an adequate learning rate. Presently, a plethora of state-of-the-art automatic methods exist that make the search for a good learning rate easier. While these techniques are effective and have yielded good results over the years, they are general solutions. This means that the optimization of learning rates for specific network topologies remains largely unexplored. This work presents AutoLR, a framework that evolves Learning Rate Schedulers for a specific Neural Network Architecture using Structured Grammatical Evolution. The system was used to evolve learning rate policies that were compared with a commonly used baseline value for the learning rate. Results show that training performed using certain evolved policies is more efficient than the established baseline and suggest that this approach is a viable means of improving a neural network's performance.

Evolving diverse sets of tours for the travelling salesperson problem

Evolving diverse sets of high quality solutions has gained increasing interest in the evolutionary computation literature in recent years. With this paper, we contribute to this area of research by examining evolutionary diversity optimisation approaches for the classical Traveling Salesperson Problem (TSP). We study the impact of using different diversity measures for a given set of tours and the ability of evolutionary algorithms to obtain a diverse set of high quality solutions when adopting these measures. Our studies show that a large variety of diverse high quality tours can be achieved by using our approaches. Furthermore, we compare our approaches in terms of theoretical properties and the final set of tours obtained by the evolutionary diversity optimisation algorithm.

Adaptively preserving solutions in both feasible and infeasible regions on generalized multiple constraint ranking

In the present work, we propose some new modifications of an existing constraint handling technique (CHT) for single-objective optimization problems. The base CHT is generalized multiple constraint ranking (G-MCR), which is already a modified version of the original CHT, MCR. Although G-MCR significantly outperformed the original MCR in a previous study, it is found that G-MCR tends to generate very few feasible individuals on a certain real-world-like engineering design problem. In the present work, G-MCR is further modified to strike a better balance between feasible and infeasible individuals in each generation in an adaptive way, so that the interaction between feasible and infeasible regions can be maintained, thus providing a more efficient search towards the constrained global optimum. Based on an investigation of 78 benchmark problems, we find that some of the proposed modifications produce more robust convergence performance, achieving significant superiority on many types of problems. On real-world-like engineering design problems, we also observe that the feasibility ratio generated in each generation might play an important role in improving the convergence performance.

Genetic algorithm for the weight maximization problem on weighted automata

The weight maximization problem (WMP) is the problem of finding the word of highest weight on a weighted finite state automaton (WFA). It is an essential question that emerges in many optimization problems in automata theory. Unfortunately, the general problem can be shown to be undecidable, whereas its bounded decisional version is NP-complete. Designing efficient algorithms that produce approximate solutions to the WMP in reasonable time is an appealing research direction that can lead to several new applications including formal verification of systems abstracted as WFAs. In particular, in combination with a recent procedure that translates a recurrent neural network into a weighted automaton, an algorithm for the WMP can be used to analyze and verify the network by exploiting the simpler and more compact automata model.

In this work, we propose, implement and evaluate a metaheuristic based on genetic algorithms to approximate solutions to the WMP. We experimentally evaluate its performance on examples from the literature and show its potential on different applications.

Comparative mixing for DSMGA-II

Dependency Structure Matrix Genetic Algorithm-II (DSMGA-II) is a recently proposed optimization method that builds the linkage model on the base of the Dependency Structure Matrix (DSM). This model is used during the Optimal Mixing (OM) operators, such as the Restricted Mixing (RM) and the Back Mixing (BM). DSMGA-II was shown to solve theoretical and real-world optimization problems effectively. In this paper, we show that the effectiveness of DSMGA-II and its improved version, namely Two-edge Dependency Structure Matrix Genetic Algorithm-II (DSMGA-IIe), is relatively low for NK-landscape problems. Thus, we propose the Comparative Mixing (CM) operator that extends the RM operator. The CM operator modifies the linkage information obtained from the DSM-based linkage model by comparing the receiver individual with a randomly selected member of the population. Such modification enables DSMGA-II to solve NK-landscape problems effectively and does not limit DSMGA-II performance on most problems for which it was already shown effective.

A biased random key genetic algorithm applied to the VRPTW with skill requirements and synchronization constraints

We applied a Biased Random Key Genetic Algorithm (BRKGA) to solve the Vehicle Routing Problem with Time Windows and Synchronization Constraints. Additionally, both vehicles and clients are skilled, and each client can require up to two distinct skills to be serviced. For double-skilled clients, the operations of each skill should be performed by different vehicles, either simultaneously or respecting a precedence order. Those requirements introduce nonlinearities into the problem, in the sense that a small change in a single route potentially impacts all the other routes of the solution, making it hard to define an effective local search procedure. To circumvent this difficulty, we approached the problem using a genetic algorithm that evolves the sequence in which the services are inserted into the routes. We assessed the performance of our solution method using instances from the home health care literature. The genetic algorithm outperformed the previous best-known solutions found by a fix-and-optimize matheuristic by up to 25%, using less than half of the computational time reported previously. The BRKGA proved able to perform well in both exploration and exploitation of the solution space of the problem.
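The defining trick of a BRKGA is that every chromosome is a vector of random keys in [0, 1) which a problem-specific decoder turns into a solution; for routing-type problems a common decoder sorts clients by their keys to obtain an insertion order. The snippet below illustrates only that decoding step, with hypothetical client identifiers; the paper's decoder, which builds the actual synchronized routes, is considerably more involved.

    import numpy as np

    rng = np.random.default_rng(42)
    clients = ["c1", "c2", "c3", "c4", "c5"]

    def decode(random_keys, items):
        # Sort items by their associated key: the chromosome encodes a permutation,
        # here read as the order in which services are inserted into the routes.
        order = np.argsort(random_keys)
        return [items[i] for i in order]

    chromosome = rng.random(len(clients))          # one random-key chromosome
    insertion_order = decode(chromosome, clients)
    print(chromosome.round(2), insertion_order)

The "biased" part of a BRKGA then comes from crossover that copies each key from the elite parent with a probability greater than one half.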

Multi-layer heterogeneous ensemble with classifier and feature selection

Deep Neural Networks have achieved many successes when applied to visual, text, and speech information in various domains. The crucial reasons behind these successes are the multi-layer architecture and the in-model feature transformation of deep learning models. These design principles have inspired other sub-fields of machine learning, including ensemble learning. In recent years, several deep homogeneous ensemble models have been introduced, with a large number of classifiers in each layer. These models thus incur a high computational cost for classification. Moreover, the existing deep ensemble models use all classifiers, including unnecessary ones, which can reduce the predictive accuracy of the ensemble. In this study, we propose a multi-layer ensemble learning framework called MUlti-Layer heterogeneous Ensemble System (MULES) to solve the classification problem. The proposed system works with a small number of heterogeneous classifiers to obtain ensemble diversity, and is therefore efficient in resource usage. We also propose an Evolutionary Algorithm-based selection method to select the subset of suitable classifiers and features at each layer to enhance the predictive performance of MULES. The selection method uses the NSGA-II algorithm to optimize two objectives concerning classification accuracy and ensemble diversity. Experiments on 33 datasets confirm that MULES is better than a number of well-known benchmark algorithms.

New search operators for node-depth based encoding

Node-Depth Based Encoding is a representation for Evolutionary Algorithms applied to problems modelled by trees, storing nodes and their respective depths in an appropriately ordered list. This representation stands out for the results obtained with it, as its mutation and recombination operators have low time complexity relative to other representations for the same problems. This work proposes new search operators: a high-locality mutation operator, whose main distinguishing feature is the possibility of including any edge present in the graph, and a recombination operator able to generate solutions with as much heritability as possible with respect to the edges present in the parent solutions, making it possible to recombine any solutions in the population with the aim of improving search convergence. All proposed operators also allow for dealing with problems modelled by forests with many trees and by complete or non-complete graphs, always generating feasible solutions. This study performed bias, locality and heritability investigations for the proposed operators, showing that they have adequate characteristics for dealing with high-dimensionality problems. In order to evaluate the performance of the proposed operators, preliminary tests were carried out on a benchmark problem from the literature. The use of the proposed operators resulted in faster convergence.

On measuring and improving the quality of linkage learning in modern evolutionary algorithms applied to solve partially additively separable problems

Linkage learning is frequently employed in modern evolutionary algorithms. High linkage quality may be the key to an evolutionary method's effectiveness. Similarly, faulty linkage may be the reason for its poor performance. Many state-of-the-art evolutionary methods use a Dependency Structure Matrix (DSM) to obtain linkage. In this paper, we propose a quality measure for the DSM. Based on this measure, we analyze the behavior of modern evolutionary methods. We show the dependency between the linkage and the effectiveness of the considered methods. Finally, we propose a framework that improves the quality of the linkage.

Modelling parameter configuration spaces with local optima networks

Most algorithms proposed for solving complex problems require the definition of some parameter values. The process of finding suitable parameter values is an optimization problem by itself. Understanding the global structure of the search spaces of complex optimization problems remains a challenge. Moreover, understanding the relationship between parameter values and the performance of metaheuristics is a key issue in their development. Local optima networks propose a scheme to model search spaces as networks whose nodes represent local optima and whose edges represent transitions between them. In this work, we adapt the local optima network model to analyze and visualize the global structure of parameter configuration spaces. Our main objectives are to understand the structure of these networks and to explore the difficulty of different tuning scenarios using common indicators previously proposed in local optima network studies (e.g. number of local optima, number of global optima, and presence of local and global funnels). For this, we use the well-known tuning method ParamILS to analyze the configuration search spaces of a standard genetic algorithm that solves continuous optimization problems.

Understanding transforms of pseudo-boolean functions

There exist general transforms that convert pseudo-Boolean functions into k-bounded pseudo-Boolean functions, for all k ≥ 2. In addition to these general transforms, there can also exist specialized transforms that can be applied in special cases. New results are presented examining what happens to the "bit flip" neighborhood when transforms are applied. Transforms condense variables in a particular order. We show that different variable orderings produce different results in terms of problem difficulty. We also prove new results about the embedding of the original function in the new k-bounded function. Finally, this paper also looks at how parameter optimization problems can be expressed as high precision k-bounded pseudo-Boolean functions. This paper lays a foundation for the wider application of evolutionary algorithms to k-bounded pseudo-Boolean functions.
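A classic transform of this kind is Rosenberg's reduction, which replaces a product of two variables by a fresh variable plus a penalty term that vanishes exactly when the new variable equals the product. The brute-force check below verifies this for the cubic monomial x*y*z under minimization; it is a hedged illustration of what such transforms do, and does not reproduce the paper's analysis of neighborhoods or variable orderings.

    from itertools import product

    LAMBDA = 2  # penalty weight; any value large enough to dominate the gain works here

    def original(x, y, z):
        return x * y * z  # a 3-bounded (cubic) monomial

    def quadratized(x, y, z, w):
        # Rosenberg reduction: substitute w for x*y and penalise w != x*y.
        penalty = x * y - 2 * x * w - 2 * y * w + 3 * w
        return w * z + LAMBDA * penalty

    # Minimising over the auxiliary variable w recovers the original function value.
    for x, y, z in product((0, 1), repeat=3):
        best = min(quadratized(x, y, z, w) for w in (0, 1))
        assert best == original(x, y, z)
    print("quadratization matches the cubic monomial on all 8 inputs")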

SESSION: General evolutionary computation and hybrids

Using implicit multi-objectives properties to mitigate against forgetfulness in coevolutionary algorithms

It has been noticed that, while coevolutionary computational systems have only a single objective when evaluating, there is a subtle multi-objective aspect to evaluation, since different pairings can be thought of as different objectives (all in support of the single original objective). Previously, researchers used this to identify pairings of individuals during evaluation within a single generation. However, because of the problems of forgetfulness and the Red-Queen effect, this does not allow for the proper control that the technique promises. In this research, the implicit multi-objective approach is extended to function between generations as well as within them. This makes it possible to implement a more powerful form of elitism as well as to mitigate some of the pathologies of coevolutionary systems that forgetfulness and the Red-Queen effect engender, thus providing more robust solutions.

Initial design strategies and their effects on sequential model-based optimization: an exploratory case study based on BBOB

Sequential model-based optimization (SMBO) approaches are algorithms for solving problems that require computationally or otherwise expensive function evaluations. The key design principle of SMBO is a substitution of the true objective function by a surrogate, which is used to propose the point(s) to be evaluated next.

SMBO algorithms are intrinsically modular, leaving the user with many important design choices. Significant research efforts go into understanding which settings perform best for which type of problems. Most works, however, focus on the choice of the model, the acquisition function, and the strategy used to optimize the latter. The choice of the initial sampling strategy, in contrast, receives much less attention. Not surprisingly, quite divergent recommendations can be found in the literature.

We analyze in this work how the size and the distribution of the initial sample influences the overall quality of the efficient global optimization (EGO) algorithm, a well-known SMBO approach. While, overall, small initial budgets using Halton sampling seem preferable, we also observe that the performance landscape is rather unstructured. We furthermore identify several situations in which EGO performs unfavorably against random sampling. Both observations indicate that an adaptive SMBO design could be beneficial, making SMBO an interesting test-bed for automated algorithm design.
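As a concrete illustration of the initial-design choice, the following sketch draws a small Halton design and a uniform random design of the same size for a hypothetical 5-dimensional box-constrained problem (it assumes SciPy >= 1.7 for the qmc module); either design would then seed the surrogate.

```python
# Sketch: drawing a small initial design with Halton sampling versus uniform
# random sampling, as a starting design for an SMBO/EGO run.
import numpy as np
from scipy.stats import qmc

dim, n_init = 5, 10                       # small initial budget, as suggested above
lower, upper = -5.0 * np.ones(dim), 5.0 * np.ones(dim)

halton = qmc.Halton(d=dim, seed=42)
x_halton = qmc.scale(halton.random(n_init), lower, upper)

rng = np.random.default_rng(42)
x_random = lower + (upper - lower) * rng.random((n_init, dim))
# Either design is then evaluated on the expensive function before fitting the surrogate.
```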

ϵ-shotgun: ϵ-greedy batch Bayesian optimisation

Bayesian optimisation is a popular surrogate model-based approach for optimising expensive black-box functions. Given a surrogate model, the next location to expensively evaluate is chosen via maximisation of a cheap-to-query acquisition function. We present an ϵ-greedy procedure for Bayesian optimisation in batch settings in which the black-box function can be evaluated multiple times in parallel. Our ϵ-shotgun algorithm leverages the model's prediction, uncertainty, and the approximated rate of change of the landscape to determine the spread of batch solutions to be distributed around a putative location. The initial target location is selected either in an exploitative fashion on the mean prediction, or - with probability ϵ - from elsewhere in the design space. This results in locations that are more densely sampled in regions where the function is changing rapidly and in locations predicted to be good (i.e. close to predicted optima), with more scattered samples in regions where the function is flatter and/or of poorer quality. We empirically evaluate the ϵ-shotgun methods on a range of synthetic functions and two real-world problems, finding that they perform at least as well as state-of-the-art batch methods and in many cases exceed their performance.
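The core ϵ-greedy choice of the batch centre can be sketched as below; this shows only the exploit-versus-random decision over a candidate set and omits the paper's rule for spreading the remaining batch members around that centre.

```python
import numpy as np

def epsilon_greedy_center(candidates, mu, epsilon, rng):
    """Pick a batch centre: with probability 1-epsilon the candidate with the
    best (lowest) posterior mean, otherwise a uniformly random candidate.
    candidates: (n, d) array; mu: (n,) posterior means. Sketch only; the full
    epsilon-shotgun spread rule around the centre is not reproduced here."""
    if rng.random() < epsilon:
        idx = rng.integers(len(candidates))
    else:
        idx = int(np.argmin(mu))
    return candidates[idx]

rng = np.random.default_rng(0)
cands = rng.uniform(-5, 5, size=(100, 2))
mu = (cands ** 2).sum(axis=1)             # stand-in for a surrogate's mean prediction
print(epsilon_greedy_center(cands, mu, epsilon=0.1, rng=rng))
```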

Bivariate estimation-of-distribution algorithms can find an exponential number of optima

Finding a large set of optima in a multimodal optimization landscape is a challenging task. Classical population-based evolutionary algorithms (EAs) typically converge only to a single solution. While this can be counteracted by applying niching strategies, the number of optima found simultaneously is nonetheless trivially bounded by the population size.

Estimation-of-distribution algorithms (EDAs) are an alternative, maintaining a probabilistic model of the solution space instead of an explicit population. Such a model is able to implicitly represent a solution set that is far larger than any realistic population size.

To support the study of how optimization algorithms handle large sets of optima, we propose the test function EqualBlocksOneMax (EBOM). It has an easy-to-optimize fitness landscape but an exponential number of optima. We show that the bivariate EDA mutual-information-maximizing input clustering (MIMIC), without any problem-specific modification, quickly generates a model that behaves very similarly to a theoretically ideal model for that function, which samples each of the exponentially many optima with the same maximal probability.
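For intuition, the sketch below shows a block-structured fitness in the spirit of EBOM, under the assumption that every fixed-size block is rewarded for being all-equal (all zeros or all ones), which gives two optimal settings per block and hence exponentially many global optima; the paper's exact definition may differ.

```python
# Hedged sketch of a block-structured test function in the spirit of EBOM:
# each block of fixed size scores its majority count, so every block has two
# optimal settings and a string of n/b blocks has 2^(n/b) global optima.
def blockwise_fitness(bits, block_size=4):
    assert len(bits) % block_size == 0
    score = 0
    for i in range(0, len(bits), block_size):
        block = bits[i:i + block_size]
        score += max(sum(block), block_size - sum(block))  # two optima per block
    return score

print(blockwise_fitness([1, 1, 1, 1, 0, 0, 0, 0]))  # 8: one of the 2^2 = 4 optima
```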

From understanding genetic drift to a smart-restart parameter-less compact genetic algorithm

One of the key difficulties in using estimation-of-distribution algorithms is choosing the population size appropriately: too small values lead to genetic drift, which can cause enormous difficulties. In the regime with no genetic drift, however, the runtime is often roughly proportional to the population size, which renders large population sizes inefficient.

Based on a recent quantitative analysis of which population sizes lead to genetic drift, we propose a parameter-less version of the compact genetic algorithm that automatically finds a suitable population size without spending too much time in situations rendered unfavorable by genetic drift.

We prove an easy mathematical runtime guarantee for this algorithm and conduct an extensive experimental analysis on four classic benchmark problems. The former shows that under a natural assumption, our algorithm has a performance similar to the one obtainable from the best population size. The latter confirms that missing the right population size can be highly detrimental and shows that our algorithm as well as a previously proposed parameter-less one based on parallel runs avoids such pitfalls. Comparing the two approaches, ours profits from its ability to abort runs which are likely to be stuck in a genetic drift situation.
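A rough sketch of the restart idea: run a compact GA whose frequency vector moves by 1/K per pairwise comparison, then restart with a doubled hypothetical population size K so that later runs suffer less from genetic drift. This is illustrative only and does not reproduce the paper's abort criterion or its runtime guarantee.

```python
import numpy as np

def cga_run(fitness, n, K, max_evals, rng):
    """One compact GA run; K plays the role of the population size:
    p moves by 1/K towards the winner of each pairwise comparison."""
    p = np.full(n, 0.5)
    best, evals = -np.inf, 0
    while evals < max_evals:
        x = (rng.random(n) < p).astype(int)
        y = (rng.random(n) < p).astype(int)
        fx, fy = fitness(x), fitness(y)
        evals += 2
        if fy > fx:
            x, y, fx = y, x, fy
        best = max(best, fx)
        p = np.clip(p + (x - y) / K, 1 / n, 1 - 1 / n)   # standard frequency margins
    return best

def restart_cga(fitness, n, rng, evals_per_run=2000):
    """Restart sketch: rerun the cGA with doubled K (not the paper's exact rule)."""
    best = -np.inf
    for K in (8, 16, 32, 64):
        best = max(best, cga_run(fitness, n, K, evals_per_run, rng))
    return best

rng = np.random.default_rng(1)
print(restart_cga(lambda x: int(x.sum()), n=20, rng=rng))   # OneMax as a toy fitness
```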

Effective reinforcement learning through evolutionary surrogate-assisted prescription

There is now significant historical data available on decision making in organizations, consisting of the decision problem, what decisions were made, and how desirable the outcomes were. Using this data, it is possible to learn a surrogate model, and with that model, evolve a decision strategy that optimizes the outcomes. This paper introduces a general approach of this kind, called Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for example, a random forest or a neural network trained with gradient descent, and the strategy is a neural network that is evolved to maximize the predictions of the surrogate model. ESP is further extended in this paper to sequential decision-making tasks, which makes it possible to evaluate the framework on reinforcement learning (RL) benchmarks. Because the majority of evaluations are done on the surrogate, ESP is more sample-efficient, has lower variance, and has lower regret than standard RL approaches. Surprisingly, its solutions are also better because both the surrogate and the strategy network regularize the decision-making behavior. ESP thus forms a promising foundation for decision optimization in real-world problems.

Analysis of the performance of algorithm configurators for search heuristics with global mutation operators

Recently it has been proved that a simple algorithm configurator called ParamRLS can efficiently identify the optimal neighbourhood size to be used by stochastic local search to optimise two standard benchmark problem classes. In this paper we analyse the performance of algorithm configurators for tuning the more sophisticated global mutation operator used in standard evolutionary algorithms, which flips each of the n bits independently with probability χ/n, where the best value for χ has to be identified. We compare the performance of configurators that judge configurations by the best fitness found within a cutoff time k against configurators that use the actual optimisation time, for two standard benchmark problem classes, Ridge and LeadingOnes. We rigorously prove that all algorithm configurators that use optimisation time as the performance metric require cutoff times that are at least as large as the expected optimisation time to identify the optimal configuration. Matters are considerably different if the fitness metric is used. To show this we prove that the simple ParamRLS-F configurator can identify the optimal mutation rates even when using cutoff times that are considerably smaller than the expected optimisation time of the best parameter value for both problem classes.

On the choice of the parameter control mechanism in the (1+(λ, λ)) genetic algorithm

The self-adjusting (1 + (λ, λ)) GA is the best known genetic algorithm for problems with a good fitness-distance correlation, such as OneMax. It uses a parameter control mechanism for the parameter λ that governs the mutation strength and the number of offspring. However, on multimodal problems, the parameter control mechanism tends to increase λ uncontrollably.

We study this problem and possible solutions to it using rigorous runtime analysis for the standard Jumpk benchmark problem class. The original algorithm behaves like a (1+n) EA whenever the maximum value λ = n is reached. This is ineffective for problems where large jumps are required. Capping λ at smaller values is beneficial for such problems. Finally, resetting λ to 1 allows the parameter to cycle through the parameter space. We show that this strategy is effective for all Jumpk problems: the (1 + (λ, λ)) GA performs as well as the (1 + 1) EA with the optimal mutation rate and fast evolutionary algorithms, apart from a small polynomial overhead.

Along the way, we present new general methods for bounding the runtime of the (1 + (λ, λ)) GA that allow translating existing runtime bounds from the (1 + 1) EA to the self-adjusting (1 + (λ, λ)) GA. Our methods are easy to use and give upper bounds for novel classes of functions.
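A minimal sketch of the λ control variants discussed above, using a one-fifth-rule style update with an optional cap and an optional reset to 1; the update factors below are illustrative, not the exact constants analysed in the paper.

```python
# Sketch: self-adjusting lambda that shrinks on success, grows on failure,
# and is optionally capped or reset to 1 once the cap is exceeded.
def update_lambda(lam, improved, n, F=1.5, cap=None, reset_on_cap=False):
    lam = lam / F if improved else lam * F ** 0.25
    if cap is not None and lam > cap:
        lam = 1.0 if reset_on_cap else cap
    return max(1.0, min(lam, n))

lam, history = 1.0, []
for improved in [False] * 25 + [True]:
    lam = update_lambda(lam, improved, n=100, cap=8, reset_on_cap=True)
    history.append(round(lam, 2))
print(history)   # lambda grows, exceeds the cap once and resets to 1, then shrinks on success
```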

Landscape-aware fixed-budget performance regression and algorithm selection for modular CMA-ES variants

Automated algorithm selection promises to support the user in the decisive task of selecting a most suitable algorithm for a given problem. A common component of these machine-trained techniques is a regression model that predicts the performance of a given algorithm on a previously unseen problem instance. In the context of numerical black-box optimization, such regression models typically build on exploratory landscape analysis (ELA), which quantifies several characteristics of the problem. These measures can be used to train a supervised performance regression model.

First steps towards ELA-based performance regression have been made in the context of a fixed-target setting. In many applications, however, the user needs to select an algorithm that performs best within a given budget of function evaluations. Adopting this fixed-budget setting, we demonstrate that it is possible to achieve high-quality performance predictions with off-the-shelf supervised learning approaches, by suitably combining two differently trained regression models. We test this approach on a very challenging problem: algorithm selection on a portfolio of very similar algorithms, which we choose from the family of modular CMA-ES algorithms.

Algorithm selection of anytime algorithms

Anytime algorithms for optimization problems are of particular interest since they allow trading off execution time against result quality. However, the selection of the best anytime algorithm for a given problem instance has so far focused on a particular budget for execution time or a particular target result quality. Moreover, it is often assumed that these anytime preferences are known when developing or training the algorithm selection methodology. In this work, we study the algorithm selection problem in a context where the decision maker's anytime preferences are defined by a general utility function and only known at the time of selection. To this end, we first examine how to measure the performance of an anytime algorithm with respect to this utility function. We then discuss approaches for the development of selection methodologies that receive a utility function as an argument at the time of selection. Finally, to illustrate one of the discussed approaches, we present a preliminary study on the selection between an exact and a heuristic algorithm for a bi-objective knapsack problem. The results show that the proposed methodology has an accuracy greater than 96% in the selected scenarios, but we identify room for improvement.

CMA-ES for one-class constraint synthesis

We propose CMA-ES for One-Class Constraint Synthesis (CMAESOCCS), a method that synthesizes a Mixed-Integer Linear Programming (MILP) model from exemplary feasible solutions to this model using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Given a one-class training set, CMAESOCCS adaptively detects partitions in this set, synthesizes independent Linear Programming models for all partitions and merges these models into a single MILP model. CMAESOCCS is evaluated experimentally using synthetic problems. A practical use case of CMAESOCCS is demonstrated based on a problem of synthesizing a model for a rice farm. The obtained results are competitive when compared to a state-of-the-art method.

Expected improvement versus predicted value in surrogate-based optimization

Surrogate-based optimization relies on so-called infill criteria (acquisition functions) to decide which point to evaluate next. When Kriging is used as the surrogate model of choice (also called Bayesian optimization), one of the most frequently chosen criteria is expected improvement. We argue that the popularity of expected improvement largely relies on its theoretical properties rather than empirically validated performance. A few results from the literature provide evidence that, under certain conditions, expected improvement may perform worse than something as simple as the predicted value of the surrogate model. We benchmark both infill criteria in an extensive empirical study on the 'BBOB' function set. This investigation includes a detailed study of the impact of problem dimensionality on algorithm performance. The results support the hypothesis that exploration loses importance with increasing problem dimensionality. A statistical analysis reveals that the purely exploitative search with the predicted value criterion performs better on most problems of five or more dimensions. Possible reasons for these results are discussed. In addition, we give an in-depth guide for choosing the infill criterion based on prior knowledge about the problem at hand, its dimensionality, and the available budget.
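For reference, the two infill criteria being compared can be written down in a few lines: the standard expected improvement formula for minimisation versus simply ranking candidates by the surrogate's mean prediction.

```python
# Expected improvement (EI) versus the purely exploitative predicted value (PV),
# given a Kriging model's posterior mean mu, standard deviation sigma and the
# incumbent best observed value f_min (minimisation).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_min):
    sigma = np.maximum(sigma, 1e-12)
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def predicted_value(mu):
    return -mu    # maximise -mu, i.e. pick the point with the best mean prediction

mu, sigma, f_min = np.array([0.2, 0.5]), np.array([0.01, 0.6]), 0.3
print(expected_improvement(mu, sigma, f_min))   # the uncertain point can win under EI
print(predicted_value(mu))                      # but not under the predicted value
```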

Model-based optimization with concept drifts

Model-based Optimization (MBO) is a method to optimize expensive black-box functions that uses a surrogate to guide the search. We propose two practical approaches that allow MBO to optimize black-box functions where the relation between input and output changes over time, which are known as dynamic optimization problems (DOPs). The window approach trains the surrogate only on the most recent observations, and the time-as-covariate approach includes the time as an additional input variable in the surrogate, giving it the ability to learn the effect of the time on the outcomes. We focus on problems where the change happens systematically and label this systematic change concept drift. To benchmark our methods we define a set of benchmark functions built from established synthetic static functions that are extended with controlled drifts. We evaluate how the proposed approaches handle scenarios of no drift, sudden drift and incremental drift. The results show that both new methods improve the performance if a drift is present. For higher-dimensional multimodal problems the window approach works best and on lower-dimensional problems, where it is easier for the surrogate to capture the influence of the time, the time-as-covariate approach works better.
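The two data-preparation strategies can be sketched directly: the window approach keeps only the most recent observations, while the time-as-covariate approach appends the observation time as an extra input column before fitting the surrogate (variable names here are illustrative).

```python
import numpy as np

def window_data(X, y, t, window):
    """Keep only the `window` most recent observations for fitting the surrogate."""
    recent = np.argsort(t)[-window:]
    return X[recent], y[recent]

def time_as_covariate_data(X, y, t):
    """Append the observation time as an additional input feature."""
    return np.column_stack([X, t]), y

X = np.random.rand(50, 3); t = np.arange(50); y = X.sum(axis=1) + 0.01 * t
print(window_data(X, y, t, window=10)[0].shape)        # (10, 3)
print(time_as_covariate_data(X, y, t)[0].shape)        # (50, 4)
```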

An evolutionary optimization algorithm for gradually saturating objective functions

Evolutionary algorithms have been actively studied for dynamic optimization problems in the last two decades; however, the research has mainly focused on problems with large, periodic or abrupt changes during the optimization. In contrast, this paper concentrates on gradually changing environments with the additional imposition of a saturating objective function. This work is motivated by an evolutionary neural architecture search methodology in which a population of Convolutional Neural Networks (CNNs) is evaluated and iteratively modified using genetic operators during the training process. The objective of the search, namely the prediction accuracy of a CNN, is a continuous and slowly moving target, increasing with each training epoch and eventually saturating when the training is nearly complete. Population diversity is an important consideration in dynamic environments, wherein large diversity prevents the algorithm from converging to a small area of the search space while the environment is still transforming. Our proposed algorithm adaptively influences the population diversity, depending on the rate of change of the objective function, using disruptive crossovers and non-elitist population replacements. We compare the results of our algorithm with a traditional evolutionary algorithm and demonstrate that the proposed modifications improve the algorithm's performance in gradually saturating dynamic environments.

Sensitivity analysis in constrained evolutionary optimization

Sensitivity analysis deals with the question of how changes in the input parameters of a model affect its outputs. For constrained optimization problems, one question may be how variations in budget or capacity constraints influence the optimal solution value. Although well established in the domain of linear programming, this is hardly addressed in evolutionary computation. In this paper, a general approach is proposed that makes it possible to identify how the outcome of an evolutionary algorithm is affected when model parameters, such as constraints, are changed. Using evolutionary bilevel optimization in combination with data mining and visualization techniques, the recently suggested concept of bilevel innovization allows trade-offs between constraints and objective value to be found. Additionally, it enables decision-makers to gain insights into the overall model behavior under changing framework conditions. The concept of bilevel innovization as a tool for sensitivity analysis is illustrated, without loss of generality, by the example of the multidimensional knapsack problem. The experimental results show that by applying bilevel innovization it is possible to determine how the solution values are influenced by changes to different constraints. Furthermore, rules were obtained that provide information on how parameters can be modified to achieve efficient trade-offs between constraints and objective value.

Integrated vs. sequential approaches for selecting and tuning CMA-ES variants

When faced with a specific optimization problem, deciding which algorithm to apply is always a difficult task. Not only is there a vast variety of algorithms to select from, but these algorithms are often controlled by many hyperparameters, which need to be suitably tuned in order to achieve peak performance. Usually, the problem of selecting and configuring the optimization algorithm is addressed sequentially, by first selecting a suitable algorithm and then tuning it for the application at hand. Integrated approaches, commonly known as Combined Algorithm Selection and Hyperparameter (CASH) solvers, have shown promise in several applications.

In this work we compare sequential and integrated approaches for selecting and tuning the best out of the 4,608 variants of the modular Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We show that the ranking of these variants depends to a large extent on the quality of the hyperparameters. Sequential approaches are therefore likely to recommend sub-optimal choices. Integrated approaches, in contrast, manage to provide competitive results at much smaller computational cost. We also highlight important differences in the search behavior of two CASH approaches, which build on racing (irace) and on model-based optimization (MIP-EGO), respectively.

SESSION: Genetic programming

Multi-tree genetic programming for feature construction-based domain adaptation in symbolic regression with incomplete data

Nowadays, transfer learning has rapidly gained popularity in tasks with limited available data. While traditional learning limits the learning process to knowledge available in a specific (target) domain, transfer learning can use parts of the knowledge extracted from learning in a different (source) domain to help learning in the target domain. This concept is of special importance when there is a lack of knowledge in the target domain. Consequently, since data incompleteness is a serious cause of knowledge shortage in real-world learning tasks, it can typically be addressed using transfer learning. One way to achieve this is feature construction-based domain adaptation. However, although it is considered a powerful feature construction algorithm, Genetic Programming has not been fully utilized for domain adaptation. In this work, a multi-tree genetic programming method is proposed for feature construction-based domain adaptation. The main idea is to construct a transformation from the source feature space to the target feature space, which maps the source domain close to the target domain. This method is applied to symbolic regression with missing values. The experimental work shows the encouraging potential of the proposed approach when applied to real-world tasks under different transfer learning scenarios.

Improving symbolic regression based on correlation between residuals and variables

In traditional regression analysis, a detailed examination of the residuals can provide an important way of validating model quality. However, this has not been utilised in genetic programming-based symbolic regression. This work aims to fill this gap and proposes a new evaluation criterion: minimising the correlation between the residuals of regression models and the independent variables. Based on a recent association detection measure, the maximal information coefficient, which provides an accurate estimate of the correlation, the new evaluation measure is expected to enhance the generalisation of genetic programming by driving the evolutionary process towards models that are free of unnecessary complexity and less likely to learn from noise in the data. The experimental results show that, compared with standard genetic programming, which selects models based only on the training error, and with two state-of-the-art multiobjective genetic programming methods that include mechanisms to prefer models with adequate structure, our new multiobjective genetic programming method, which minimises both the correlation between residuals and variables and the training error, has consistently better generalisation performance and evolves simpler models.

A study on graph representations for genetic programming

Graph representations promise several desirable properties for Genetic Programming (GP): multiple-output programs, natural representations of code reuse and, in many cases, an innate mechanism for neutral drift. Each graph GP technique provides a program representation, genetic operators and an overarching evolutionary algorithm. This makes it difficult to identify the individual causes of empirical differences, both between these methods and in comparison to traditional GP. In this work, we empirically study the behavior of Cartesian Genetic Programming (CGP), Linear Genetic Programming (LGP), Evolving Graphs by Graph Programming (EGGP) and traditional GP. By fixing some aspects of the configurations, we study the performance of each graph GP method and GP in combination with three different EAs: generational, steady-state and (1 + λ). In general, we find that the best choice of representation, genetic operator and evolutionary algorithm depends on the problem domain. Further, we find that graph GP methods, particularly in combination with the (1 + λ) EA, are significantly better on digital circuit synthesis tasks.

Semantically-oriented mutation operator in cartesian genetic programming for evolutionary circuit design

Despite many successful applications, Cartesian Genetic Programming (CGP) suffers from limited scalability, especially when used for evolutionary circuit design. Considering the multiplier design problem, for example, the 5×5-bit multiplier represents the most complex circuit evolved from a randomly generated initial population. The efficiency of CGP depends strongly on the performance of the point mutation operator; however, this operator is purely stochastic. This contrasts with the recent developments in Genetic Programming (GP), where advanced informed approaches such as semantic-aware operators are incorporated to improve the search space exploration capability of GP. In this paper, we propose a semantically-oriented mutation operator (SOMO) suitable for the evolutionary design of combinational circuits. SOMO uses semantics to determine the best value for each mutated gene. Compared to the common CGP and its variants as well as the recent versions of Semantic GP, the proposed method converges on common Boolean benchmarks substantially faster while keeping the phenotype size relatively small. The successfully evolved instances presented in this paper include a 10-bit parity circuit, a 10+10-bit adder and a 5×5-bit multiplier. The most complex circuits were evolved in less than one hour with a single-thread implementation running on a common CPU.

A modular memory framework for time series prediction

Tangled Program Graphs (TPG) is a framework for genetic programming which has shown promise in challenging reinforcement learning problems with discrete action spaces. The approach has recently been extended to incorporate temporal memory mechanisms that enable operation in environments with partial observability at multiple timescales. Here we propose a highly modular memory structure that manages the temporal properties of a task and enables operation in problems with continuous action spaces. This significantly broadens the scope of real-world applications for TPGs, from continuous-action reinforcement learning to time series forecasting. We begin by testing the new algorithm on a suite of symbolic regression benchmarks. Next, we evaluate the method on three challenging time series forecasting problems. Results generally match the quality of state-of-the-art solutions in both domains. In the case of time series prediction, we show that temporal memory eliminates the need to pre-specify a fixed-size sliding window of previous values, or autoregressive state, which is used by all compared methods. This is significant because it implies that no prior model for a time series is necessary, and the forecaster may adapt more easily if the properties of a series change significantly over time.

Symbolic regression driven by training data and prior knowledge

In symbolic regression, the search for analytic models is typically driven purely by the prediction error observed on the training data samples. However, when the data samples do not sufficiently cover the input space, the prediction error does not provide sufficient guidance toward desired models. Standard symbolic regression techniques then yield models that are partially incorrect, for instance, in terms of their steady-state characteristics or local behavior. If these properties were considered already during the search process, more accurate and relevant models could be produced. We propose a multi-objective symbolic regression approach that is driven by both the training data and the prior knowledge of the properties the desired model should manifest. The properties given in the form of formal constraints are internally represented by a set of discrete data samples on which candidate models are exactly checked. The proposed approach was experimentally evaluated on three test problems with results clearly demonstrating its capability to evolve realistic models that fit the training data well while complying with the prior knowledge of the desired model characteristics at the same time. It outperforms standard symbolic regression by several orders of magnitude in terms of the mean squared deviation from a reference model.

Genetic programming approaches to learning fair classifiers

Society has come to rely on algorithms like classifiers for important decision making, giving rise to the need for ethical guarantees such as fairness. Fairness is typically defined by asking that some statistic of a classifier be approximately equal over protected groups within a population. In this paper, current approaches to fairness are discussed and used to motivate algorithmic proposals that incorporate fairness into genetic programming for classification. We propose two ideas. The first is to incorporate a fairness objective into multi-objective optimization. The second is to adapt lexicase selection to define cases dynamically over intersections of protected groups. We describe why lexicase selection is well suited to pressure models to perform well across the potentially infinitely many subgroups over which fairness is desired. We use a recent genetic programming approach to construct models on four datasets for which fairness constraints are necessary, and empirically compare performance to prior methods utilizing game-theoretic solutions. Methods are assessed based on their ability to generate trade-offs of subgroup fairness and accuracy that are Pareto optimal. The results show that genetic programming methods in general, and random search in particular, are well suited to this task.
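The lexicase mechanism being adapted can be sketched as follows, treating each protected subgroup as a selection case filtered in random order; the paper defines cases dynamically over intersections of subgroups, which this simplified version does not do.

```python
# Sketch of lexicase selection where each "case" is a protected subgroup:
# candidates are filtered subgroup by subgroup in a random order.
import random

def lexicase_select(population, subgroup_errors, rng=random):
    """population: list of models; subgroup_errors[m][g]: error of model m on subgroup g."""
    cases = list(range(len(subgroup_errors[0])))
    rng.shuffle(cases)
    survivors = list(range(len(population)))
    for g in cases:
        best = min(subgroup_errors[m][g] for m in survivors)
        survivors = [m for m in survivors if subgroup_errors[m][g] == best]
        if len(survivors) == 1:
            break
    return population[rng.choice(survivors)]

models = ["m0", "m1", "m2"]
errors = [[0.1, 0.4], [0.2, 0.1], [0.1, 0.2]]   # errors on two subgroups
print(lexicase_select(models, errors))          # winner depends on the case order
```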

Unlabeled multi-target regression with genetic programming

Machine Learning (ML) has now become an important and ubiquitous tool in science and engineering, with successful applications in many real-world domains. However, there are still areas in need of improvement, and problems that are still considered difficult with off-the-shelf methods. One such problem is Multi Target Regression (MTR), where the target variable is a multidimensional tuple instead of a scalar value. In this work, we propose a more difficult variant of this problem which we call Unlabeled MTR (uMTR), where the structure of the target space is not given as part of the training data. This version of the problem lies at the intersection of MTR and clustering, an unexplored problem type. Moreover, this work proposes a solution method for uMTR, a hybrid algorithm based on Genetic Programming and RANdom SAmple Consensus (RANSAC). Using a set of benchmark problems, we are able to show that this approach can effectively solve the uMTR problem.

Towards an evolutionary-based approach for natural language processing

Tasks related to Natural Language Processing (NLP) have recently been the focus of a large research endeavor by the machine learning community. The increased interest in this area is mainly due to the success of deep learning methods. Genetic Programming (GP), however, has not been in the spotlight with respect to NLP tasks. Here, we propose a first proof of concept that combines GP with the well-established NLP tool word2vec for the next-word prediction task. The main idea is that, once words have been moved into a vector space, traditional GP operators can successfully work on vectors, thus producing meaningful words as output. To assess the suitability of this approach, we perform an experimental evaluation on a set of existing newspaper headlines. Individuals resulting from this (pre-)training phase can be employed as the initial population in other NLP tasks, like sentence generation, which will be the focus of future investigations, possibly employing adversarial co-evolutionary approaches.
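The core idea can be illustrated with a toy embedding table: GP primitives operate on word vectors, and the resulting vector is mapped back to the nearest word. The tiny 3-dimensional "word2vec" vectors below are purely made up for illustration.

```python
import numpy as np

embeddings = {                     # hypothetical toy word vectors
    "king": np.array([0.9, 0.1, 0.3]),
    "queen": np.array([0.8, 0.8, 0.3]),
    "man": np.array([0.5, 0.1, 0.2]),
    "woman": np.array([0.4, 0.8, 0.2]),
}

def nearest_word(vec):
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(embeddings, key=lambda w: cos(embeddings[w], vec))

# A GP individual could be any expression tree over vectors; a classic example:
print(nearest_word(embeddings["king"] - embeddings["man"] + embeddings["woman"]))
```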

Code building genetic programming

In recent years the field of genetic programming has made significant advances towards automatic programming. Research and development of contemporary program synthesis methods, such as PushGP and Grammar Guided Genetic Programming, can produce programs that solve problems typically assigned in introductory academic settings. These problems focus on a narrow, predetermined set of simple data structures, basic control flow patterns, and primitive, non-overlapping data types (without, for example, inheritance or composite types). Few, if any, genetic programming methods for program synthesis have convincingly demonstrated the capability of synthesizing programs that use arbitrary data types, data structures, and specifications that are drawn from existing codebases. In this paper, we introduce Code Building Genetic Programming (CBGP) as a framework within which this can be done, by leveraging programming language features such as reflection and first-class specifications. CBGP produces a computational graph that can be executed or translated into source code of a host language. To demonstrate the novel capabilities of CBGP, we present results on new benchmarks that use non-primitive, polymorphic data types as well as some standard program synthesis benchmarks.

Adaptive weighted splines: a new representation to genetic programming for symbolic regression

Genetic Programming for Symbolic Regression is often prone to overfitting the training data, resulting in poor generalization on unseen data. To address this issue, much research has been devoted to regularization by controlling model complexity. However, due to the unstructured tree-based representation of individuals, the model complexity cannot be computed directly; instead, it must be approximated. This paper proposes a novel representation called Adaptive Weighted Splines which enables explicit control over the complexity of individuals using splines. The experimental results confirm that this new representation is significantly better than the tree-based representation at avoiding overfitting and generalizing on unseen data, demonstrating notably better and far more consistent generalization performance on all the benchmark problems. Further analysis also shows that in most cases, the new Genetic Programming method outperforms classical regression techniques such as Linear Regression, Support Vector Regression, K-Nearest Neighbour and Decision Tree Regression and performs competitively with the state-of-the-art ensemble regression methods Random Forests and Gradient Boosting.

Constructing efficient multigrid solvers with genetic programming

For many linear and nonlinear systems that arise from the discretization of partial differential equations the construction of an efficient multigrid solver is a challenging task. Here we present a novel approach for the optimization of geometric multigrid methods that is based on evolutionary computation, a generic program optimization technique inspired by the principle of natural evolution. A multigrid solver is represented as a tree of mathematical expressions which we generate based on a tailored grammar. The quality of each solver is evaluated in terms of convergence and compute performance using automated local Fourier analysis (LFA) and roofline performance modeling, respectively. Based on these objectives a multi-objective optimization is performed using grammar-guided genetic programming with a non-dominated sorting based selection. To evaluate the model-based prediction and to target concrete applications, scalable implementations of an evolved solver can be automatically generated with the ExaStencils framework. We demonstrate the effectiveness of our approach by constructing multigrid solvers for a linear elastic boundary value problem that are competitive with common V- and W-cycles.

Novelty search for automatic bug repair

Genetic Improvement (GI) focuses on the development of evolutionary methods to automate software engineering tasks, such as performance improvement or software bug removal. Concerning the latter, one of the earliest and most well-known methods in this area is Genetic Program Repair (GenProg), a variant of Genetic Programming (GP). However, most GI systems encounter problems derived from the fact that they operate directly at the source code level. These problems include highly neutral fitness landscapes and loss of diversity during the search, which are always undesirable in search and optimization tasks. This paper explores the use of Novelty Search (NS) with GenProg, since it can allow a search process to overcome these types of issues. While NS has been combined with GP before, and recently used with other GI systems, it has not been used for automatic bug repair until this work. Results show that GenProg with NS outperforms the original algorithm in some cases, based on an extensive experimental evaluation.

Synthesis through unification genetic programming

We present a new method, Synthesis through Unification Genetic Programming (STUN GP), which synthesizes provably correct programs using a divide-and-conquer approach. This method first splits the input space in a discovery phase that uses Counterexample-Driven Genetic Programming (CDGP) to identify a set of programs that are provably correct under as-yet-unknown unification constraints. The STUN GP method then computes these constraints by synthesizing, with CDGP, predicates that strictly map inputs to programs whose output will be correct.

This method builds on previous work towards applying Genetic Programming (GP) to Syntax Guided Synthesis (SyGus) problems that seek to synthesize programs adhering to a formal specification rather than a fixed set of input-output examples. We show that our method is more scalable than previous CDGP variants, solving several benchmarks from the SyGus Competition that cannot be solved by CDGP. STUN GP significantly cuts into the gap between GP and state-of-the-art SyGus solvers and further demonstrates Genetic Programming's potential for Program Synthesis.

DAE-GP: denoising autoencoder LSTM networks as probabilistic models in estimation of distribution genetic programming

Estimation of distribution genetic programming (EDA-GP) algorithms are metaheuristics in which sampling new solutions from a learned probabilistic model replaces the standard mutation and recombination operators of genetic programming (GP). This paper presents DAE-GP, a new EDA-GP which uses denoising autoencoder long short-term memory networks (DAE-LSTMs) as its probabilistic model. DAE-LSTMs are artificial neural networks that first learn the properties of a parent population by mapping promising candidate solutions to a latent space and reconstructing the candidate solutions from the latent space. The trained model is then used to sample new offspring solutions. We show on a generalization of the royal tree problem that DAE-GP outperforms standard GP and that performance differences increase with higher problem complexity. Furthermore, DAE-GP is able to create offspring with higher fitness from a learned model in comparison to standard GP. We believe that the key reason for the high performance of DAE-GP is that we do not impose any assumptions about the relationships between learned variables, which is different from previous EDA-GP models. Instead, DAE-GP flexibly identifies and models the relevant dependencies of promising candidate solutions.

SESSION: Real world applications

Towards sustainable forest management strategies with MOEAs

Sustainable forest management is a crucial element in combating climate change, plastic pollution, and other unsolved challenges of the 21st century. Forests not only produce wood - a renewable resource that is increasingly replacing fossil-based materials - but also preserve biodiversity and store massive amounts of carbon. Thus, a truly optimal forest policy has to balance profit-oriented logging with ecological and societal interests, and should thus be solved as a multi-objective optimization problem. Economic forest research, however, has largely focused on profit maximization. Recent publications still scalarize the problem a priori by assigning weights to objectives. In this paper, we formulate a multi-objective forest management problem where profit, carbon storage, and biodiversity are maximized. We obtain Pareto-efficient forest management strategies by utilizing three state-of-the-art Multi-Objective Evolutionary Algorithms (MOEAs), and by incorporating domain-specific knowledge through customized evolutionary operators. An analysis of Pareto-efficient strategies and their harvesting schedules in the design space clearly shows the benefits of the proposed approach. Unlike many EMO application studies, we demonstrate how a systematic post-optimality trade-off analysis can be applied to choose a single preferred solution. Our pioneering work on sustainable forest management explores an entirely new application area for MOEAs with great societal impact.

Simultaneously searching and solving multiple avoidable collisions for testing autonomous driving systems

The oracle problem is a key issue in testing Autonomous Driving Systems (ADS): when a collision is found, it is not always clear whether the ADS is responsible for it. Our recent search-based testing approach offers a solution to this problem by defining a collision as avoidable if a differently configured ADS would have avoided it. This approach searches for both collision scenarios and the ADS configurations capable of avoiding them. However, its main problem is that the ADS configurations generated for avoiding some collisions are not suitable for preventing other ones. Therefore, it does not provide any guidance to automotive engineers for improving the safety of the ADS. To this end, we propose a new search-based approach to generate configurations of the ADS that can avoid as many different types of collisions as possible. We present two versions of the approach, which differ in the way of searching for collisions and alternative configurations. Both versions have been evaluated experimentally on the path planner component of an ADS provided by our industry partner.

One property to rule them all?: on the limits of trade-offs for S-boxes

Substitution boxes (S-boxes) are nonlinear mappings that represent one of the core parts of many cryptographic algorithms (ciphers). If an S-box does not possess good properties, a cipher becomes susceptible to attacks. To design suitable S-boxes, we can use heuristics, as they allow significant freedom in the selection of the required cryptographic properties. Unfortunately, with heuristics, one is seldom sure how good a trade-off between cryptographic properties has been reached, or whether optimizing for one property implicitly optimizes for another. In this paper, we present what is, to the best of our knowledge, the most detailed analysis of trade-offs among S-box cryptographic properties. More precisely, we ask: if one property is optimized, what is the worst possible value of some other property, and what happens if all properties are optimized? Our results show that while it is possible to reach a large variety of solutions, optimizing for a certain property commonly results in good values for the other properties. In turn, this suggests that a single-objective approach should be the method of choice unless precise values for multiple properties are needed.

Surrogate-assisted asynchronous multiobjective algorithm for nuclear power plant operations

In the context of the introduction of renewable energies in France, Nuclear Power Plant Operations (NPPO) are a key component for compensating the intermittent production of solar and wind power. In this work, we focus on optimizing the operation cost and power stability of a real-life power transient, while maintaining safety standards. From an optimization point of view, the NPPO problem is a typical example of a discrete constrained bi-objective problem based on computationally expensive simulation. We propose a massively parallel asynchronous master/worker MOEA/D assisted by surrogate models. The algorithm's design components are discussed and justified in this work. We show that the proposed surrogate assistance is able to improve algorithm performance and reliability, allowing us to extend our approach to a large range of strategic future real-life operations.

Multi-objective optimal distribution of materials in hybrid components

Genetic algorithms are 0th-order methods and therefore praised in many non-differentiable optimization problems, which encompass the majority of real world applications. In this work, a multiobjective optimization of hybrid, i.e. multi-material, components under technological constraints is performed to guide engineers towards the optimal design of manufactured parts in competition-driven industries, like in the automotive or the aerospace sector. Specifically, three of the main challenges Original Equipment Manufacturers (OEMs) face nowadays are met: simultaneously minimizing compliance, weight and cost. This is achieved by replacing pure metallic components with lightweight materials such as thermoplastics and composites. However, a mere substitution would not be appropriate because it would usually result in insufficient performances or expensive designs. The task of the genetic algorithm is hence to find the optimal material distribution using Voronoï tessellations on a fixed Finite Element (FE) mesh while complying with the manufacturing methods of thermoplastics and composites. The Voronoï encoding has a great advantage over traditional Bit-Array genotypes: its size is independent of the FE mesh granularity, therefore refining the mesh has no impact on the computational cost of the genetic algorithm's operators. Experimental results on the cantilever beam test case show Pareto optimal material distributions.

Modeling wildfire using evolutionary cellular automata

With the increased size and frequency of wildfire events worldwide, accurate real-time prediction of evolving wildfire fronts is a crucial component of firefighting efforts and forest management practices. We propose a cellular automaton (CA) that simulates the spread of wildfire. We embed the CA inside a genetic program (GP) that learns the state transition rules from spatially registered synthetic wildfire data. We demonstrate this model's predictive abilities by testing it on unseen synthetically generated landscapes. We compare the performance of the GP, based on a set of primitive operators and a restricted expression length, to null and logistic models. We find that the GP is able to closely replicate the spreading behavior driven by a balanced logistic model. Our method is a potential alternative to current benchmark physics-based models.
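A generic fire-spread CA step looks roughly like the sketch below, where the probability of ignition is given by some rule of the number of burning neighbours; in the paper that rule is evolved by GP, whereas the logistic rule here is only a stand-in.

```python
import numpy as np

def ca_step(grid, ignite_prob, rng):
    """grid: 2-D int array, 0 = unburnt, 1 = burning, 2 = burnt out.
    ignite_prob maps the count of burning neighbours to an ignition probability."""
    burning = (grid == 1).astype(int)
    neighbours = sum(np.roll(np.roll(burning, dy, 0), dx, 1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    new = grid.copy()
    new[(grid == 0) & (rng.random(grid.shape) < ignite_prob(neighbours))] = 1
    new[grid == 1] = 2                      # burning cells burn out after one step
    return new

logistic_rule = lambda n: 1.0 / (1.0 + np.exp(-(n - 3)))    # stand-in transition rule
grid = np.zeros((10, 10), dtype=int); grid[5, 5] = 1
print(ca_step(grid, logistic_rule, np.random.default_rng(0)))
```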

Non-deterministic journey planning in multi-modal transportation networks: a meta-heuristic approach

Multi-modal journey planning, which allows multiple modes of transport to be used within a single trip, is becoming increasingly popular, due to vast practical interest and the increasing availability of data. In real-life situations, transport networks often involve uncertainty, and yet most approaches assume a deterministic environment, making plans more prone to failures such as significant delays in arrival or long waits at stations. In this paper, we tackle the multi-criteria stochastic journey planning problem in multi-modal transportation networks. We treat the problem as a probabilistic conditional planning problem and use Markov Decision Processes to model it. Journey plans are optimised simultaneously against five criteria, namely: travel time, journey convenience, monetary cost, CO2, and personal energy expenditure (PEE). We develop an NSGA-III-based solver as a baseline search method for producing optimal policies for travelling from a given origin to a given destination. Our empirical evaluation uses the Melbourne transportation network, with probability density functions for the estimated departure/arrival times of the trips. Numerical results demonstrate the effectiveness of the proposed method for practical purposes and provide strong evidence in favour of contingency planning for the journey planning problem.

A memetic level-based learning swarm optimizer for large-scale water distribution network optimization

Potable water distribution networks are requisites of modern cities. Because of city expansion, the scale of these networks now grows rapidly, which makes their optimization much more difficult. Evolutionary computation methods have been widely investigated on small-scale networks, but their performance is far from satisfactory on large-scale networks. To address this difficulty, a new memetic algorithm called the level-based learning swarm optimizer with restart and local search is proposed in this paper to solve the large-scale water distribution network optimization problem. Instead of using traditional evolutionary computation algorithms, the level-based learning swarm optimizer, which was proposed specifically for large-scale optimization problems, is applied as the population-based optimizer. Two restart strategies are incorporated to make the algorithm more effective. They help the algorithm escape from local optima and thus increase its exploration ability. Moreover, a simple yet effective local search algorithm based on domain knowledge is proposed to further refine the solutions after the algorithm converges. Experimental results on both single-source and multi-source large-scale water distribution networks show that the proposed algorithm is more effective than state-of-the-art evolutionary computation algorithms.

Adaptive augmented evolutionary intelligence for the design of water distribution networks

The application of Evolutionary Algorithms (EAs) to real-world problems comes with inherent challenges, primarily the difficulty in defining the large number of considerations needed when designing complex systems such as Water Distribution Networks (WDN). One solution is to use an Interactive Evolutionary Algorithm (IEA), which integrates a human expert into the optimisation process and helps guide it to solutions more suited to real-world application. The involvement of an expert provides the algorithm with valuable domain knowledge; however, it is an intensive task requiring extensive interaction, leading to user fatigue and reduced effectiveness. To address this, the authors have developed methods for capturing human expertise from user interactions utilising machine learning to produce Human-Derived Heuristics (HDH) which are integrated into an EA's mutation operator. This work focuses on the development of an adaptive method for applying multiple HDHs throughout an EA's search. The new adaptive approach is shown to outperform both singular HDH approaches and traditional EAs on a range of large scale WDN design problems. This work paves the way for the development of a new type of IEA that has the capability of learning from human experts whilst minimising user fatigue.

Evolutionary bin packing for memory-efficient dataflow inference acceleration on FPGA

Convolutional Neural Network (CNN) dataflow inference accelerators implemented in Field-Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency compared to CNN execution on CPUs or GPUs. However, the complex shapes of CNN parameter memories do not typically map well to FPGA On-Chip Memories (OCM), which results in poor OCM utilization and ultimately limits the size and types of CNNs which can be effectively accelerated on FPGAs. In this work, we present a design methodology that improves the mapping efficiency of CNN parameters to FPGA OCM. We frame the mapping as a bin packing problem and determine that traditional bin packing algorithms are not well suited to solve the problem within FPGA- and CNN-specific constraints. We hybridize genetic algorithms and simulated annealing with traditional bin packing heuristics to create flexible mappers capable of grouping parameter memories such that each group optimally fits FPGA on-chip memories. We evaluate these algorithms on a variety of FPGA inference accelerators. Our hybrid mappers converge to optimal solutions in a matter of seconds for all CNN use-cases, achieve an increase of up to 65% in OCM utilization efficiency for deep CNNs, and are up to 200× faster than current state-of-the-art simulated annealing approaches.
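A stripped-down sketch of the hybrid idea: a permutation-encoded GA searches over the order in which parameter buffers are handed to a first-fit packing heuristic, with the number of on-chip memory bins as fitness; the capacities, operators and parameters below are illustrative, not the paper's.

```python
import random

def first_fit(sizes, order, capacity):
    """Pack items (indices in `order`) into the first bin with enough spare capacity."""
    bins = []
    for i in order:
        for b in bins:
            if sum(sizes[j] for j in b) + sizes[i] <= capacity:
                b.append(i); break
        else:
            bins.append([i])
    return bins

def ga_pack(sizes, capacity, pop=30, gens=100, rng=random):
    """Toy GA over packing orders; fitness = number of bins used by first-fit."""
    population = [rng.sample(range(len(sizes)), len(sizes)) for _ in range(pop)]
    fitness = lambda order: len(first_fit(sizes, order, capacity))
    for _ in range(gens):
        parents = sorted(population, key=fitness)[:pop // 2]
        children = []
        for p in parents:
            c = p[:]
            i, j = rng.randrange(len(c)), rng.randrange(len(c))
            c[i], c[j] = c[j], c[i]           # swap mutation on the packing order
            children.append(c)
        population = parents + children
    return min(population, key=fitness)

sizes = [7, 5, 4, 4, 3, 2, 2, 1]              # stand-ins for parameter buffer sizes
best = ga_pack(sizes, capacity=10)
print(first_fit(sizes, best, capacity=10))
```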

Impact of NSGA-II objectives on EEG feature selection related to motor imagery

The selection of ElectroEncephaloGram (EEG) features with functional relevance to Motor Imagery (MI) is a crucial task for a successful outcome in Brain-Computer Interface (BCI)-based motor rehabilitation. Individual EEG patterns during MI require subject-dependent feature selection, which is an arduous task due to the complexity and large number of features. One solution is to use metaheuristics, e.g. a Genetic Algorithm (GA), to avoid an exhaustive search, which is impractical. In this work, one of the most widely used GAs, NSGA-II, is used with a hierarchical individual representation to facilitate the exclusion of EEG channels irrelevant to MI. Specifically, the performance of different objectives in NSGA-II was evaluated on a previously recorded MI EEG data set. Empirical results show that k-Nearest Neighbors (k-NN) combined with Pearson's Correlation (PCFS) as objective functions yielded higher classification accuracy than the other objective combinations (73% vs. 69%). Linear Discriminant Analysis (LDA) combined with Feature Reduction (FR) as objective functions maximized the reduction of features (99.6%) but reduced classification performance (65.6%). All classifier objectives combined with PCFS selected similar features, in accordance with expected activity patterns during MI. In conclusion, PCFS and a classifier as objective functions constitute a good trade-off solution for MI data.

Evolving energy demand estimation models over macroeconomic indicators

Energy is essential for all countries, since it is at the core of social and economic development. Since the industrial revolution, the demand for energy has increased exponentially, and world energy consumption is expected to increase by 50% by 2030 [17]. As such, managing the demand for energy is of the utmost importance, and the development of tools to model and accurately predict energy demand is very important to policy makers. In this paper we propose the use of the Structured Grammatical Evolution (SGE) algorithm to evolve models of energy demand over macroeconomic indicators. The proposed SGE is hybridised with a Differential Evolution approach in order to obtain the parameters of the evolved models that best fit the real energy demand. We have tested the performance of the proposed approach on a problem of total energy demand estimation in Spain, where we show that SGE is able to generate extremely accurate and robust models for energy prediction within a one-year time horizon.

Optimisation of large wave farms using a multi-strategy evolutionary framework

Wave energy is a fast-developing and promising renewable energy resource. The primary goal of this research is to maximise the total harnessed power of a large wave farm consisting of fully-submerged three-tether wave energy converters (WECs). Energy maximisation for large farms is a challenging search problem due to the costly calculations of the hydrodynamic interactions between WECs in a large wave farm and the high dimensionality of the search space. To address this problem, we propose a new hybrid multi-strategy evolutionary framework combining smart initialisation, binary population-based evolutionary algorithm, discrete local search and continuous global optimisation. For assessing the performance of the proposed hybrid method, we compare it with a wide variety of state-of-the-art optimisation approaches, including six continuous evolutionary algorithms, four discrete search techniques and three hybrid optimisation methods. The results show that the proposed method performs considerably better in terms of convergence speed and farm output.

Hybrid genetic algorithm for ridesharing with timing constraints: efficiency analysis with real-world data

Ridesharing is an effective, affordable and environment-friendly mode of transportation. This paper proposes a Hybrid Genetic Algorithm (HGA) that uses elements from Simulated Annealing (SA) and Local Search (LS) and is well suited to ridesharing-related applications. The designed algorithm efficiently handles advanced constraints such as time windows for pick-up, detour time (or distance), and waiting-time minimization.

To quantify the effectiveness of the HGA for the ridesharing problem, we study ridesharing as a close variant of the Orienteering Problem (OP) and the Orienteering Problem with Time Windows (OPTW). Both problems have been studied extensively and have many benchmark data sets. Experiments on popular benchmark instances of orienteering problems show that our HGA outperforms all algorithms available in the literature; for many instances, it has discovered solutions better than the current best known solutions. Using our HGA for orienteering problems, we have developed a prototype ridesharing system. Experiments using real-world New York taxi trip data [19] show that about 40%-50% of the rides could have been shared, and 60%-70% of customers could have shared rides. A detailed data analysis shows how to tune the system's parameters to achieve the best possible performance.
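The actual HGA combines genetic operators with SA and LS; the fragment below only sketches the SA-style acceptance test such a hybrid might apply when deciding whether a mutated offspring replaces its parent. The function name and the maximisation convention are ours, not the paper's.

import math, random

def accept(parent_score, child_score, temperature):
    # Always accept improving offspring; accept worse offspring with Boltzmann probability.
    if child_score >= parent_score:              # orienteering: maximise collected score
        return True
    return random.random() < math.exp((child_score - parent_score) / temperature)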

Automated design of multi-level network partitioning heuristics employing self-adaptive primitive granularity control

Network segmentation has a variety of applications, including computer network security. A well-segmented computer network is less likely to result in information leaks and is more resilient to adversarial traversal. Conventionally, network segmentation approaches rely on graph partitioning algorithms. However, general-purpose graph partitioning solutions are just that: general purpose. These approaches do not exploit specific topological characteristics present in certain classes of networks. Tailored partitioning methods can be developed to target specific domains, but this process can be time-consuming and difficult. This work builds on previous research employing generative hyper-heuristics, in the form of genetic programming, for automating the development of customized graph partitioning heuristics, by incorporating a dynamic approach to controlling the granularity of the heuristic search. The potential of this approach is demonstrated on two real-world complex network applications. Results show that the automated design process is capable of fine-tuning graph partitioning heuristics that sacrifice generality for improved performance on targeted networks.

Variable reduction for surrogate-based optimization

Real-world problems such as computational fluid dynamics simulations and finite element analyses are computationally expensive. A standard approach to mitigating the high computational expense is Surrogate-Based Optimization (SBO). Yet, due to the high dimensionality of many simulation problems, SBO is not directly applicable or not efficient. Reducing the dimensionality of the search space is one way to overcome this limitation. Besides making SBO applicable, dimensionality reduction enables easier data handling and improved data and model interpretability. Regularization is considered a state-of-the-art technique for dimensionality reduction. We propose a hybridization approach called Regularized-Surrogate-Optimization (RSO) aimed at overcoming difficulties related to high dimensionality. It couples standard Kriging-based SBO with regularization techniques. The employed regularization methods are based on three adaptations of the least absolute shrinkage and selection operator (LASSO). In addition, tree-based methods are analyzed as an alternative variable selection method. An extensive study is performed on a set of artificial test functions and two real-world applications: the electrostatic precipitator problem and a multilayered composite design problem. Experiments reveal that RSO requires significantly less time than standard SBO to obtain comparable results. The pros and cons of the RSO approach are discussed, and recommendations for practitioners are presented.
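A minimal sketch of the variable-reduction step, assuming scikit-learn: inputs whose LASSO coefficients are driven to zero are dropped before a Gaussian-process (Kriging-type) surrogate is fitted on the remaining ones. The helper name and threshold are ours, not the paper's RSO implementation.

import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.gaussian_process import GaussianProcessRegressor

def reduce_and_fit(X, y):
    lasso = LassoCV(cv=5).fit(X, y)
    keep = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)        # variables surviving the penalty
    surrogate = GaussianProcessRegressor().fit(X[:, keep], y)
    return keep, surrogate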

A genetic programming approach to feature construction for ensemble learning in skin cancer detection

Ensembles of classifiers have proved to be more effective than a single classification algorithm in skin image classification problems. Generally, the ensembles are created using the whole set of original features. However, some original features can be redundant and may not provide useful information for building good ensemble classifiers. Existing feature construction methods, which usually generate new features for only a single classifier, have been developed to deal with this, but they tend to fit the training data too well, resulting in poor test performance. This study develops a new classification method that combines feature construction and ensemble learning using genetic programming (GP) to address the above limitations. The proposed method is evaluated on two benchmark real-world skin image datasets. The experimental results reveal that the proposed algorithm significantly outperforms two existing GP approaches, two state-of-the-art convolutional neural network methods, and ten commonly used machine learning algorithms. The evolved individual, considered as a set of constructed features, helps identify prominent original features which can assist dermatologists in making a diagnosis.

COUGAR: clustering of unknown malware using genetic algorithm routines

Through malware, cyber criminals can leverage our computing resources to disrupt our work, steal our information, and even hold it hostage. Security professionals seek to classify this malicious software so as to prevent its distribution and execution, but the sheer volume of malware complicates these efforts. In response, machine learning algorithms are actively employed to alleviate the workload. One such approach is evolutionary computation, where solutions are bred rather than built. In this paper, we design, develop and evaluate a system, COUGAR, to reduce high-dimensional malware behavioural data and optimize clustering behaviour using a multi-objective genetic algorithm. Evaluations demonstrate that each of our chosen clustering algorithms can successfully highlight groups of malware. We also present an example real-world scenario, based on the testing data, to demonstrate practical applications.

SESSION: Search-based software engineering

Causes and effects of fitness landscapes in unit test generation

Search-based unit test generation applies evolutionary search to maximize code coverage. Although the performance of this approach is often good, sometimes it is not, and how the fitness landscape affects this performance is poorly understood. This paper presents a thorough analysis of 331 Java classes by (i) characterizing their fitness landscapes using six established fitness landscape measures, (ii) analyzing the impact of these measures on the search, and (iii) investigating the underlying properties of the source code influencing them. Our results reveal that classical indicators of rugged fitness landscapes suggest easily searchable problems in the case of unit test generation, but the fitness landscape for most problem instances is dominated by detrimental plateaus. A closer look at the underlying source code suggests that these plateaus are frequently caused by code in private methods, methods throwing exceptions, and boolean flags. This suggests that inter-procedural distance metrics and testability transformations could improve search-based test generation.

Scalability analysis of grammatical evolution based test data generation

Heuristic-based search techniques have been increasingly used to automate different aspects of software testing. Several studies suggest that variable interdependencies may exist in branching conditions of real-life programs, and these dependencies result in the need for highly precise data values (such as of the form i=j=k) for code coverage analysis. This requirement makes it very difficult for Genetic Algorithm (GA)-based approaches to successfully search for the required test data in the vast search spaces of real-life programs.

Ariadne is the only Grammatical Evolution (GE)-based test data generation system, proposed to date, that uses grammars to exploit variable interdependencies to improve code coverage. Ariadne has been compared favourably to other well-known test data generation techniques in the literature; however, its scalability has not yet been tested for increasingly complex programs.

This paper presents the results of a rigorous analysis of Ariadne's scalability. We designed and employed a set of 18 highly scalable benchmark programs for our experiments. Our results suggest that Ariadne is highly scalable: it achieved 100% coverage across all programs of increasing complexity with significantly smaller search costs than GA-based approaches, which failed even with huge search budgets.

Seeding strategies for multi-objective test case selection: an application on simulation-based testing

Software systems usually take a long time to test, often because the entire test suite must be executed. To mitigate this, regression test selection approaches have improved the cost-effectiveness of verification and validation activities in the software industry. In this area, multi-objective algorithms have played a key role in selecting an appropriate subset of test cases from the entire test suite. In this paper, we propose a set of seeding strategies for the test case selection problem that generate the initial population of multi-objective algorithms. We integrated these seeding strategies with an NSGA-II algorithm for solving the test case selection problem in the context of simulation-based testing. We evaluated the strategies on six case studies with 21 fitness combinations per case study (i.e., a total of 126 problems). Our evaluation suggests that these strategies are indeed helpful for solving the multi-objective test case selection problem. In fact, two of the proposed seeding strategies outperformed the unseeded NSGA-II algorithm with statistical significance on 92.8% and 96% of the problems.
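For illustration only, a seeding strategy of the general kind described above can be sketched as follows: part of the initial population is taken from cheap heuristic solutions and the remainder stays random. The function and parameter names are hypothetical, not the paper's strategies.

import random

def seeded_population(num_tests, pop_size, greedy_masks):
    # Heuristic seeds first (e.g. cost-greedy and coverage-greedy selections), random fill after.
    population = [list(m) for m in greedy_masks]
    while len(population) < pop_size:
        population.append([random.randint(0, 1) for _ in range(num_tests)])
    return population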

Towards rigorous validation of energy optimisation experiments

The optimisation of software energy consumption is of growing importance across all scales of modern computing, from embedded systems to data centres. Practitioners in the fields of Search-Based Software Engineering and Genetic Improvement of Software acknowledge that optimising software energy consumption is difficult due to noisy and expensive fitness evaluations. However, it is apparent from results to date that more progress needs to be made in rigorously validating optimisation results. This problem is pressing because modern computing platforms have highly complex and variable behaviour with respect to energy consumption. To compare solutions fairly, we propose in this paper a new validation approach called R3-validation, which exercises software variants in a rotated round-robin order. Using a case study, we present an in-depth analysis of the impacts of changing system states on software energy usage, and we show how R3-validation mitigates these. We compare it with current validation approaches across multiple devices and operating systems, and we show that it aligns best with actual platform behaviour.
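The following toy schedule shows our reading of a rotated round-robin measurement order: every round measures each variant once, and the variant that goes first rotates between rounds, so slow platform drift is spread evenly across variants. The function name and variant labels are ours, not the paper's tooling.

def r3_schedule(variants, repetitions):
    # Each round measures every variant once; the starting variant rotates between rounds.
    n = len(variants)
    order = []
    for r in range(repetitions):
        order.extend(variants[(r + i) % n] for i in range(n))
    return order

print(r3_schedule(["original", "patch_A", "patch_B"], repetitions=3))
# ['original', 'patch_A', 'patch_B', 'patch_A', 'patch_B', 'original', 'patch_B', 'original', 'patch_A']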

Genetic algorithms for redundancy in interaction testing

Testing must determine whether the components within large-scale software systems operate correctly. Interaction testing involves designing a suite of tests that is guaranteed to detect a fault if one exists among a small number of components interacting together. The cost of this testing is typically modeled by the number of tests, and thus much effort has gone into reducing this number. Here, we incorporate redundancy into the model, which allows for testing in non-deterministic environments. Existing algorithms for constructing these test suites usually involve one "fast" algorithm for generating most of the tests and another "slower" algorithm to "complete" the test suite. We employ a genetic algorithm that generalizes these approaches and also incorporates redundancy by increasing the number of algorithms chosen, which we call "stages." By increasing the number of stages, we show that not only can the number of tests be reduced compared to existing techniques, but the computational time needed to generate them is also greatly reduced.

Enhancing search-based product line design with crossover operators

The Product Line Architecture (PLA) is one of the most important artifacts of a Software Product Line. PLA design has been formulated as a multi-objective optimization problem and successfully solved by a state-of-the-art search-based approach. However, the majority of empirical studies optimize PLA designs without applying one of the fundamental genetic operators: crossover. An operator for PLA design, named Feature-driven Crossover, was proposed in a previous study. In spite of the promising results, this operator occasionally generated incomplete solutions. To overcome these limitations, this paper aims to enhance search-based PLA design optimization by improving the Feature-driven Crossover and introducing a novel crossover operator specific to PLA design. The proposed operators were evaluated on two well-studied PLA designs, using three experimental configurations of NSGA-II in comparison with a baseline that uses only mutation operators. Empirical results show the usefulness and efficiency of the presented operators in reaching consistent solutions. We also observed that the two operators complement each other, leading to PLA design solutions with better feature modularization than the baseline experiment.

SESSION: Theory

The (1 + (λ,λ)) GA is even faster on multimodal problems

For the (1 + (λ, λ)) genetic algorithm, rigorous runtime analyses on unimodal fitness functions have shown that it can be faster than classical evolutionary algorithms, though on these simple problems the gains are only moderate.

In this work, we conduct the first runtime analysis of this algorithm on a multimodal problem class, the jump functions benchmark. We show that with the right parameters, the (1 + (λ, λ)) GA optimizes any jump function with jump size 2 ≤ k ≤ n/16 in expected time O(n^{(k+1)/2} e^{O(k)} k^{-k/2}), which significantly, and already for constant k, outperforms standard mutation-based algorithms with their Θ(n^k) runtime and standard crossover-based algorithms with their O(n^{k-1}) runtime.

Our work also suggests some general advice on how to set the parameters of the (1 + (λ,λ)) GA, which might ease the further use of this algorithm.
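For reference, the jump benchmark mentioned above can be written down in a few lines; this is the standard textbook definition, sketched here rather than taken from the paper.

def jump(x, k):
    # Standard jump_k benchmark: fitness rises with the number of ones, except in the
    # "gap" of width k just below the all-ones optimum, where it decreases instead.
    n, ones = len(x), sum(x)
    if ones == n or ones <= n - k:
        return k + ones
    return n - ones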

Fast mutation in crossover-based algorithms

The heavy-tailed mutation operator proposed by Doerr et al. (GECCO 2017), called fast mutation in line with previously used terminology, has so far been used successfully only in purely mutation-based algorithms. There, it can relieve the algorithm designer from finding the optimal mutation rate while still obtaining a performance close to the one the optimal mutation rate gives.

In this first runtime analysis of a crossover-based algorithm using a heavy-tailed choice of the mutation rate, we show an even stronger impact. With a heavy-tailed mutation rate, the runtime of the (1 + (λ, λ)) genetic algorithm on the OneMax benchmark function becomes linear in the problem size. This is asymptotically faster than with any static mutation rate and is the same asymptotic runtime that can be obtained with a self-adjusting choice of the mutation rate. This result is complemented by an empirical study which shows the effectiveness of the fast mutation also on random MAX-3SAT instances.
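A rough sketch of what a heavy-tailed ("fast") mutation looks like in practice: the mutation strength alpha is drawn from a power-law distribution and the offspring is created with rate alpha/n. The exponent beta = 1.5 and the range of alpha below are common choices but are illustrative, not the exact setup analysed in the paper.

import random

def sample_strength(n, beta=1.5):
    # Power-law choice of the mutation strength alpha in {1, ..., n/2}.
    values = list(range(1, n // 2 + 1))
    weights = [i ** (-beta) for i in values]
    return random.choices(values, weights=weights, k=1)[0]

def fast_mutate(x):
    n = len(x)
    rate = sample_strength(n) / n                # mutation rate alpha/n, heavy-tailed in alpha
    return [bit ^ (random.random() < rate) for bit in x]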

More effective randomized search heuristics for graph coloring through dynamic optimization

Dynamic optimization problems have gained significant attention in evolutionary computation, as evolutionary algorithms (EAs) can easily adapt to changing environments. We show that EAs can solve the graph coloring problem for bipartite graphs more efficiently by using dynamic optimization. In our approach the graph instance is given incrementally such that the EA can reoptimize its coloring when a new edge introduces a conflict. We show that, when edges are inserted in a way that preserves graph connectivity, Randomized Local Search (RLS) efficiently finds a proper 2-coloring for all bipartite graphs. This includes graphs for which RLS and other EAs need exponential expected time in a static optimization scenario. We investigate different ways of building up the graph by popular graph traversals such as breadth-first search and depth-first search and analyse the resulting runtime behavior. We further show that offspring populations (e.g., a (1 + λ) RLS) lead to an exponential speedup in λ. Finally, an island model using 3 islands succeeds in an optimal time of Θ(m) on every m-edge bipartite graph, outperforming offspring populations. This is the first example where an island model guarantees a speedup that is not bounded in the number of islands.
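To make the dynamic setup concrete, the toy sketch below inserts edges one at a time and lets an RLS-style repair (flip one random vertex colour, keep non-worsening flips) restore a 2-colouring after each insertion. It is purely illustrative, not the analysed algorithm verbatim.

import random

def conflicts(colour, edges):
    return sum(colour[u] == colour[v] for u, v in edges)

def rls_repair(colour, edges):
    current = conflicts(colour, edges)
    while current > 0:
        v = random.randrange(len(colour))
        colour[v] ^= 1                           # flip one vertex's colour
        new = conflicts(colour, edges)
        if new <= current:
            current = new                        # keep non-worsening flips
        else:
            colour[v] ^= 1                       # revert worsening flips
    return colour

# path 0-1-2-3, inserted edge by edge in a connectivity-preserving order
colour, edges = [0, 0, 0, 0], []
for e in [(0, 1), (1, 2), (2, 3)]:
    edges.append(e)
    colour = rls_repair(colour, edges)
print(colour)                                    # a proper 2-colouring, e.g. [0, 1, 0, 1]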

The node weight dependent traveling salesperson problem: approximation algorithms and randomized search heuristics

Several important optimization problems in the area of vehicle routing can be seen as variants of the classical Traveling Salesperson Problem (TSP). In the area of evolutionary computation, the Traveling Thief Problem (TTP) has gained increasing interest over the last 5 years. In this paper, we investigate the effect of weights on such problems, in the sense that the cost of traveling increases with the weights of nodes already visited during a tour. This provides abstractions of important TSP variants such as the Traveling Thief Problem and time-dependent TSP variants, and allows us to study precisely the increase in difficulty caused by weight dependence. We provide a 3.59-approximation for this weight-dependent version of the TSP with metric distances and bounded positive weights. Furthermore, we conduct experimental investigations for simple randomized local search with classical mutation operators and two variants of the state-of-the-art evolutionary algorithm EAX adapted to the weighted TSP. Our results show the impact of the node weights on the position of the nodes in the resulting tour.
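One plausible reading of a node-weight-dependent tour cost, written out for illustration only (the paper's exact cost function may differ): the cost of each leg grows with the total node weight already collected along the tour, so heavy nodes visited early make the rest of the tour more expensive.

def weighted_tour_cost(tour, dist, weight):
    # dist[u][v]: distance between nodes, weight[u]: weight picked up at node u.
    carried, cost = 0.0, 0.0
    for i in range(len(tour)):
        u, v = tour[i], tour[(i + 1) % len(tour)]
        carried += weight[u]                     # weight collected before leaving u
        cost += dist[u][v] * (1.0 + carried)     # travel gets more expensive as weight grows
    return cost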

Fixed-target runtime analysis

Runtime analysis aims at contributing to our understanding of evolutionary algorithms through mathematical analyses of their runtimes. In the context of discrete optimization problems, runtime analysis classically studies the time needed to find an optimal solution. However, both from a practical and a theoretical viewpoint, more fine-grained performance measures are needed. Two complementary approaches have been suggested: fixed-budget analysis and fixed-target analysis.

In this work, we conduct an in-depth study on the advantages and limitations of fixed-target analyses. We show that, different from fixed-budget analyses, many classical methods from the runtime analysis of discrete evolutionary algorithms yield fixed-target results without greater effort. We use this to conduct a number of new fixed-target analyses. However, we also point out examples where an extension of the existing runtime result to a fixed-target result is highly non-trivial.

Does comma selection help to cope with local optima?

One hope of using non-elitism in evolutionary computation is that it aids leaving local optima. We perform a rigorous runtime analysis of a basic non-elitist evolutionary algorithm (EA), the (μ, λ) EA, on the most basic benchmark function with a local optimum, the jump function. We prove that for all reasonable values of the parameters and the problem, the expected runtime of the (μ, λ) EA is, apart from lower order terms, at least as large as the expected runtime of its elitist counterpart, the (μ + λ) EA (for which we conduct the first runtime analysis to allow this comparison). Consequently, the ability of the (μ, λ) EA to leave local optima to inferior solutions does not lead to a runtime advantage.

We complement this lower bound with an upper bound that, for broad ranges of the parameters, is identical to our lower bound apart from lower order terms. This is the first runtime result for a non-elitist algorithm on a multi-modal problem that is tight apart from lower order terms.
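For readers less familiar with the notation, the two survivor-selection rules being compared can be sketched as follows (generic bit-string setting, names ours, not the paper's exact framework):

def comma_selection(parents, offspring, mu, fitness):
    # (mu, lambda): parents always die; survivors are chosen from the offspring only.
    return sorted(offspring, key=fitness, reverse=True)[:mu]

def plus_selection(parents, offspring, mu, fitness):
    # (mu + lambda): elitist; parents compete with their offspring for survival.
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]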

Self-adjusting evolutionary algorithms for multimodal optimization

Recent theoretical research has shown that self-adjusting and self-adaptive mechanisms can provably outperform static settings in evolutionary algorithms for binary search spaces. However, the vast majority of these studies focuses on unimodal functions which do not require the algorithm to flip several bits simultaneously to make progress. In fact, existing self-adjusting algorithms are not designed to detect local optima and offer no obvious benefit for crossing large Hamming gaps.

We suggest a mechanism called stagnation detection that can be added as a module to existing evolutionary algorithms (both with and without prior self-adjusting schemes). Added to a simple (1 + 1) EA, we prove an expected runtime on the well-known Jump benchmark that corresponds to an asymptotically optimal parameter setting and outperforms other mechanisms for multimodal optimization like heavy-tailed mutation. We also investigate the module in the context of a self-adjusting (1 + λ) EA and show that it combines the previous benefits of this algorithm on unimodal problems with more efficient multimodal optimization. To explore the limitations of the approach, we additionally present an example where both self-adjusting mechanisms, including stagnation detection, do not help to find a beneficial setting of the mutation rate. Finally, we investigate our module for stagnation detection experimentally.
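A rough sketch of the stagnation-detection idea on top of a (1 + 1) EA: count unsuccessful mutations at the current strength and increase the strength once the counter exceeds a threshold. The threshold below follows the spirit of such a scheme, but its exact constants are illustrative, not the paper's.

import math, random

def one_plus_one_ea_sd(fitness, n, max_iters=100000):
    x = [random.randint(0, 1) for _ in range(n)]
    fx, strength, fails = fitness(x), 1, 0
    for _ in range(max_iters):
        rate = strength / n
        y = [b ^ (random.random() < rate) for b in x]
        fy = fitness(y)
        if fy > fx:
            x, fx, strength, fails = y, fy, 1, 0         # improvement: reset to strength 1
        else:
            fails += 1
            threshold = 2 * (math.e * n / strength) ** strength * math.log(n)
            if fails > threshold:                        # stagnation detected at this strength
                strength, fails = min(strength + 1, n // 2), 0
    return x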

A tight lower bound on the expected runtime of standard steady state genetic algorithms

Recent progress in the runtime analysis of evolutionary algorithms (EAs) has allowed the derivation of upper bounds on the expected runtime of standard steady-state GAs. These upper bounds have shown speed-ups of the GAs using crossover and mutation over the same algorithms that only use mutation operators (i.e., steady-state EAs) both for standard unimodal (i.e., OneMax) and multimodal (i.e., Jump) benchmark functions. These upper bounds suggest that populations are beneficial to the GA as well as higher mutation rates than the default 1/n rate. However, making rigorous claims was not possible because matching lower bounds were not available. Proving lower bounds on crossover-based EAs is a notoriously difficult task as it is hard to capture the progress that a diverse population can make. We use a potential function approach to prove a tight lower bound on the expected runtime of the (2 + 1) GA for OneMax for all mutation rates c/n with c < 1.422. This provides the last piece of the puzzle that completes the proof that larger population sizes improve the performance of the standard steady-state GA for OneMax for various mutation rates, and it proves that the optimal mutation rate for the (2 + 1) GA on OneMax is [EQUATION].