Differential evolution from the ground up in Python
Differential evolution is a heuristic strategy for the global optimization of nonlinear and non-differentiable continuous space functions.
The differential evolution algorithm belongs to a wider family of evolutionary computing algorithms. Like other popular direct search strategies, such as genetic algorithms and evolution strategies, the differential evolution algorithm begins with an initial population of candidate solutions. These candidates are iteratively improved by introducing mutations into the population, and by retaining the fittest candidates, that is, those that yield a lower objective function value.
The differential evolution algorithm has an advantage over the aforementioned strategies in that it can handle nonlinear and non-differentiable multi-dimensional objective functions, while requiring very few control parameters to steer the minimization. These characteristics make the algorithm easier and more practical to use.
In this tutorial, you will discover the differential evolution algorithm for global optimization.
After completing this tutorial, you will know:
- Differential evolution is a heuristic strategy for the global optimization of nonlinear and non-differentiable continuous space functions.
- How to go about implementing the differential evolution algorithm from the ground up in Python.
- How to go about applying the differential evolution algorithm to a real-valued 2D objective function.
Tutorial Overview
This tutorial is divided into three parts; they are:
- Differential Evolution
- Differential Evolution Algorithm from the ground up
- Differential Evolution Algorithm on the Sphere Function
Differential Evolution
Differential evolution is a heuristic strategy for the global optimization of non-linear and non-differentiable continuous space functions.
For a minimization algorithm to be considered practical, it is expected to fulfil four requirements:
- Capability to manage non-differentiable, nonlinear and multimodal cost functions
- Parallelizability to handle computationally intensive cost functions
- Ease of use, that is, few control variables to steer the minimization. These variables should be robust and easy to choose
- Good convergence attributes, that is, consistent convergence to the global minimum in consecutive independent trials
The strength of the differential evolution algorithm stems from the fact that it was designed to fulfil all of the above requirements.
Differential Evolution (DE) is arguably one of the most powerful and versatile evolutionary optimizers for continuous parameter spaces in recent times.
The algorithm begins by randomly initialising a population of real-valued decision vectors, also known as genomes or chromosomes. These represent candidate solutions to the multi-dimensional optimization problem.
At each iteration, the algorithm introduces mutations into the population to generate new candidate solutions. The mutation process adds the weighted difference between two population vectors to a third vector, to produce a mutated vector. The parameters of the mutated vector are then mixed with the parameters of another predetermined vector, the target vector, in a process known as crossover that aims to increase the diversity of the perturbed parameter vectors. The resulting vector is known as the trial vector.
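As a quick illustration of the mutation step alone, here is a minimal sketch with made-up population vectors a, b and c and a scale factor F = 0.5; the values are hypothetical, chosen only to make the arithmetic visible:

```python
import numpy as np

# hypothetical population vectors, for illustration only
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
c = np.array([2.0, 1.0])
F = 0.5  # mutation scale factor

# mutated vector: the weighted difference of b and c added to a
mutated = a + F * (b - c)
print(mutated)  # [1.5 3.5]
```

Here b - c = [1, 3], so the mutated vector is a + 0.5 * [1, 3] = [1.5, 3.5].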
DE generates new parameter vectors by adding the weighted difference between two population vectors to a third vector. Let this operation be called mutation. To increase the diversity of the perturbed parameter vectors, crossover is introduced.
These mutations are produced according to a mutation technique, which adheres to a general naming convention of DE/x/y/z, where DE stands for Differential Evolution, whereas x signifies the vector to be mutated, y signifies the number of difference vectors taken up for the mutation of x, and z is the variant of crossover in use. For example, the popular techniques:
- DE/rand/1/bin
- DE/best/2/bin
These notations specify that vector x can either be picked randomly (rand) from the population, or else the vector with the lowest cost (best) is selected; that the number of difference vectors under consideration is either 1 or 2; and that crossover is performed according to independent binomial (bin) experiments. The DE/best/2/bin strategy, in particular, appears to be highly beneficial in improving the diversity of the population if the population size is large enough.
The leveraging of two difference vectors appears to enhance the diversity of the population if the number of population vectors NP is high enough.
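Only the DE/rand/1 mutation is implemented in the next section, but DE/best/2 is a small variation: the best vector is perturbed with two weighted difference vectors instead of one. A minimal sketch, where the function name and example vectors are illustrative and not part of the tutorial's code:

```python
import numpy as np

# hypothetical DE/best/2 mutation: best + F * (b - c) + F * (d - e)
def mutation_best_2(best, b, c, d, e, F):
    return best + F * (b - c) + F * (d - e)

# illustrative vectors only
best = np.array([0.5, 0.5])
b, c = np.array([1.0, 2.0]), np.array([2.0, 1.0])
d, e = np.array([0.0, 1.0]), np.array([1.0, 0.0])

mutated = mutation_best_2(best, b, c, d, e, 0.5)
print(mutated)  # [-0.5  1.5]
```

Both difference vectors equal [-1, 1] here, so the best vector is shifted by 0.5 * [-2, 2].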
A final selection operation replaces the target vector, or the parent, by the trial vector, its offspring, if the latter yields a lower objective function value. Hence, the fitter offspring becomes a member of the newly generated population, and subsequently participates in the mutation of further population members. These iterations continue until a termination criterion is reached.
The differential evolution algorithm requires very few parameters to operate, namely the population size, NP, a real and constant scale factor, F ∈ [0, 2], that weights the differential variation during the mutation process, and a crossover rate, CR ∈ [0, 1], that is determined experimentally. This makes the algorithm easy and practical to use.
Also, the canonical DE needs very minimal control parameters (3 to be exact: the scale factor, the crossover rate, and the population size) – a feature that makes it simple to leverage for the practitioners.
There have been many subsequent variants of the canonical differential evolution described above.
Now that we are acquainted with the differential evolution algorithm, let’s observe how to implement it from the ground up.
Differential Evolution Algorithm from Scratch
In this section, we will look into how to implement the differential evolution algorithm from the ground up.
The differential evolution algorithm begins by generating an initial population of candidate solutions. For this purpose, we will use the rand() function to generate an array of random values sampled from a uniform distribution over the range [0, 1).
We will then scale these values to change the range of their distribution to (lower bound, upper bound), where the bounds are specified in the form of a 2D array with each dimension corresponding to each input variable:
```python
...
# initialise population of candidate solutions randomly within the specified bounds
pop = bounds[:, 0] + (rand(pop_size, len(bounds)) * (bounds[:, 1] - bounds[:, 0]))
```
It is within these same bounds that the objective function will also be evaluated. An objective function of choice, and the bounds on each input variable, may therefore be defined as follows:
```python
# define objective function
def obj(x):
    return 0

# define lower and upper bounds for every dimension
bounds = asarray([(-5.0, 5.0), (-5.0, 5.0)])
```
We can evaluate our initial population of candidate solutions by passing it to the objective function as an input argument.
```python
...
# evaluate initial population of candidate solutions
obj_all = [obj(ind) for ind in pop]
```
We will be replacing the values in obj_all with better ones as the population evolves and converges towards an optimal solution.
We can then loop over a predefined number of iterations of the algorithm, such as 100 or 1,000, as specified by the parameter iter, and within each iteration, over all candidate solutions.
```python
...
# run iterations of the algorithm
for i in range(iter):
    # iterate over all candidate solutions
    for j in range(pop_size):
        ...
```
The first step of the algorithm iteration performs a mutation process. For this purpose, three random candidates, a, b and c, that are not the current one are selected from the population, and a mutated vector is generated by computing a + F * (b - c). Recall that F ∈ [0, 2] and denotes the mutation scale factor.
```python
...
# choose three candidates, a, b and c, that are not the current one
candidates = [candidate for candidate in range(pop_size) if candidate != j]
a, b, c = pop[choice(candidates, 3, replace=False)]
```
The mutation process is carried out by the function mutation(), to which we pass a, b, c and F as input arguments.
```python
# define mutation operation
def mutation(x, F):
    return x[0] + F * (x[1] - x[2])

...
# perform mutation
mutated = mutation([a, b, c], F)
...
```
Since we are operating within a bounded range of values, we need to check whether the newly mutated vector also lies within the specified bounds, and if not, clip its values to the upper or lower limits as necessary. This check is performed by the function check_bounds().
```python
# define boundary check operation
def check_bounds(mutated, bounds):
    mutated_bound = [clip(mutated[i], bounds[i, 0], bounds[i, 1]) for i in range(len(bounds))]
    return mutated_bound
```
The next step performs crossover, where specific values of the current, target, vector are replaced by the corresponding values in the mutated vector, to create a trial vector. The decision of which values to replace is based on whether a uniform random value generated for each input variable falls below the crossover rate. If it does, the corresponding value from the mutated vector is copied into the trial vector.
The crossover process is implemented by the crossover() function, which takes the mutated and target vectors as input, in addition to the crossover rate, cr ∈ [0, 1], and the number of input variables.
```python
# define crossover operation
def crossover(mutated, target, dims, cr):
    # generate a uniform random value for every dimension
    p = rand(dims)
    # generate trial vector by binomial crossover
    trial = [mutated[i] if p[i] < cr else target[i] for i in range(dims)]
    return trial

...
# perform crossover
trial = crossover(mutated, pop[j], len(bounds), cr)
...
```
A final selection step replaces the target vector with the trial vector if the latter yields a lower objective function value. For this purpose, we evaluate both vectors on the objective function and then perform selection, storing the new objective function value in obj_all if the trial vector is found to be the fitter of the two.
```python
...
# compute objective function value for target vector
obj_target = obj(pop[j])
# compute objective function value for trial vector
obj_trial = obj(trial)
# perform selection
if obj_trial < obj_target:
    # replace the target vector with the trial vector
    pop[j] = trial
    # store the new objective function value
    obj_all[j] = obj_trial
```
We can tie all steps together into a differential_evolution() function that takes as input arguments the population size, the bounds of each input variable, the total number of iterations, the mutation scale factor and the crossover rate, and returns the best solution found and its evaluation.
```python
def differential_evolution(pop_size, bounds, iter, F, cr):
    # initialise population of candidate solutions randomly within the specified bounds
    pop = bounds[:, 0] + (rand(pop_size, len(bounds)) * (bounds[:, 1] - bounds[:, 0]))
    # evaluate initial population of candidate solutions
    obj_all = [obj(ind) for ind in pop]
    # find the best performing vector of initial population
    best_vector = pop[argmin(obj_all)]
    best_obj = min(obj_all)
    prev_obj = best_obj
    # run iterations of the algorithm
    for i in range(iter):
        # iterate over all candidate solutions
        for j in range(pop_size):
            # choose three candidates, a, b and c, that are not the current one
            candidates = [candidate for candidate in range(pop_size) if candidate != j]
            a, b, c = pop[choice(candidates, 3, replace=False)]
            # perform mutation
            mutated = mutation([a, b, c], F)
            # check that lower and upper bounds are retained after mutation
            mutated = check_bounds(mutated, bounds)
            # perform crossover
            trial = crossover(mutated, pop[j], len(bounds), cr)
            # compute objective function value for target vector
            obj_target = obj(pop[j])
            # compute objective function value for trial vector
            obj_trial = obj(trial)
            # perform selection
            if obj_trial < obj_target:
                # replace the target vector with the trial vector
                pop[j] = trial
                # store the new objective function value
                obj_all[j] = obj_trial
        # find the best performing vector at each iteration
        best_obj = min(obj_all)
        # store the lowest objective function value
        if best_obj < prev_obj:
            best_vector = pop[argmin(obj_all)]
            prev_obj = best_obj
            # report progress at each improvement
            print('Iteration: %d f([%s]) = %.5f' % (i, around(best_vector, decimals=5), best_obj))
    return [best_vector, best_obj]
```
Now that we have implemented the differential evolution algorithm, let's look into how to use it to optimize an objective function.
Differential Evolution Algorithm on the Sphere Function
In this section, we will apply the differential evolution algorithm to an objective function. We will use a simple 2D sphere objective function specified within the bounds [-5, 5]. The sphere function is continuous, convex and unimodal, and is characterised by a single global minimum at f(0, 0) = 0.0.
```python
# define objective function
def obj(x):
    return x[0]**2.0 + x[1]**2.0
```
We will minimize this objective function with the differential evolution algorithm, based on the strategy DE/rand/1/bin.
In order to do this, we must define values for the algorithm parameters, specifically the population size, the number of iterations, the mutation scale factor and the crossover rate. We set these values empirically to 10, 100, 0.5 and 0.7, respectively.
```python
...
# define population size
pop_size = 10
# define number of iterations
iter = 100
# define scale factor for mutation
F = 0.5
# define crossover rate for recombination
cr = 0.7
```
We also define the bounds of each input variable:
```python
...
# define lower and upper bounds for every dimension
bounds = asarray([(-5.0, 5.0), (-5.0, 5.0)])
```
Next, we carry out the search and report the outcomes.
```python
...
# perform differential evolution
solution = differential_evolution(pop_size, bounds, iter, F, cr)
```
Tying this all together, the complete example is listed below:
```python
# differential evolution search of the two-dimensional sphere objective function
from numpy.random import rand
from numpy.random import choice
from numpy import asarray
from numpy import clip
from numpy import argmin
from numpy import min
from numpy import around

# define objective function
def obj(x):
    return x[0]**2.0 + x[1]**2.0

# define mutation operation
def mutation(x, F):
    return x[0] + F * (x[1] - x[2])

# define boundary check operation
def check_bounds(mutated, bounds):
    mutated_bound = [clip(mutated[i], bounds[i, 0], bounds[i, 1]) for i in range(len(bounds))]
    return mutated_bound

# define crossover operation
def crossover(mutated, target, dims, cr):
    # generate a uniform random value for every dimension
    p = rand(dims)
    # generate trial vector by binomial crossover
    trial = [mutated[i] if p[i] < cr else target[i] for i in range(dims)]
    return trial

def differential_evolution(pop_size, bounds, iter, F, cr):
    # initialise population of candidate solutions randomly within the specified bounds
    pop = bounds[:, 0] + (rand(pop_size, len(bounds)) * (bounds[:, 1] - bounds[:, 0]))
    # evaluate initial population of candidate solutions
    obj_all = [obj(ind) for ind in pop]
    # find the best performing vector of initial population
    best_vector = pop[argmin(obj_all)]
    best_obj = min(obj_all)
    prev_obj = best_obj
    # run iterations of the algorithm
    for i in range(iter):
        # iterate over all candidate solutions
        for j in range(pop_size):
            # choose three candidates, a, b and c, that are not the current one
            candidates = [candidate for candidate in range(pop_size) if candidate != j]
            a, b, c = pop[choice(candidates, 3, replace=False)]
            # perform mutation
            mutated = mutation([a, b, c], F)
            # check that lower and upper bounds are retained after mutation
            mutated = check_bounds(mutated, bounds)
            # perform crossover
            trial = crossover(mutated, pop[j], len(bounds), cr)
            # compute objective function value for target vector
            obj_target = obj(pop[j])
            # compute objective function value for trial vector
            obj_trial = obj(trial)
            # perform selection
            if obj_trial < obj_target:
                # replace the target vector with the trial vector
                pop[j] = trial
                # store the new objective function value
                obj_all[j] = obj_trial
        # find the best performing vector at each iteration
        best_obj = min(obj_all)
        # store the lowest objective function value
        if best_obj < prev_obj:
            best_vector = pop[argmin(obj_all)]
            prev_obj = best_obj
            # report progress at each improvement
            print('Iteration: %d f([%s]) = %.5f' % (i, around(best_vector, decimals=5), best_obj))
    return [best_vector, best_obj]

# define population size
pop_size = 10
# define lower and upper bounds for every dimension
bounds = asarray([(-5.0, 5.0), (-5.0, 5.0)])
# define number of iterations
iter = 100
# define scale factor for mutation
F = 0.5
# define crossover rate for recombination
cr = 0.7

# perform differential evolution
solution = differential_evolution(pop_size, bounds, iter, F, cr)
print('\nSolution: f([%s]) = %.5f' % (around(solution[0], decimals=5), solution[1]))
```
Running the example reports the progress of the search, including the iteration number and the objective function value each time an improvement is detected.
At the end of the search, the best solution is found and its evaluation is reported.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
In this case, we can see that the algorithm converges very close to f(0.0, 0.0) = 0.0 in approximately 33 improvements out of 100 iterations.
```
Iteration: 1 f([[ 0.89709 -0.45082]]) = 1.00800
Iteration: 2 f([[-0.5382 0.29676]]) = 0.37773
Iteration: 3 f([[ 0.41884 -0.21613]]) = 0.22214
Iteration: 4 f([[0.34737 0.29676]]) = 0.20873
Iteration: 5 f([[ 0.20692 -0.1747 ]]) = 0.07334
Iteration: 7 f([[-0.23154 -0.00557]]) = 0.05364
Iteration: 8 f([[ 0.11956 -0.02632]]) = 0.01499
Iteration: 11 f([[ 0.01535 -0.02632]]) = 0.00093
Iteration: 15 f([[0.01918 0.01603]]) = 0.00062
Iteration: 18 f([[0.01706 0.00775]]) = 0.00035
Iteration: 20 f([[0.00467 0.01275]]) = 0.00018
Iteration: 21 f([[ 0.00288 -0.00175]]) = 0.00001
Iteration: 27 f([[ 0.00286 -0.00175]]) = 0.00001
Iteration: 30 f([[-0.00059 0.00044]]) = 0.00000
Iteration: 37 f([[-1.5e-04 8.0e-05]]) = 0.00000
Iteration: 41 f([[-1.e-04 -8.e-05]]) = 0.00000
Iteration: 43 f([[-4.e-05 6.e-05]]) = 0.00000
Iteration: 48 f([[-2.e-05 6.e-05]]) = 0.00000
Iteration: 49 f([[-6.e-05 0.e+00]]) = 0.00000
Iteration: 50 f([[-4.e-05 1.e-05]]) = 0.00000
Iteration: 51 f([[1.e-05 1.e-05]]) = 0.00000
Iteration: 55 f([[1.e-05 0.e+00]]) = 0.00000
Iteration: 64 f([[-0. -0.]]) = 0.00000
Iteration: 68 f([[ 0. -0.]]) = 0.00000
Iteration: 72 f([[-0. 0.]]) = 0.00000
Iteration: 77 f([[-0. 0.]]) = 0.00000
Iteration: 79 f([[0. 0.]]) = 0.00000
Iteration: 84 f([[ 0. -0.]]) = 0.00000
Iteration: 86 f([[-0. -0.]]) = 0.00000
Iteration: 87 f([[-0. -0.]]) = 0.00000
Iteration: 95 f([[-0. 0.]]) = 0.00000
Iteration: 98 f([[-0. 0.]]) = 0.00000
Solution: f([[-0. 0.]]) = 0.00000
```
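As a sanity check, SciPy's built-in scipy.optimize.differential_evolution can be run on the same sphere function; it implements the same family of strategies, with 'rand1bin' matching the DE/rand/1/bin strategy used here. This is a cross-check against a library implementation, not part of the from-scratch code:

```python
from scipy.optimize import differential_evolution

# same 2D sphere objective as above
def obj(x):
    return x[0]**2.0 + x[1]**2.0

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
result = differential_evolution(obj, bounds, strategy='rand1bin', maxiter=100, seed=1)
print(result.x, result.fun)  # converges near [0, 0] with f(x) close to 0
```

Exact iterates will differ from the from-scratch version since SciPy uses its own population size, dithered mutation factor and stopping tolerance, but both should land very close to the global minimum.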
We can plot the objective function values recorded at each improvement by modifying the differential_evolution() function slightly to keep track of the objective function values and return them in the list obj_iter.
```python
def differential_evolution(pop_size, bounds, iter, F, cr):
    # initialise population of candidate solutions randomly within the specified bounds
    pop = bounds[:, 0] + (rand(pop_size, len(bounds)) * (bounds[:, 1] - bounds[:, 0]))
    # evaluate initial population of candidate solutions
    obj_all = [obj(ind) for ind in pop]
    # find the best performing vector of initial population
    best_vector = pop[argmin(obj_all)]
    best_obj = min(obj_all)
    prev_obj = best_obj
    # initialise list to store the objective function value at each improvement
    obj_iter = list()
    # run iterations of the algorithm
    for i in range(iter):
        # iterate over all candidate solutions
        for j in range(pop_size):
            # choose three candidates, a, b and c, that are not the current one
            candidates = [candidate for candidate in range(pop_size) if candidate != j]
            a, b, c = pop[choice(candidates, 3, replace=False)]
            # perform mutation
            mutated = mutation([a, b, c], F)
            # check that lower and upper bounds are retained after mutation
            mutated = check_bounds(mutated, bounds)
            # perform crossover
            trial = crossover(mutated, pop[j], len(bounds), cr)
            # compute objective function value for target vector
            obj_target = obj(pop[j])
            # compute objective function value for trial vector
            obj_trial = obj(trial)
            # perform selection
            if obj_trial < obj_target:
                # replace the target vector with the trial vector
                pop[j] = trial
                # store the new objective function value
                obj_all[j] = obj_trial
        # find the best performing vector at each iteration
        best_obj = min(obj_all)
        # store the lowest objective function value
        if best_obj < prev_obj:
            best_vector = pop[argmin(obj_all)]
            prev_obj = best_obj
            obj_iter.append(best_obj)
            # report progress at each improvement
            print('Iteration: %d f([%s]) = %.5f' % (i, around(best_vector, decimals=5), best_obj))
    return [best_vector, best_obj, obj_iter]
```
We can then create a line plot of these objective function values to observe the relative changes at every improvement during the search.
```python
from matplotlib import pyplot
...
# perform differential evolution
solution = differential_evolution(pop_size, bounds, iter, F, cr)
...
# line plot of best objective function values
pyplot.plot(solution[2], '.-')
pyplot.xlabel('Improvement Number')
pyplot.ylabel('Evaluation f(x)')
pyplot.show()
```
Tying this all together, the complete example is listed below:
```python
# differential evolution search of the two-dimensional sphere objective function
from numpy.random import rand
from numpy.random import choice
from numpy import asarray
from numpy import clip
from numpy import argmin
from numpy import min
from numpy import around
from matplotlib import pyplot

# define objective function
def obj(x):
    return x[0]**2.0 + x[1]**2.0

# define mutation operation
def mutation(x, F):
    return x[0] + F * (x[1] - x[2])

# define boundary check operation
def check_bounds(mutated, bounds):
    mutated_bound = [clip(mutated[i], bounds[i, 0], bounds[i, 1]) for i in range(len(bounds))]
    return mutated_bound

# define crossover operation
def crossover(mutated, target, dims, cr):
    # generate a uniform random value for every dimension
    p = rand(dims)
    # generate trial vector by binomial crossover
    trial = [mutated[i] if p[i] < cr else target[i] for i in range(dims)]
    return trial

def differential_evolution(pop_size, bounds, iter, F, cr):
    # initialise population of candidate solutions randomly within the specified bounds
    pop = bounds[:, 0] + (rand(pop_size, len(bounds)) * (bounds[:, 1] - bounds[:, 0]))
    # evaluate initial population of candidate solutions
    obj_all = [obj(ind) for ind in pop]
    # find the best performing vector of initial population
    best_vector = pop[argmin(obj_all)]
    best_obj = min(obj_all)
    prev_obj = best_obj
    # initialise list to store the objective function value at each improvement
    obj_iter = list()
    # run iterations of the algorithm
    for i in range(iter):
        # iterate over all candidate solutions
        for j in range(pop_size):
            # choose three candidates, a, b and c, that are not the current one
            candidates = [candidate for candidate in range(pop_size) if candidate != j]
            a, b, c = pop[choice(candidates, 3, replace=False)]
            # perform mutation
            mutated = mutation([a, b, c], F)
            # check that lower and upper bounds are retained after mutation
            mutated = check_bounds(mutated, bounds)
            # perform crossover
            trial = crossover(mutated, pop[j], len(bounds), cr)
            # compute objective function value for target vector
            obj_target = obj(pop[j])
            # compute objective function value for trial vector
            obj_trial = obj(trial)
            # perform selection
            if obj_trial < obj_target:
                # replace the target vector with the trial vector
                pop[j] = trial
                # store the new objective function value
                obj_all[j] = obj_trial
        # find the best performing vector at each iteration
        best_obj = min(obj_all)
        # store the lowest objective function value
        if best_obj < prev_obj:
            best_vector = pop[argmin(obj_all)]
            prev_obj = best_obj
            obj_iter.append(best_obj)
            # report progress at each improvement
            print('Iteration: %d f([%s]) = %.5f' % (i, around(best_vector, decimals=5), best_obj))
    return [best_vector, best_obj, obj_iter]

# define population size
pop_size = 10
# define lower and upper bounds for every dimension
bounds = asarray([(-5.0, 5.0), (-5.0, 5.0)])
# define number of iterations
iter = 100
# define scale factor for mutation
F = 0.5
# define crossover rate for recombination
cr = 0.7

# perform differential evolution
solution = differential_evolution(pop_size, bounds, iter, F, cr)
print('\nSolution: f([%s]) = %.5f' % (around(solution[0], decimals=5), solution[1]))
# line plot of best objective function values
pyplot.plot(solution[2], '.-')
pyplot.xlabel('Improvement Number')
pyplot.ylabel('Evaluation f(x)')
pyplot.show()
```
Running the example creates a line plot.
The line plot shows the objective function evaluation for each improvement, with large changes at the start and very small changes towards the end of the search as the algorithm converges on the optima.
Further Reading
This section provides additional resources on the topic if you are looking to go deeper.
Papers
Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997
Recent advances in differential evolution: An updated survey, 2016
Books
Algorithms for Optimization, 2019
Articles
Differential evolution, Wikipedia
Summary
In this tutorial, you discovered the differential evolution algorithm.
Specifically, you learned:
- Differential evolution is a heuristic approach for the global optimization of nonlinear and non-differentiable continuous space functions
- How to go about implementing the differential evolution algorithm from the ground up in Python
- How to apply the differential evolution algorithm to a real-valued 2D objective function.