Genetic algorithms

What is the problem?

Suppose you have a system which you want to control. The reason for this could be to achieve better performance for instance. In the situation depicted here one may want to adjust the parameters of the system in order to produces more little man . Of course for real life applications a lot of factors have to be taken into account. If you increase the speed of the conveyer belt for instance you will be required to make some change to other parameters as well. And maybe this will influence some part of the production process in ways that are difficult to predict. And so you may end up having little men with different shapes because of the interactions that specific to that very machine. This can be summarized by saying that there are some non-linearities in the system which are very difficult to model and even more to compute. So why try to model what the machine will do? You can do it for small systems but what about real big ones? (Note that the system can be a game, computer program or a machine.)
A system depending on a set of parameters for its performance/quality of operation.

A system depending on a set of parameters for its performance/quality of operation.

The idea: evolution

Instead of modelling the system why not try different solutions on it? This of course implies that it is indeed save to submit the system to a series of experiments. Just try different set of parameters and see how well they perform. This allows you to determine the fitness of a solution. Based on the fitness you can retain some of the best solutions and throw away the bad ones. In this way you introduce some selection which will help you improve you solutions.

But before going any further it is important to introduce the term individual. Until now this has been call a solution but individual is term used in the literature. So, one such individual can be viewed as consisting of 2 elements: its GENE and its fitness. The gene being in the example given above the different parameters of the system. A collection of individuals is generally called a population. A population will change over time (see later) and will consist of different generations of individuals.

A population consist of several individuals (possibly thousands).

Let us resume at this point using the brief terminology introduced above. You take a population (a collection of individuals) and determine the fitness of each individual. In a next step you use a policy to retain some of the solution (usually the best ones). Since some individuals where discarded you need to introduce new ones in the population. This is done by using genetic operators. These are applied on the remaining individuals in the hope the improve the population (that how we improve). We have a new population (or partly at least) we can restart the cycle.

By using both a selection process and genetic operators bad individuals are removed and new ones are generated with possibly greater potential.

Does it always work?

Hmm ... not with me :-{ Although genetic algorithms can indeed find good solution and even surprising ones it is still no magic.

Important factors are the evaluation for example. If the function is to strict for instance the likely hood of success it remote. If your evaluation function (to look at it from the programming side) computes finesses in a rather bimodal way (that is to say: or you are good or you are real bad) then the fitness would give a bad indication about the quality of an individual. This makes it very hard to compare good ones with each other for instance.

Another factor is the size of the population. Of course the workload is affected by this choice but the quality and the speed of the genetic algorithm as well. To put it simple you can give more space for a good solution to appear early if you use a big population. But then again there is no guarantee about that either.

Yet other important parameters are related to the choice (probabilities) of the genetic operators and the selection criterion used.

Phases

The number of step of genetic algorithm in its simplest form are fairly small:

Generate a random population to start with (the first generation)
Evaluate the entire population
Apply the selection criterion
Use evolution to create new individuals (next generation)
Restart at step 2

Initial random population

This is quite obvious I guess fill your generation with randomly generated individuals.

Note: the fact that this is random is important as the genetic diversity will allow new solution to emerge.

Evaluation

This step is depends on the problem at hand. As the individual is of course evaluate within the context of the problem we try to solve/optimize. The fitness of each individual is determined in this step.

Selection

At this point it is possible to choose those individuals who are performing well for what we want to achieve. All the other ones are not suitable and will be discarded here. As a result we end up with a smaller population. Different criterion can be used. It is of course a choice the as to be made by the programmer.

By far the easiest approach is to used a TOP N policy. At each selection step only the N best individuals will survive.
Another approach is to set a threshold for the fitness. Every individual with a fitness higher than the threshold will remain.
Another approach is the used of tournaments. Two individual are chosen at random and compete again each other. The one with the highest fitness wins for example.
Yet another approach is roulette selection. Many variations can be proposed.

Genetic operators

In this step we use the better individuals to repopulate our genetic pool. We can do this using genetic operators which work at the gene level. The operation performed are combination, modification and duplication. The selection of the different genetic operations usually happens at random. But in order to steer the genetic algorithm probabilities are introduced in the process. (Like 85% probability for one operator, 10% for another etc..)

Crossover: combines the gene of two individuals by splitting them at a randomly chosen point and interchanging the segments. Since this process creates 2 new genes one usually chooses one of them at random.
Mutation: here a gene of individual is simply duplicated but an error is introduced on purpose in the process. Mutation are the major source of surprising results. A mutant can indeed turn out to perform quite well compared to others. And if not it will be illuminated quite naturally at the selection step.
Copy: this pure duplication, hereby the influence of a solution in the population increases. This fosters the chances of survival of the individual's gene.

Genetic operation on the genes: crossover and mutation

The new individuals are thus combinations or mutations of the individuals that were retained by the selection process. An attempt has been done to illustrate this. One can see that in the new population a centaur appeared (combination of horse and man). Some of the new individuals will indeed turn out to be better some not but this is of no concern at this point.
After applying the genetic operators we get new individuals which combine the properties of the best ones.

When to stop?

How many times do we have to cycle trough the entire process? Well there are 2 situations one can distinguish:

The problem we try to solve has an optimal solution and we found it. In that case it is quite obvious that there is not reason to continue. One can of course define optimal in different ways. (For example when the error is small the epsilon)
In the other case we will have to settle for a good solution, that is a solution that can be considered not satisfactory for what we want to achieve. Usually this means we limit the number of generations. The genetic algorithm will be stop after a certain number of cycles and the best individual will be used.

Document was written by Parent Johan (January 2000)