Parallelisation issues #2

Random number generator

In the sequential version of lil-gp all the subpopulations reside on the same CPU and thus share the same random number generator. Lil-gp provides several ways to initialize the random number generator. Parallel lil-gp uses only one of them in order to ensure correction operations.

Why does the generation of random number become an issue? Well in this case the problem mainly occurs at the moment that the subpopulations are being generated at random. If all the CPU's used in a parallel experiment were to use the same seed this would result in the same individuals being generated on all the CPU's (provided that each CPU processes the same number of subpopulations). This seriously limit the performance of the genetic programming loop as well as that of the parallel program since it would compute the same things several times in parallel.

To circumvent this problem each CPU is provided a different seed at the start of the experiment. This is done in very simple way at the moment by incrementing the seed each time it has been send. People who suspect this as the cause of bad results should have a look at the send_parameter function. Please let me know if any scientifically justified methods exist for that matter I will be happy to integrate it into parallel lil-gp.
 

Exchanges

Previously the distribution of subpopulations on CPU's and the need for exchanges between different CPU's has been introduced. One can identify 4 kinds of exchanges depending on the source and destination subpopulations only. There are 2 additional variations when if one considers the element being exchanged, an entire individual or a tree.
 

 Destination: local  Destination: not local
 Origin: local
 pure local
 export
 Origin: not local
 import
 nothing

These four kind of exchanges will be described very briefly.

  1. Pure local:  this kind of exchange is exactly what happened in the sequential version of lil-gp. This can be done very quickly since everything is happing in memory (so to say)
  2. Export:  the individual (or tree) is send to the destination population. In this case the tree or individal has to be send to another node. This require only a send operation locally, no furhter administration is required.
  3. Import: an individual (or tree) is received from a source population. One or more trees/individuals are coming from another computing node. They thus need to be imported prior to doing the actual exchange.
  4. Nothing: both soucre and destination do not reside on the given node. In the situation nnothing has to be done for this exchange and the kernel will move to the next exchange or resuming with the genetic operation if no further exchanges are specified.

Synchronous vs asynchronous

Above the four kinds of exchanges have been described shortly. What was not mentioned there is that one can adopt 2 different strategies when it comes to importing trees or individuals.

Indeed, no one can predict the evolution of the individuals of a population. In some experments the individuals tend to become bigger hence requiring more computation time when being evaluated. Many other reasons that can lead to an unequal balance of the computational work load exist (not necessarily GP related ones). A direct result of this is that there is no guarantee that the different nodes in de PVM are processing generations at the same pace (give identical hardware that is).

Since the different subpopulations can evolve at different speed it is possible that one node has to wait for a tree/individual that should be send by another node. Or on the other hand on could decide to discard this exchange and move to the next one. This has the advantage of avoiding CPU idle time but makes it impossible to reproduce exactly some runs. The parallel kernel makes it possible to chose between the 2 import strategies: synchronous and asynchronous imports.

In order to make it possible to offer both operation modes the parallel kernel handles exchanges differently than the original kernel. In synchronous mode it is necessary for the kernel to execute the exchange exports first. If this is not done deadlocks are likely to occur if one wants any exchange topology to be possible. In a second step the kernel will receive any trees and individuals that might have been send to it. Only after these two steps the exchanges will be done in a way similar to the one used by the original kernel.

The differences between synchronous and asynchronous mode are represented in the table underneath.

Synchronous Asynchronous
Export Examine the exchanges and send to other populations Idem
Import Wait until all the trees/individuals specified in the exchanges have been received. Also check the generation of elements that are being received. Look if something arrived, if so we import it otherwise we end the import step. No generation check is done. Elements do not need to be from any precise generation.
Local and imports(bis) Execute the local exchanges or insert the received individuals/trees in the local populations. If individuals/trees were not received we skip this exchange.

Asynchronous operations should be faster but I honestly did not notice are real difference yet. The price to pay is the your experiments are depending on factors you can not control (load of the network and cpu load). Synchronous operations the other hand can be reproduced but are generally slower (maybe not from the genetic point of view of course). But you now can repeat your experminents.

The information provided on this page is copyrighted © by the Free University of Brussels and provided as is.
From more information contact Johan Parent