The experiment we performed with the parallel algorithms was very simple. The scene contains 20 spheres of about 500 faces each. The spheres were placed in the scene at random, and each was assigned a random translation vector and random rotation angles about the x, y and z axes. A number of animation steps (10) was then performed, keeping the translation and rotation parameters constant throughout the animation. If a sphere left the boundaries of the scene, it was reset to a new random position, translation and rotation. For each step, collision detection was performed using one of the implemented techniques.
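As a rough illustration, the experiment loop can be sketched as follows. All names, helper callbacks and numeric ranges here are hypothetical and not taken from the actual implementation; the mesh rotation is only stored, not applied, to keep the sketch short.

import random

NUM_SPHERES = 20   # each sphere has roughly 500 faces
NUM_STEPS = 10     # animation steps per run

def random_sphere():
    # Hypothetical record: position, per-step translation vector and
    # per-step rotation angles about the x, y and z axes.
    return {
        "position":    [random.uniform(-1.0, 1.0) for _ in range(3)],
        "translation": [random.uniform(-0.1, 0.1) for _ in range(3)],
        "rotation":    [random.uniform(0.0, 10.0) for _ in range(3)],
    }

def run_experiment(detect_collisions, inside_scene):
    # detect_collisions and inside_scene stand in for one of the
    # implemented collision-detection techniques and the scene-boundary test.
    spheres = [random_sphere() for _ in range(NUM_SPHERES)]
    for step in range(NUM_STEPS):
        for s in spheres:
            # Motion parameters stay constant throughout the animation.
            s["position"] = [p + t for p, t in zip(s["position"], s["translation"])]
            if not inside_scene(s):
                # Sphere left the scene: reset to a new random position,
                # translation and rotation.
                s.update(random_sphere())
        detect_collisions(spheres)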
To be able to compare results, we executed this experiment for different setups, varying two parameters: the number of processors working concurrently and the granularity of the problem. Increasing the number of processes increases the number of slaves by the same amount and thus increases the parallelism; in general we expect a speedup from adding a processor. Granularity is varied by tuning the maximal search depth in the slaves, controlled by a percentage: given a granularity, the maximal depth of the trees is computed and the maximal search depth is set to that percentage of the maximal tree depth, as sketched below. It is important to remark that to make sure no extra jobs are created, the percentage should be larger than 200 %: with two identical objects of maximal tree depth in the scene, the collision detection may have to descend a number of levels equal to twice the maximal depth. The first chart shows the total time the program needed to execute the entire experiment. We varied the number of processors from 1 to 11, with 10 being the actual number of processors in the Virtual Machine, and the granularity from 250 % down to 120 %. At that percentage we had to stop the experiments, since the execution time became prohibitively large. The result is shown in Figure 6.
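In code, the depth cut-off could be derived roughly as follows; this is a sketch under the assumptions above, with illustrative names and an arbitrary example tree depth of 12.

import math

def max_search_depth(granularity_percent, max_tree_depth):
    # Depth at which a slave stops descending the trees; anything
    # left unfinished beyond this depth becomes new jobs.
    return math.ceil(max_tree_depth * granularity_percent / 100.0)

# Two identical objects of maximal tree depth may force the collision test
# to descend up to twice that depth in total, so only a percentage above
# 200 % guarantees that no extra jobs are ever created:
assert max_search_depth(250, 12) >= 2 * 12   # no extra jobs
assert max_search_depth(120, 12) <  2 * 12   # extra jobs possible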
The first thing that stands out in Figure 6 is the sharp increase in execution time between a granularity of 140 % and 160 %. This is probably due to the specific structure of the tree representing the sphere, causing most searches to be finished at the depth corresponding to 160 % but still unfinished at 140 %. The effect is of course magnified because we use a number of replicas of the same object. The second trend, noticeable at the higher granularities, is that adding more processors at first induces a speedup as expected, but eventually adding further processors slows down the execution. As will become apparent from the other charts, this is due to the communication overhead. In the next charts, shown in Figures 7 and 8, we look at the speedup gained by parallel execution. As the legend shows, the different line plots represent different execution granularities. The first chart demonstrates the overhead of the parallel solution compared to the sequential solution, caused by the setup of the slaves and the interprocess communication: the speedup starts out below that of the sequential solution (speedup 1.0). The difference between consecutive points shows the speedup gained by adding another processor. Depending on the granularity, this overhead is made up for by the speed gain due to the added processors. We can see that for the highest granularities the speedup never exceeds 1.0; in these cases, communication is a huge bottleneck.
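For clarity, the speedup plotted in this first chart is, in our own notation (not taken from the original figures), the ratio of the sequential execution time to the parallel execution time on $p$ processors:

\[ S_{\mathrm{seq}}(p) \;=\; \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}(p)}, \]

so values below 1.0 mean that the slave setup and communication overhead outweigh the gain from parallel execution.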
The second chart shows the speedup compared to the parallel solution executed on a single processor at the lowest granularity. Again, the difference between consecutive points shows the speedup gained by adding another processor. As in the previous chart, the high granularities never attain a significant speedup; the highest speedup reached is about 5. From both charts we can see that, as we keep adding processors, the performance reaches a maximum and then degrades when more are added.
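Writing $T_{\mathrm{par}}(p, g)$ for the parallel execution time on $p$ processors at granularity $g$, and $g_{\min}$ for the lowest granularity, the quantity plotted in this second chart is, on our reading of the baseline described above,

\[ S(p, g) \;=\; \frac{T_{\mathrm{par}}(1, g_{\min})}{T_{\mathrm{par}}(p, g)}. \]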
The last graph (Figure 9), deduced from the data obtained in these experiments, shows the sharp increase in communication as the granularity is increased. It was obtained by aggregating the communication and work times of all the experiments and computing their ratio for each granularity, so it is an average over the different numbers of processors used in the experiments.
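Stated explicitly (again in our own notation), the quantity plotted for each granularity $g$ is

\[ R(g) \;=\; \frac{\sum_{p} T_{\mathrm{comm}}(p, g)}{\sum_{p} T_{\mathrm{work}}(p, g)}, \]

where the sums run over the different numbers of processors $p$, $T_{\mathrm{comm}}$ is the time spent communicating and $T_{\mathrm{work}}$ the time spent on actual collision-detection work.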