The experiment we performed with the parallel algorithms was very simple. The scene contains 20 spheres of about 500 faces each. The spheres were placed in the scene at random, and each was assigned a random translation vector and random rotation angles about the x, y and z axes. A number of animation steps (10) was then performed, keeping the translation and rotation parameters constant throughout the animation. If a sphere left the boundaries of the scene, it was reset to a new random position, translation and rotation. For each step, collision detection was performed using one of the implemented techniques.
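As a rough illustration, the experiment loop can be sketched as follows. All names, helper callbacks and numeric ranges here are hypothetical and not taken from the actual implementation; the mesh rotation is only stored, not applied, to keep the sketch short.

import random

NUM_SPHERES = 20   # each sphere has roughly 500 faces
NUM_STEPS = 10     # animation steps per run

def random_sphere():
    # Hypothetical record: position, per-step translation vector and
    # per-step rotation angles about the x, y and z axes.
    return {
        "position":    [random.uniform(-1.0, 1.0) for _ in range(3)],
        "translation": [random.uniform(-0.1, 0.1) for _ in range(3)],
        "rotation":    [random.uniform(0.0, 10.0) for _ in range(3)],
    }

def run_experiment(detect_collisions, inside_scene):
    # detect_collisions and inside_scene stand in for one of the
    # implemented collision-detection techniques and the scene-boundary test.
    spheres = [random_sphere() for _ in range(NUM_SPHERES)]
    for step in range(NUM_STEPS):
        for s in spheres:
            # Motion parameters stay constant throughout the animation.
            s["position"] = [p + t for p, t in zip(s["position"], s["translation"])]
            if not inside_scene(s):
                # Sphere left the scene: reset to a new random position,
                # translation and rotation.
                s.update(random_sphere())
        detect_collisions(spheres)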
To be able to compare results, we executed this experiment for different setups, varying two parameters: the number of processors working concurrently and the granularity of the problem. Increasing the number of processes increases the number of slaves by the same amount and thus increases the parallelism; in general we expect a speedup from adding a processor. Granularity is varied by tuning the maximal search depth in the slaves, controlled by a percentage: given a granularity, the maximal depth of the trees is computed and the maximal search depth is set to that percentage of the maximal tree depth, as sketched below. It is important to remark that to make sure no extra jobs are created, the percentage should be larger than 200 %: with two identical objects of maximal tree depth in the scene, the collision detection may have to descend a number of levels equal to twice the maximal depth. The first chart shows the total time the program needed to execute the entire experiment. We varied the number of processors from 1 to 11, with 10 being the actual number of processors in the Virtual Machine, and the granularity from 250 % down to 120 %. At that percentage we had to stop the experiments, since the execution time became prohibitively large. The result is shown in Figure 6.
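In code, the depth cut-off could be derived roughly as follows; this is a sketch under the assumptions above, with illustrative names and an arbitrary example tree depth of 12.

import math

def max_search_depth(granularity_percent, max_tree_depth):
    # Depth at which a slave stops descending the trees; anything
    # left unfinished beyond this depth becomes new jobs.
    return math.ceil(max_tree_depth * granularity_percent / 100.0)

# Two identical objects of maximal tree depth may force the collision test
# to descend up to twice that depth in total, so only a percentage above
# 200 % guarantees that no extra jobs are ever created:
assert max_search_depth(250, 12) >= 2 * 12   # no extra jobs
assert max_search_depth(120, 12) <  2 * 12   # extra jobs possible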
The first thing that stands out in Figure 6 is the sharp increase in execution time between a granularity of 140 % and 160 %. This is probably due to the specific structure of the tree representing the sphere, causing most searches to be finished at the depth corresponding to 160 % but still unfinished at 140 %. The effect is of course magnified because we use a number of replicas of the same object. The second trend, noticeable at the higher granularities, is that adding more processors at first induces a speedup as expected, but eventually adding further processors slows down the execution. As will become apparent from the other charts, this is due to the communication overhead. In the next charts, shown in Figures 7 and 8, we look at the speedup gained by parallel execution. As the legend shows, the different line plots represent different execution granularities. The first chart demonstrates the overhead of the parallel solution compared to the sequential solution, caused by the setup of the slaves and the interprocess communication: the speedup starts out below that of the sequential solution (speedup 1.0). The difference between consecutive points shows the speedup gained by adding another processor. Depending on the granularity, this overhead is made up for by the speed gain due to the added processors. We can see that for the highest granularities the speedup never exceeds 1.0; in these cases, communication is a huge bottleneck.
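For clarity, the speedup plotted in this first chart is, in our own notation (not taken from the original figures), the ratio of the sequential execution time to the parallel execution time on $p$ processors:

\[ S_{\mathrm{seq}}(p) \;=\; \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}(p)}, \]

so values below 1.0 mean that the slave setup and communication overhead outweigh the gain from parallel execution.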
The second chart shows the speedup compared to the parallel solution executed on a single processor at the lowest granularity. Again, the difference between consecutive points shows the speedup gained by adding another processor. As in the previous chart, the high granularities never attain a significant speedup; the highest speedup reached is about 5. From both charts we can see that, as we keep adding processors, the performance reaches a maximum and then degrades when more are added.
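Writing $T_{\mathrm{par}}(p, g)$ for the parallel execution time on $p$ processors at granularity $g$, and $g_{\min}$ for the lowest granularity, the quantity plotted in this second chart is, on our reading of the baseline described above,

\[ S(p, g) \;=\; \frac{T_{\mathrm{par}}(1, g_{\min})}{T_{\mathrm{par}}(p, g)}. \]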
The last graph (Figure 9), deduced from the data obtained in these experiments, shows the sharp increase in communication as the granularity is increased. It was obtained by aggregating the communication and work times of all the experiments and computing their ratio for each granularity, so it is an average over the different numbers of processors used in the experiments.
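Stated explicitly (again in our own notation), the quantity plotted for each granularity $g$ is

\[ R(g) \;=\; \frac{\sum_{p} T_{\mathrm{comm}}(p, g)}{\sum_{p} T_{\mathrm{work}}(p, g)}, \]

where the sums run over the different numbers of processors $p$, $T_{\mathrm{comm}}$ is the time spent communicating and $T_{\mathrm{work}}$ the time spent on actual collision-detection work.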