If a job is submitted to a slave it will contain the two starting nodes of the trees. Before starting the collision detection, the slave will have to locate these nodes in its memory space. One possibility is to specify the nodes by the path to follow starting at the root of the tree. This would require specifying two values: a bit string containing the left/right ``instructions'' for the path and an integer containing the number of bits that are valid in the bit string so you don't descend too deep in the tree. A faster and maybe a little bit simpler way is to index the trees as in Figure 5. Next to the actual tree that approximates the object, we also construct an index. This index contains two arrays: The first array contains pointers to the actual nodes, the second array contains offsets, indicating at which offset from the current position the right child of the node corresponding to the current position in the array can be found.
Then, of course, there is also the part that does the actual collision detection. Only the procedure for the tree traversal is different from the sequential version, the rest is identical. The tree traversal has a separate version for parallel execution since functionality has to be added to maintain information about the index. Also, the current search depth has to be checked against the maximum and a ``new job'' command issued to the master if necessary.
This pretty much covers the implementation if we don't want to go into too much detail. In the next section, we will discuss the results obtained by the parallelization of the hierarchical collision detection.