next up previous contents
Next: Loop parallelization Up: Sequence parallelization Previous: When we do spawn   Contents

Overview

The code involved directly with the sequence parallelization is roughly 1300 lines of C++. About half of this is either generic code used elsewhere as well, or comments, so the core of the sequence parallelizer is closer to 600 lines of code, including comments. Most of this is book-keeping code for handling the instruction sequences, the dependency sets, and so on.

I have made heavy use of the C++ Standard Template Library to implement the sets, maps, lists of sets, and other fairly complicated but efficient data structures.

There are still optimizations to be done in the code; there may be cases where we spoil the otherwise fine algorithmic complexity of the parallelizer. The following is an informal, unproven estimate of the complexity of the sequence parallelizer. The estimate appears achievable, but there may well be cases in the current implementation that break it.

In a code sequence with $p$ instructions, where the instructions have a total of $q$ arguments, we should be able to parallelize the entire sequence in $\mathcal{O}(pq \log q + q \log q)$ time.

$\mathcal{O}(q \log q)$ is the cost of building a balanced tree holding $q$ arguments (dependencies). We use balanced trees to implement the dependency sets.

The $\mathcal{O}(pq\log q)$ is the cost of looking up $q$ dependencies in our mapping between touched variables and the touchers, for $p$ instructions. Again, the mapping is implemented using a balanced tree, and it is kept up to date as we move instructions between the various sequences.

The number of dependencies $q$ is likely to be small compared to the number of instructions $p$, so in practice the run time should look more like $\mathcal{O}(pq\log q)$.



1999-08-09