I have made heavy use of the C++ standard template library for implementing the sets, maps, lists of sets, and other fairly complicated but efficient data structures.
There are still optimizations that should be done in the code. There may be cases where we spoil the otherwise fine algorithmic complexity of the parallelizer. The following is a completely non-academic and unproven estimate of the complexity of the sequence parallelizer. It looks as though this estimate is possible to meet, but there may well be cases in the current implementation that blows this.
In a code sequence with instructions, where the instructions have a total of arguments, we should be able to parallelize the entire sequence in time.
is the cost of building up a balanced tree holding arguments (dependencies). We use balanced trees to implement the dependency sets.
The is the cost of looking up dependencies in our mapping between touched variables and the touchers, for instructions. Again, the mapping is implemented using a balanced tree, and it is kept up to date as we move instructions between the various sequences.
The number of dependencies is likely to be a small number compared to the number of instructions , so effectively we may see run-time that looks more like .