This is a short description of the issues I consider important to address if this work is continued. It covers current ``work in progress'' as well as work that should be started as soon as possible.
The main issue in getting these suggested extra parallelizations working seems to be implementing the dependency detection routines so that they handle every case of indexing that can possibly be handled.
Once we have a complete implementation of the dependency routines (generation of the uses and touches sets), the parallelization routines will do a better job without much need for change.
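To illustrate why the parallelizers can stay largely unchanged once the uses and touches sets are reliable, here is a minimal sketch of how such sets might feed an independence check. It is not the implementation described in this work: the `Instr` record, the `intersects` and `independent` helpers, and the use of plain variable names as set elements are all assumptions made for illustration, and the check itself is the textbook Bernstein-style condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (not from the original work)."""
    name: str
    uses: frozenset      # variables the instruction reads
    touches: frozenset   # variables the instruction writes


def intersects(a, b):
    """Assumed stand-in for the 'intersect' decision: plain set overlap."""
    return not a.isdisjoint(b)


def independent(p, q):
    """Bernstein-style check: p and q may run in parallel if neither
    writes anything the other reads or writes."""
    return not (
        intersects(p.touches, q.uses)
        or intersects(p.uses, q.touches)
        or intersects(p.touches, q.touches)
    )


# Three toy instructions, written as comments in an imaginary source language.
a = Instr("a", uses=frozenset({"x"}), touches=frozenset({"y"}))  # y = f(x)
b = Instr("b", uses=frozenset({"x"}), touches=frozenset({"z"}))  # z = g(x)
c = Instr("c", uses=frozenset({"y"}), touches=frozenset({"x"}))  # x = h(y)

print(independent(a, b))  # a and b only share a read of x
print(independent(a, c))  # c reads y, which a writes
```

In this sketch the set elements are whole variable names, so `intersects` is trivial. With indexed variables the elements would have to describe index expressions, and `intersects` would carry the real difficulty; that part is deliberately not attempted here.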
Some cases may require changes to the current parallelizers, and there is definitely a need for the ``instruction expansion'' functionality. But I believe that this work can be done in a fairly short amount of time, as the building blocks are falling into place.
As a side note, the loop parallelizer was designed, implemented, and tested in something like four days. I am very optimistic with regard to further expansion of the functionality of these parallelizers.
However, I don't think the parallelizers will need that much more work. The main burden is going to be completing the implementation of the dependency detection routines. The main problem here is to decide whether variables ``intersect''. This decision is needed throughout the parallelizers, and is basically a decision of whether .
<