When I started on this project, I thought I would spend 30% of the time developing a fast virtual machine and the remaining 70% on the parallelization. Boy, was I wrong.
It turned out that a virtual machine with a language that lends itself well to parallelism is absolutely vital if one wishes to do on-the-fly automatic parallelization. The virtual machine was rewritten several times during the early stages of development, simply because each earlier design turned out to be poorly suited to parallelizing the code it ran.
That being said, I think the current virtual machine has a viable design. With its internal representation of programs, it is simple to detect dependencies, and it is not too expensive computationally to generate parallel versions of the functions submitted to the scheduler.
The current implementation is able to detect simple cases of parallelism. The sequence and the loop parallelizers have been discussed, along with their current shortcomings. The important thing to remember is that there should be nothing in the design of the system that makes it very hard to implement those missing features. The language is very predictable with regard to data dependencies, and that is absolutely vital for any further refinement of the parallelizers.
I have come to the conclusion that automatic parallelization is not just a matter of a single parallelizing translator or execution entity. We will not see good automatic parallelization before we build an automatically parallelizing computing environment, one that includes everything from the end-user language to the compute-server software.
This project is headed in the direction of an automatically parallelizing distributed computing environment. Such an environment is not built in half a year, at least not by me. But further work on this project, or a similar project, may lead the way to efficient cluster computing in the future.