If the instructions we depend upon (among those currently in active tasks) are spread across more than one task, we know that we depend on data touched by more than one active task. If this is true, we go serial: we move all tasks to our output buffer, because they contain code that generates results we need in order to carry on. The current opcode is then moved into the main task.
If exactly one task holds instructions that touch data we use, we place the current opcode in that task, alongside the instruction that already touches the data it uses.
If no current task holds an instruction that touches the data the current opcode uses, we simply add the opcode to the task with the fewest opcodes.
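The three cases above can be sketched roughly as follows. This is a minimal illustration under assumptions of my own, not the actual implementation: each opcode is reduced to a hypothetical `Opcode` with a set of data locations it touches, two opcodes are treated as dependent whenever those sets overlap, and the output buffer is modeled as a list of flushed parallel groups followed by serial `"main"` opcodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opcode:
    # Hypothetical representation: a name plus the data locations
    # (e.g. register or variable names) this opcode touches.
    name: str
    uses: frozenset


class Scheduler:
    def __init__(self, num_tasks):
        self.tasks = [[] for _ in range(num_tasks)]
        # Output buffer, in order: ("parallel", [task, ...]) groups that were
        # flushed, and ("main", op) entries for opcodes that forced serialization.
        self.output = []

    def _tasks_touching(self, op):
        # Indices of tasks holding an opcode that touches data `op` uses.
        return [i for i, task in enumerate(self.tasks)
                if any(prev.uses & op.uses for prev in task)]

    def _flush(self):
        # Move every non-empty task to the output buffer and start fresh.
        group = [task for task in self.tasks if task]
        if group:
            self.output.append(("parallel", group))
        self.tasks = [[] for _ in self.tasks]

    def schedule(self, op):
        touching = self._tasks_touching(op)
        if len(touching) > 1:
            # Depends on more than one task: go serial.
            self._flush()
            self.output.append(("main", op))
        elif len(touching) == 1:
            # Exactly one task touches our data: join that task.
            self.tasks[touching[0]].append(op)
        else:
            # No dependency: pick the task with the fewest opcodes
            # (ties go to the lowest index).
            min(self.tasks, key=len).append(op)


s = Scheduler(2)
a = Opcode("a", frozenset({"x"}))
b = Opcode("b", frozenset({"y"}))
c = Opcode("c", frozenset({"x"}))
d = Opcode("d", frozenset({"x", "y"}))
for op in (a, b, c, d):
    s.schedule(op)
```

Here `a` and `b` land in separate tasks, `c` joins `a` because both touch `x`, and `d` touches data held by both tasks, so both tasks are flushed as one parallel group and `d` goes to main.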
This leaves room for improvement. Ideally, the instruction would go into the task with the lowest computational cost, or into the task that allows the best parallelization of the instructions that follow. I can see how this could be done, but I haven't had the time to implement it.
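As a rough sketch of the first idea only (lowest computational cost), the "fewest opcodes" choice could be swapped for the task with the lowest total estimated cost. The per-opcode `cost` field is a hypothetical estimate introduced here for illustration; the text does not say how cost would be measured, and the second idea (choosing for the best parallelization of following instructions) is not attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opcode:
    name: str
    cost: int  # hypothetical estimated execution cost of this opcode


def pick_cheapest_task(tasks):
    """Return the index of the task whose opcodes have the lowest total estimated cost."""
    return min(range(len(tasks)), key=lambda i: sum(op.cost for op in tasks[i]))


# Task 0 holds one expensive opcode, task 1 holds two cheap ones.
tasks = [[Opcode("mul", 3)], [Opcode("add", 1), Opcode("add", 1)]]
```

With these example costs, counting opcodes would pick task 0, while the cost-based choice picks task 1 (total cost 2 versus 3).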