Whenever a task is distributed, the code that makes up the task and the data the code depends upon are sent over the network to the destination node.
When the node completes its computations, the resulting data is sent back over the network.
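As a rough illustration, this request/response cycle could be expressed in Go as follows. The \texttt{Task} and \texttt{Result} types, the gob encoding, and the \texttt{runTaskOn} helper are hypothetical illustrations, not the system's actual protocol.

\begin{verbatim}
package dist

import (
	"encoding/gob"
	"net"
)

type Task struct {
	Code []byte   // the instructions making up the task
	Data [][]byte // the data the code depends upon
}

type Result struct {
	Data [][]byte // the data produced by the task
}

// runTaskOn ships a task to a node and waits for its result.
func runTaskOn(addr string, t Task) (Result, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return Result{}, err
	}
	defer conn.Close()

	// Send code and data to the destination node ...
	if err := gob.NewEncoder(conn).Encode(t); err != nil {
		return Result{}, err
	}
	// ... then block until the resulting data comes back.
	var r Result
	err = gob.NewDecoder(conn).Decode(&r)
	return r, err
}
\end{verbatim}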
This obviously leaves room for optimization. The scheduler could be made aware of the current location of data, so that new tasks are distributed to nodes that already hold some of the data they need, minimizing network traffic.
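A minimal sketch of what such locality-aware placement might look like, assuming a hypothetical map from nodes to the data blocks they currently hold (the types and the scoring are illustrative, not part of the current system):

\begin{verbatim}
package dist

// Locations maps a node address to the set of data-block IDs
// cached on that node.
type Locations map[string]map[int]bool

// bestNode returns the node already holding the most of the
// blocks needed by a new task, so that the least data has to
// travel over the network (empty string if no nodes are known).
func bestNode(loc Locations, needed []int) string {
	best, bestScore := "", -1
	for node, cached := range loc {
		score := 0
		for _, id := range needed {
			if cached[id] {
				score++
			}
		}
		if score > bestScore {
			best, bestScore = node, score
		}
	}
	return best
}
\end{verbatim}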
Not much work has yet gone into optimizing the network communication. The first concern was to get a system running with a decent virtual machine. Also, the current approach is a fairly clean implementation of a CSP specification of this system, which means it never required much debugging; it is simple, and it works.
Because the parallelizers seek to group mutually dependent instructions into tasks, the actual communication requirements are already reduced somewhat.
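One way to picture this effect is as connected components of the dependency graph: instructions in the same component share intermediate values, so placing them in one task keeps those values off the network. The sketch below is an illustrative interpretation, not the machine's actual parallelizer:

\begin{verbatim}
package dist

// groupTasks partitions n instructions into tasks by taking the
// connected components of the (undirected) dependency graph.
// deps[i] lists the instructions that instruction i depends on.
func groupTasks(n int, deps [][]int) [][]int {
	adj := make([][]int, n)
	for i, ds := range deps {
		for _, d := range ds {
			adj[i] = append(adj[i], d)
			adj[d] = append(adj[d], i)
		}
	}
	seen := make([]bool, n)
	var tasks [][]int
	for i := range seen {
		if seen[i] {
			continue
		}
		// Depth-first walk collecting one component.
		task, stack := []int{}, []int{i}
		seen[i] = true
		for len(stack) > 0 {
			v := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			task = append(task, v)
			for _, w := range adj[v] {
				if !seen[w] {
					seen[w] = true
					stack = append(stack, w)
				}
			}
		}
		tasks = append(tasks, task)
	}
	return tasks
}
\end{verbatim}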
The network communication layer and the scheduler are two things we cannot yet begin to optimize in earnest. We must first have a virtual machine, with appropriate parallelizers, capable of running larger programs that actually resemble real scientific computing code. Before that, any ``tuning'' of the communication strategies would be pure guesswork.
Besides tuning at the more ``strategic'' level, there are also a number of lower-level performance optimizations that could be implemented. There is no reason why we should not use asynchronous communication (e.g.\ starting to receive results from some nodes while still waiting for others to complete their computations). Also, the protocol encoding and decoding code is currently not very efficient.
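As a sketch of such asynchronous collection, one goroutine per node could deliver results on a channel the moment its computation completes, reusing the hypothetical \texttt{runTaskOn} helper from the earlier sketch:

\begin{verbatim}
package dist

// runAll dispatches one task per node and gathers the results in
// whatever order they arrive, rather than waiting on each node in
// turn. Error handling is elided for brevity.
func runAll(nodes []string, tasks []Task) []Result {
	type reply struct {
		idx int
		res Result
	}
	ch := make(chan reply)
	for i := range tasks {
		go func(i int) {
			r, _ := runTaskOn(nodes[i], tasks[i])
			ch <- reply{i, r}
		}(i)
	}
	results := make([]Result, len(tasks))
	for range tasks {
		rep := <-ch
		results[rep.idx] = rep.res
	}
	return results
}
\end{verbatim}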
The main effort so far has gone into the virtual machine and the automatic parallelization strategies within it.