Although it may sound as if CSP is only useful on distributed memory architectures, that is not the case. A program designed as a set of communicating sequential processes can just as easily be implemented on a shared memory system. The only difference is that the message passing between the processes can now be implemented more efficiently, since the processes (or ``threads'') can actually share some memory.
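To make this concrete, the following minimal sketch uses Go, whose channels implement exactly this kind of CSP-style message passing between threads in a single address space. The names are illustrative and not taken from our implementation:

\begin{verbatim}
package main

import "fmt"

// worker is a CSP-style process: its only interaction with the rest
// of the program is through the messages it receives and sends.
func worker(jobs <-chan int, results chan<- int) {
    for j := range jobs {
        results <- j * j // compute and send the reply message
    }
}

func main() {
    jobs := make(chan int)
    results := make(chan int)

    // A second sequential process in the same address space.
    go worker(jobs, results)

    // Because both processes share memory, each send is a cheap
    // in-memory handoff instead of a network transfer.
    jobs <- 7
    fmt.Println(<-results) // 49
    close(jobs)
}
\end{verbatim}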
The nodes that make up our distributed virtual machine can be thought of as sequential processes that communicate either when they receive a new job (a task description containing code and data) or when they hand back a result.
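A node's behaviour can then be sketched as a simple receive/compute/reply loop. The Job and Result types below are hypothetical stand-ins for the actual task descriptions:

\begin{verbatim}
package main

import "fmt"

// Hypothetical message types: a Job carries code and the data it
// operates on, a Result carries the data handed back.
type Job struct {
    Code func(int) int // stand-in for a list of opcodes
    Data int
}

type Result struct {
    Data int
}

// node is one sequential process of the distributed virtual machine:
// it communicates only when it receives a job and when it hands back
// the result.
func node(jobs <-chan Job, results chan<- Result) {
    for job := range jobs {
        results <- Result{Data: job.Code(job.Data)}
    }
}

func main() {
    jobs := make(chan Job)
    results := make(chan Result)
    go node(jobs, results)

    jobs <- Job{Code: func(x int) int { return x + 1 }, Data: 41}
    fmt.Println(<-results) // {42}
    close(jobs)
}
\end{verbatim}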
The parallelization engine relies heavily on this property. Each opcode is thought of as a message (the opcode and the data it uses) that can be sent to some other sequential process (a node in the cluster) to produce a reply message (the data the opcode touches, that is, our result).
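At this granularity, a single opcode exchange could be modelled as a request/reply pair along the following lines; the opcode names and field layout are again assumptions made for illustration:

\begin{verbatim}
package main

import "fmt"

// OpMessage is a hypothetical request: the opcode together with the
// operand data it needs, so the receiver requires no other state.
type OpMessage struct {
    Opcode   string
    Operands []int
}

// ReplyMessage carries back the data the opcode produced.
type ReplyMessage struct {
    Result int
}

// execute plays the role of a remote node: it consumes one request
// message and produces one reply message.
func execute(m OpMessage) ReplyMessage {
    switch m.Opcode {
    case "ADD":
        return ReplyMessage{Result: m.Operands[0] + m.Operands[1]}
    case "MUL":
        return ReplyMessage{Result: m.Operands[0] * m.Operands[1]}
    }
    return ReplyMessage{}
}

func main() {
    fmt.Println(execute(OpMessage{Opcode: "ADD", Operands: []int{2, 3}})) // {5}
}
\end{verbatim}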
What the parallelization engine actually does is group the opcodes so that the dependencies between opcodes are respected, while keeping the number of opcode ``groups'' equal to the number of spare nodes in the cluster. This way each opcode group (what we have called a ``task'' elsewhere) can be seen as a single message containing the list of opcodes and the data they depend upon. These messages are sent to the nodes, and the sender then expects a reply message holding the result data.
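One way such a grouping could be sketched: keep dependent opcodes together (here via union-find over the dependency edges, assuming dependencies always point to earlier opcodes) and deal the resulting independent chains out across the available nodes. Everything below, from the Op record to the group function, is an illustrative assumption rather than the engine's actual algorithm:

\begin{verbatim}
package main

import "fmt"

// Op is a hypothetical opcode record: an identifier plus the indices
// of the earlier opcodes whose results it consumes.
type Op struct {
    ID   int
    Deps []int
}

// group partitions the opcodes into at most n groups. Dependent
// opcodes are kept together (union-find over the dependency edges),
// so each group is a self-contained task.
func group(ops []Op, n int) [][]Op {
    parent := make([]int, len(ops))
    for i := range parent {
        parent[i] = i
    }
    var find func(i int) int
    find = func(i int) int {
        if parent[i] != i {
            parent[i] = find(parent[i])
        }
        return parent[i]
    }
    for i, op := range ops {
        for _, d := range op.Deps {
            parent[find(i)] = find(d)
        }
    }

    // Collect connected components; opcodes stay in index order,
    // which we assume is a valid execution order.
    comps := map[int][]Op{}
    for i, op := range ops {
        comps[find(i)] = append(comps[find(i)], op)
    }

    // Deal each independent chain to the smallest group so the
    // group count tracks the number of spare nodes.
    groups := make([][]Op, n)
    for _, c := range comps {
        min := 0
        for g := range groups {
            if len(groups[g]) < len(groups[min]) {
                min = g
            }
        }
        groups[min] = append(groups[min], c...)
    }
    return groups
}

func main() {
    ops := []Op{
        {ID: 0},
        {ID: 1, Deps: []int{0}}, // chain 0 -> 1 must stay together
        {ID: 2},
        {ID: 3, Deps: []int{2}}, // chain 2 -> 3 must stay together
    }
    fmt.Println(group(ops, 2)) // e.g. [[{0 []} {1 [0]}] [{2 []} {3 [2]}]]
}
\end{verbatim}

In this sketch, keeping each dependency chain inside one group means every message really is self-contained: the data it depends upon can be shipped along with it, and the receiving node never has to wait for another node's intermediate result.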