[Figure: nodes in the cluster, each holding its own data items and the computation that operates on them]
The main concern when parallelizing with MPI or PVM is where to locate the data. The code itself is ubiquitous (the same program runs on every process), so the only two things we care about are that all processes have the data they need when they need it, and that no dependencies between data are broken.
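As a minimal sketch of this single-program, distributed-data style, the following C program has a root process scatter one chunk of an array to every process, which then works only on its own portion. The array name, the chunk size N, and the partial-sum computation are illustrative, not taken from any particular application.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8   /* elements per process, chosen arbitrarily for this example */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *full = NULL;
    if (rank == 0) {                      /* only the root owns the full array */
        full = malloc((size_t)N * size * sizeof *full);
        for (int i = 0; i < N * size; ++i)
            full[i] = (double)i;
    }

    double local[N];                      /* every process receives its own chunk */
    MPI_Scatter(full, N, MPI_DOUBLE,
                local, N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    /* each process now computes only on the data it was given */
    double sum = 0.0;
    for (int i = 0; i < N; ++i)
        sum += local[i];
    printf("rank %d: partial sum %g\n", rank, sum);

    free(full);
    MPI_Finalize();
    return 0;
}

The same executable runs on every node; only the value of rank decides which slice of the data a process ends up holding.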
The latter concern is probably what leads to most bugs (at least in my personal experience) when programming with MPI. It is easy to distribute all of the needed data among all processes, and it is easy to redistribute the data once it has changed. But making sure that no process actually starts using any of the older data (which is still present on that particular machine) can be tricky, especially when one tries to optimize the communication for better performance later on.
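The pitfall is easiest to see with nonblocking communication, which is exactly what one reaches for when optimizing. The sketch below assumes a one-dimensional halo exchange; the function exchange_halo, the buffers halo and my_boundary, and the neighbour ranks left and right are hypothetical names used only for illustration. Until the MPI_Wait completes, the receive buffer still contains the previous iteration's values, so reading it early silently uses stale data.

#include <mpi.h>

#define HALO 4   /* size of the boundary region exchanged each step (illustrative) */

void exchange_halo(double *halo, double *my_boundary,
                   int left, int right, MPI_Comm comm)
{
    MPI_Request recv_req, send_req;

    /* post the receive for the neighbour's updated boundary values */
    MPI_Irecv(halo, HALO, MPI_DOUBLE, left, 0, comm, &recv_req);

    /* send our own updated boundary values the other way */
    MPI_Isend(my_boundary, HALO, MPI_DOUBLE, right, 0, comm, &send_req);

    /* Without these waits, the next compute step would read whatever
       was left in halo[] from the previous iteration. The bug is silent,
       because the old contents still look like perfectly valid data.  */
    MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
    MPI_Wait(&send_req, MPI_STATUS_IGNORE);
}

Dropping or misplacing the wait is the kind of mistake that only shows up as occasionally wrong results, which is why enforcing these dependencies deserves as much attention as the data distribution itself.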