[Figure: nodes in the cluster, each holding its own data items and the computation that operates on them]
The main concern when parallelizing with MPI or PVM is where to locate the data. The code itself is ubiquitous (the same program runs on every process), so the only two things we care about are that all processes have the data they need when they need it, and that no dependencies between data are broken.
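As a minimal sketch of this single-program, distributed-data style, the following C program has a root process scatter one chunk of an array to every process, which then works only on its own portion. The array name, the chunk size N, and the partial-sum computation are illustrative, not taken from any particular application.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8   /* elements per process, chosen arbitrarily for this example */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *full = NULL;
    if (rank == 0) {                      /* only the root owns the full array */
        full = malloc((size_t)N * size * sizeof *full);
        for (int i = 0; i < N * size; ++i)
            full[i] = (double)i;
    }

    double local[N];                      /* every process receives its own chunk */
    MPI_Scatter(full, N, MPI_DOUBLE,
                local, N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    /* each process now computes only on the data it was given */
    double sum = 0.0;
    for (int i = 0; i < N; ++i)
        sum += local[i];
    printf("rank %d: partial sum %g\n", rank, sum);

    free(full);
    MPI_Finalize();
    return 0;
}

The same executable runs on every node; only the value of rank decides which slice of the data a process ends up holding.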
The latter concern is probably what leads to most bugs (at least in my personal experience) when programming with MPI. It is easy to distribute all of the needed data among all processes, and it is easy to redistribute the data once it has changed. But making sure that no process actually starts using any of the older data (which is still present on that particular machine) can be tricky, especially when one tries to optimize the communication for better performance later on.
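The pitfall is easiest to see with nonblocking communication, which is exactly what one reaches for when optimizing. The sketch below assumes a one-dimensional halo exchange; the function exchange_halo, the buffers halo and my_boundary, and the neighbour ranks left and right are hypothetical names used only for illustration. Until the MPI_Wait completes, the receive buffer still contains the previous iteration's values, so reading it early silently uses stale data.

#include <mpi.h>

#define HALO 4   /* size of the boundary region exchanged each step (illustrative) */

void exchange_halo(double *halo, double *my_boundary,
                   int left, int right, MPI_Comm comm)
{
    MPI_Request recv_req, send_req;

    /* post the receive for the neighbour's updated boundary values */
    MPI_Irecv(halo, HALO, MPI_DOUBLE, left, 0, comm, &recv_req);

    /* send our own updated boundary values the other way */
    MPI_Isend(my_boundary, HALO, MPI_DOUBLE, right, 0, comm, &send_req);

    /* Without these waits, the next compute step would read whatever
       was left in halo[] from the previous iteration. The bug is silent,
       because the old contents still look like perfectly valid data.  */
    MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
    MPI_Wait(&send_req, MPI_STATUS_IGNORE);
}

Dropping or misplacing the wait is the kind of mistake that only shows up as occasionally wrong results, which is why enforcing these dependencies deserves as much attention as the data distribution itself.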