Fine-Grain MPI (FG-MPI) extends the execution model of the
Message Passing Interface (MPI) to allow for interleaved
execution of multiple concurrent MPI processes inside a single
OS-process. FG-MPI is integrated into the MPICH middleware and
has a light-weight design based on coroutines that can scale to
millions of MPI processes on a node or across nodes on a
cluster. FG-MPI provides the ability to take advantage of
finer-grain parallelism available on today's multicore systems,
while maintaining MPI's rich support for communication inside
clusters.
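
As a rough illustration of this execution model, the sketch below uses Python generators as stand-in coroutines. It is not FG-MPI code and does not use the MPI API: the `run` scheduler, the `ring_worker` process, and the `("send", ...)` / `("recv",)` request tuples are all hypothetical names chosen for this example. Several logical processes share one OS-process, and a round-robin scheduler switches between them whenever one of them sends or waits for a message.

```python
from collections import deque


def ring_worker(rank, size):
    """One logical process: pass a token around a ring, adding our rank."""
    if rank == 0:
        yield ("send", 1 % size, 0)       # inject the token into the ring
        token = yield ("recv",)           # block until it comes back around
        return token
    token = yield ("recv",)               # block until the left neighbour sends
    yield ("send", (rank + 1) % size, token + rank)
    return None


def run(worker, size):
    """Interleave `size` coroutines inside this single OS-process."""
    procs = [worker(rank, size) for rank in range(size)]
    mailboxes = [deque() for _ in range(size)]
    results = [None] * size
    # Each entry is (rank, value used to resume that coroutine).
    ready = deque((rank, None) for rank in range(size))
    blocked = set()                       # ranks waiting in recv, mailbox empty

    while ready:
        rank, value = ready.popleft()
        try:
            request = procs[rank].send(value)
        except StopIteration as done:     # the coroutine returned
            results[rank] = done.value
            continue
        if request[0] == "send":
            _, dest, payload = request
            if dest in blocked:           # hand the message straight over
                blocked.remove(dest)
                ready.append((dest, payload))
            else:                         # receiver is not waiting yet
                mailboxes[dest].append(payload)
            ready.append((rank, None))    # the sender stays runnable
        else:                             # "recv"
            if mailboxes[rank]:
                ready.append((rank, mailboxes[rank].popleft()))
            else:
                blocked.add(rank)         # switch away until a message arrives
    return results


if __name__ == "__main__":
    # Rank 0 ends up holding the sum of all other ranks.
    print(run(ring_worker, 4)[0])
```

Because a blocked receive only parks a small generator object rather than an OS thread, the same scheduler can carry many thousands of logical processes in this toy setting, which is the property the coroutine-based design relies on.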
FG-MPI adds a new dimension to mapping processes onto
nodes. Its flexible process mapping allows the granularity of
MPI programs to be adjusted through the command line to better
fit the cache, leading to improved performance.
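
The extra mapping dimension can be sketched as follows, assuming the launcher is given two counts: the number of MPI processes co-located in each OS-process (called `nfg` here) and the number of OS-processes (called `n` here). Both parameter names, the helper `map_ranks`, and the contiguous block layout are assumptions made for illustration, not a description of FG-MPI's actual placement policy.

```python
def map_ranks(nfg, n):
    """Map each MPI rank to (os_process, local_index).

    Assumes ranks are laid out in contiguous blocks of `nfg`
    per OS-process; this layout is a hypothetical choice.
    """
    if nfg < 1 or n < 1:
        raise ValueError("nfg and n must both be at least 1")
    return {rank: divmod(rank, nfg) for rank in range(nfg * n)}


if __name__ == "__main__":
    # The same 8 MPI processes at three different granularities.
    for nfg, n in [(1, 8), (2, 4), (8, 1)]:
        layout = map_ranks(nfg, n)
        print(f"nfg={nfg} n={n}: rank 5 -> {layout[5]}")
```

Under these assumptions the program itself is unchanged across the three runs; only the two launch-time counts differ, which is what lets the granularity be tuned without recompiling.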
For communication efficiency, we exploit the locality of MPI
processes in the system and implement optimized communication
between concurrent processes in the same OS-process.
We have investigated scalability issues related to MPI groups and
communicators and defined new efficient algorithms for
communicator creation and storage of process maps.
FG-MPI's light-weight design and ability to expose massive concurrency
enable a task-oriented programming approach that can be used
to simplify MPI programming and avoid some uses of non-blocking
communication. The fine-grain nature of FG-MPI makes it
suitable for chips with a large number of cores. Because it is
based on message passing, it will also be portable to multicore
chips with or without support for cache coherence.
FG-MPI supports function-level concurrency, which enables the design of novel
algorithms and techniques that achieve scalable performance and
match the number of processes to the problem rather than to the
hardware.