Current implementations of MPI are coarse-grained, with a single MPI process per processor; however, nothing in the MPI specification precludes a finer-grain interpretation of the standard. We have implemented Fine-grain MPI (FG-MPI), a system that allows hundreds or thousands of MPI processes to execute on a single node and to communicate with processes on other nodes in a cluster. FG-MPI uses fibers (coroutines) to support multiple MPI processes inside a single operating system process. These are full-fledged MPI processes, each with its own MPI rank. FG-MPI is based on the MPICH2 middleware and uses the Nemesis communication subsystem for intra-node and inter-node communication.
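To make the idea of several ranks sharing one operating system process concrete, the following is a minimal sketch of cooperative fibers built on the POSIX `ucontext` routines (`getcontext`, `makecontext`, `swapcontext`). It is an illustration only and does not reproduce FG-MPI's implementation: the names `fiber_t`, `fiber_yield`, `rank_body` and `run_ranks` are hypothetical, there is no message passing, and a plain round-robin loop stands in for a real scheduler. It assumes a platform that still provides `ucontext.h` (for example Linux with glibc).

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical per-fiber state; not FG-MPI's actual data structure. */
typedef struct {
    ucontext_t ctx;   /* saved registers and stack pointer */
    char *stack;      /* private stack for this fiber */
    int rank;         /* stand-in for an MPI rank */
    int rounds;       /* how many times the fiber runs before finishing */
    int done;         /* set when the fiber body has finished */
} fiber_t;

static ucontext_t scheduler_ctx;  /* context of the round-robin loop */
static fiber_t *current;          /* fiber that is running right now */
static int *trace;                /* records which rank ran, in order */
static int trace_len, trace_cap;

/* Give control back to the scheduler; the fiber resumes here later. */
static void fiber_yield(void) {
    swapcontext(&current->ctx, &scheduler_ctx);
}

/* Body executed by every fiber: record the rank, then yield. */
static void rank_body(void) {
    fiber_t *self = current;
    for (int r = 0; r < self->rounds; r++) {
        if (trace_len < trace_cap) trace[trace_len++] = self->rank;
        fiber_yield();
    }
    self->done = 1;  /* returning resumes uc_link, i.e. the scheduler */
}

/* Run nranks fibers round-robin inside this one OS process.
   Returns the number of trace entries written, or -1 on allocation failure. */
int run_ranks(int nranks, int rounds, int *out, int cap) {
    fiber_t *fibers = calloc((size_t)nranks, sizeof *fibers);
    if (!fibers) return -1;
    trace = out; trace_len = 0; trace_cap = cap;

    for (int i = 0; i < nranks; i++) {
        fibers[i].stack = malloc(STACK_SIZE);
        if (!fibers[i].stack) {
            for (int j = 0; j < i; j++) free(fibers[j].stack);
            free(fibers);
            return -1;
        }
        getcontext(&fibers[i].ctx);
        fibers[i].ctx.uc_stack.ss_sp = fibers[i].stack;
        fibers[i].ctx.uc_stack.ss_size = STACK_SIZE;
        fibers[i].ctx.uc_link = &scheduler_ctx;
        fibers[i].rank = i;
        fibers[i].rounds = rounds;
        makecontext(&fibers[i].ctx, rank_body, 0);
    }

    int live = nranks;
    while (live > 0) {
        for (int i = 0; i < nranks; i++) {
            if (fibers[i].done) continue;
            current = &fibers[i];
            swapcontext(&scheduler_ctx, &fibers[i].ctx);  /* resume fiber i */
            if (fibers[i].done) live--;
        }
    }

    for (int i = 0; i < nranks; i++) free(fibers[i].stack);
    free(fibers);
    return trace_len;
}
```

Calling `run_ranks(3, 2, order, 6)` from a `main` function fills `order` with the interleaving in which the three fibers ran. Each fiber keeps its own stack and rank, which is the property that lets many logical processes coexist in a single address space.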
We present experimental results for applications using thousands of MPI processes and compare the performance of FG-MPI with that of several fine-grain multicore languages. FG-MPI has also made it possible to investigate problems that arise when scaling MPI to larger numbers of processes. We have also designed and evaluated techniques to support the scalability of communicators and groups in MPI.
Papers and talks related to FG-MPI are here