FG-MPI: Fine-Grain MPI
FG-MPI adds a new dimension to mapping processes onto nodes, which can be used to adjust the granularity of processes so that they better fit the cache and improve the performance of existing algorithms. For communication efficiency, we exploit the locality of MPI processes in the system and implement optimized communication between concurrent processes in the same OS-process. On a multicore machine we have shown that FG-MPI achieves performance equal to or better than that of other multicore parallel languages and runtime systems.

FG-MPI also provides a vehicle for investigating the scalability of the MPI middleware without requiring a corresponding number of cores or machines. We have investigated scalability issues related to MPI groups and communicators and defined new, efficient algorithms for communicator creation and for storage of process maps.

FG-MPI's light-weight design and its ability to expose massive concurrency enable a task-oriented programming approach that can simplify MPI programming and avoid some uses of non-blocking communication. The fine-grain nature of FG-MPI makes it suitable for chips with a large number of cores. Because it is based on message passing, it will also be portable to multicore chips with or without support for cache coherence.
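
To make the extra mapping dimension described above more concrete, here is a minimal Python sketch of how global MPI ranks might be grouped into OS-processes. It is only an illustration: the block layout, the `place` and `colocated` helpers, and the `per_os_process` parameter are assumptions made for this example and are not part of FG-MPI's API, and the actual FG-MPI mapping may differ.

```python
from collections import namedtuple

# Hypothetical record: which OS-process hosts a rank, and its slot inside it.
Placement = namedtuple("Placement", ["os_process", "local_index"])


def place(rank, per_os_process):
    """Map a global rank to an OS-process under an assumed block layout."""
    if rank < 0 or per_os_process <= 0:
        raise ValueError("rank must be >= 0 and per_os_process must be > 0")
    return Placement(rank // per_os_process, rank % per_os_process)


def colocated(rank_a, rank_b, per_os_process):
    """True when both ranks share an OS-process, so a message between them
    could take an optimized in-process path instead of crossing processes."""
    a = place(rank_a, per_os_process)
    b = place(rank_b, per_os_process)
    return a.os_process == b.os_process


if __name__ == "__main__":
    k = 4  # assumed number of MPI processes per OS-process
    for rank in range(8):
        print(rank, place(rank, k))
    print(colocated(1, 3, k))  # True: both ranks live in OS-process 0
    print(colocated(3, 4, k))  # False: ranks 3 and 4 are in different OS-processes
```

Under this assumed layout, raising `per_os_process` makes each MPI process finer-grained relative to the OS-process that hosts it, which is the kind of granularity knob the text refers to.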
Papers and talks related to FG-MPI are here