MPI-SCTP

Using the Stream Control Transmission Protocol (SCTP) for parallel programs
written using the Message Passing Interface (MPI)






Goal: This page documents our experiences incorporating SCTP and its features into MPI, focusing particularly on executing MPI over a network. We hope these experiences benefit not only people directly interested in MPI, but also developers of other network applications who are curious about SCTP.

Description

TCP has long been the de facto transport protocol for network applications. This began with local area networks and has continued into wide area networks, which are often characterized by higher latency and loss. In these settings, TCP has been tuned extensively to maximize throughput. The same is true of parallel applications, where various socket and sysctl settings are tweaked appropriately.

Previous work using SCTP under high latency/loss links has shown drastic increases in performance over TCP. Initially developed for transporting telephony signaling, SCTP has been shown to be useful in other contexts as well, such as FTP, HTTP, and satellite networks. The question became: why not use SCTP to execute latency-tolerant parallel programs over the Internet, which itself has high latency and loss?


Papers

2005

  • Humaira Kamal, Brad Penoff, and Alan Wagner. SCTP-based Middleware for MPI in Wide-Area Networks. In Proceedings of the IEEE Conference on Communication Networks and Services Research (CNSR2005), Halifax, CANADA, May 2005. Full text available at IEEE Xplore.
  • Humaira Kamal, Brad Penoff, and Alan Wagner. Evaluating Transport Level Protocols for MPI in the Internet. In Proceedings of the International Conference on Communications in Computing (CIC 2005), Las Vegas, Nevada USA, June 2005.
  • (PDF | talk) Humaira Kamal, Brad Penoff, and Alan Wagner. SCTP versus TCP for MPI. Proceedings of Supercomputing 2005 (SC2005), Seattle, Washington USA, November 2005. Best Student Paper Award finalist (top 4 of 60+).

2006

  • (PDF) Humaira Kamal, Brad Penoff, Mike Tsai, Edith Vong, and Alan Wagner. Using SCTP to hide latency in MPI programs. Accepted to HCW 2006 and to appear in the Proceedings for IPDPS 2006, Rhodes, GREECE, April 2006.
  • (PDF) Brad Penoff and Alan Wagner. Towards MPI progression layer elimination with TCP and SCTP. Accepted to HIPS 2006 and to appear in the Proceedings for IPDPS 2006, Rhodes, GREECE, April 2006.

2007

  • (PDF) Brad Penoff, Mike Tsai, Janardhan Iyengar, and Alan Wagner. Using CMT in SCTP-based MPI to exploit multiple interfaces in cluster nodes. In the Proceedings of EuroPVM/MPI 2007, Paris, FRANCE, Sept 2007.

2008

  • (PDF|talk) Mike Tsai, Brad Penoff, and Alan Wagner. A Hybrid MPI Design using SCTP and iWARP. In Communication Architecture for Clusters (CAC): Proceedings of the 2008 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Miami, Florida, USA, April 2008.

2009

  • (PDF | talk) Brad Penoff, Alan Wagner, Michael Tuexen, and Irene Ruengeler. MPI-NeTSim: A network simulation module for MPI. In the 15th IEEE International Conference on Parallel and Distributed Systems (ICPADS'09), Shenzhen, CHINA, December 2009.

2010

  • (DOI) Brad Penoff, Humaira Kamal, Alan Wagner, Mike Tsai, Karol Mroz, and Janardhan Iyengar. Employing Transport Layer Multi-railing in Cluster Networks. In the Journal of Parallel and Distributed Computing (JPDC) Volume 70, Issue 3, March 2010, Pages 259-269.


2011

  • (PDF (to appear)) Irene Ruengeler, Michael Tuexen, Brad Penoff, and Alan Wagner. A New Fast Algorithm for Connecting the INET Simulation Framework to Applications in Real-time. In the fourth International ICST Conference on Simulation Tools and Techniques (SIMUTools 2011), March 2011.


2012

  • (PDF) Irene Ruengeler, Michael Tuexen, Brad Penoff, and Alan Wagner. Portable and Performant Userspace SCTP Stack. In the IEEE International Conference on Computer Communication Networks (ICCCN 2012), July 2012.



Presentations

  • CNSR2005, Halifax, Canada - May 2005 (PowerPoint)
  • Ohio State visit, Columbus, Ohio - Oct 24, 2005 (PowerPoint)
  • SC|05, Seattle, Washington, USA - Nov 16, 2005 (PowerPoint)
  • VanHPC meeting, Vancouver, BC, CANADA - March 15, 2006 (PDF, PowerPoint)
  • HCW 2006, Rhodes, GREECE - April 25, 2006 (PowerPoint)
  • HIPS 2006, Rhodes, GREECE - April 25, 2006 (PowerPoint)
  • Argonne National Laboratory, Chicago, Illinois, USA - September 7, 2006
  • Google, Seattle, Washington, USA - June 23, 2007 (PDF) (Google Video | YouTube | embedded)
  • Euro PVM/MPI 2007, Paris, FRANCE - October 1, 2007
  • CAC 2008, Miami, Florida, USA - April 14, 2008 (PowerPoint)
  • ICPADS 2009, Shenzhen, CHINA - December 11, 2009 (PDF)


Technical Documents

  • Description of LAM TCP RPI Module (PDF)

Future Events

  • ICCCN 2012

Past Events

  • SCTP Interop Event (July 30 - August 4, 2006) (WEBSITE)
  • The standard MPICH2 1.0.5 release includes our SCTP channel for MPICH2's ch3 device.

Software

    Open MPI
    On Nov 13, 2007, the initial SCTP BTL was committed to ompi-trunk in changeset 16723.

    MPICH2
    MPICH2 1.0.5 includes an SCTP channel for MPICH2's ch3 device. Directions for compiling and using it are provided in the README.

    LAM/MPI
    Our initial prototype, discussed in our SC|05 paper, was implemented within LAM/MPI. The modified LAM/MPI used in that paper is now available:

  • LAM/MPI 7.0.6 with SCTP - Email us for the tarball!
  • When run with the TCP RPI, LAM/MPI provides concurrency at the process level, since each pair of processes has a socket on each side associated with their connection. TCP imposes a total ordering on all messages sent over a connection, which is stricter than the ordering MPI requires: MPI only requires that messages with the same tag, rank, and context (TRC) maintain their order. Our implementation instead provides concurrency at the TRC level, as the sketch below illustrates.
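
    To preserve this ordering while still gaining concurrency, messages that share a tag, rank, and context can be mapped onto the same SCTP stream within an association. The C sketch below shows one way to do this, assuming the sctp_sendmsg() call from lksctp-tools; NUM_STREAMS, trc_to_stream(), and send_mpi_message() are illustrative names, not part of our actual middleware.

/* Sketch: map an MPI message's (context, rank, tag) onto an SCTP stream.
 * Messages with the same TRC always use the same stream, so their order
 * is preserved; different TRCs usually land on different streams and
 * cannot head-of-line block one another. */
#include <netinet/sctp.h>   /* sctp_sendmsg(), from lksctp-tools */
#include <sys/types.h>
#include <stdint.h>

#define NUM_STREAMS 16      /* streams negotiated at association setup */

static uint16_t trc_to_stream(int context, int rank, int tag)
{
    unsigned h = ((unsigned)context * 31u + (unsigned)rank) * 31u
                 + (unsigned)tag;
    return (uint16_t)(h % NUM_STREAMS);
}

static ssize_t send_mpi_message(int sd, const void *buf, size_t len,
                                int context, int rank, int tag)
{
    /* sctp_sendmsg() lets the sender pick a stream per message; a loss
     * on one stream does not stall delivery on the others. */
    return sctp_sendmsg(sd, buf, len, NULL, 0,
                        0 /* ppid */, 0 /* flags */,
                        trc_to_stream(context, rank, tag),
                        0 /* timetolive */, 0 /* context */);
}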

    Although contrived, the following simple example program illustrates how head-of-line blocking can occur in TCP-based RPIs, something our SCTP-based middleware avoids. The communication pattern of this latency-tolerant program could conceivably appear in real applications. The program implements the scenario shown in Figure 5 of our SC|05 paper.

  • waitany - download the example program here
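
    For readers without the tarball, here is a minimal sketch of the same kind of pattern (an illustration only, not the downloadable waitany program itself): rank 0 posts receives for two independently tagged flows and services whichever completes first. Over a single TCP connection a delayed TAG_A message also holds up TAG_B, while an SCTP-based middleware can deliver the two tags on separate streams.

/* waitany-style pattern that exposes head-of-line blocking in TCP-based
 * middleware.  Compile with mpicc and run with 2 ranks. */
#include <mpi.h>
#include <stdio.h>

#define TAG_A 1
#define TAG_B 2
#define N     1024

int main(int argc, char **argv)
{
    double a[N], b[N];
    MPI_Request reqs[2];
    MPI_Status  status;
    int rank, i, idx;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Post receives for two independent message flows. */
        MPI_Irecv(a, N, MPI_DOUBLE, 1, TAG_A, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(b, N, MPI_DOUBLE, 1, TAG_B, MPI_COMM_WORLD, &reqs[1]);
        for (i = 0; i < 2; i++) {
            /* MPI only orders messages with the same tag, rank, and
             * context, so either receive may legally finish first. */
            MPI_Waitany(2, reqs, &idx, &status);
            printf("request %d completed (tag %d)\n", idx, status.MPI_TAG);
        }
    } else if (rank == 1) {
        for (i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }
        MPI_Send(a, N, MPI_DOUBLE, 0, TAG_A, MPI_COMM_WORLD);
        MPI_Send(b, N, MPI_DOUBLE, 0, TAG_B, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}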

Future Work

    There are several topics that we want to investigate in the future.


People

  • Humaira Kamal (web)
  • Brad Penoff (web)
  • Alan Wagner (web)
  • Mike Yao Chen Tsai
  • Edith Vong

Contact

    If you have questions regarding anything, feel free to email us at mpi hyphen sctp at cs dot ubc dot ca.


    Last Updated: Sept 1, 2008