Last modified:  2-16-2002

Double Buffer Experiment

This experiment is part of work within the Scalable Systems Lab, University of New Mexico, under the
guidance of Arthur B. Maccabe.

Description:

     The intent of the experiment is to assess application and system performance while the contents of
     two buffers are repeatedly transferred from one node to another.

     The support node fills a buffer with random numbers and sends the buffer contents to the
     timing node.  The support node uses a non-blocking send of the buffer contents so that the
     application may proceed with filling the second buffer in parallel with the send of
     the first buffer.
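
     For illustration only (the actual double buffer program is linked under Details), the support node's
     loop might look like the following MPI sketch; the element type, buffer length, tag, rank names, and
     send count are assumptions, not taken from the program:

        /* Sketch of the support node: fill one buffer while the other is in flight.
         * TIMING_RANK, NUM_SENDS, and the use of doubles are assumptions.
         * MPI_Init/MPI_Finalize are assumed to be handled by the caller. */
        #include <mpi.h>
        #include <stdlib.h>

        #define TIMING_RANK 0
        #define NUM_SENDS   40                        /* two buffers per sample, 20 samples */

        void support_node(int count)
        {
            double      *buf[2];
            MPI_Request  req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };

            buf[0] = malloc(count * sizeof(double));
            buf[1] = malloc(count * sizeof(double));

            for (int i = 0; i < NUM_SENDS; i++) {
                int b = i % 2;                        /* alternate between the two buffers */
                MPI_Wait(&req[b], MPI_STATUS_IGNORE); /* prior send of this buffer complete? */

                for (int j = 0; j < count; j++)       /* fill the buffer with random numbers */
                    buf[b][j] = (double)rand() / RAND_MAX;

                /* non-blocking send: the next buffer can be filled while this one is in flight */
                MPI_Isend(buf[b], count, MPI_DOUBLE, TIMING_RANK, 0, MPI_COMM_WORLD, &req[b]);
            }
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
            free(buf[0]);
            free(buf[1]);
        }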

     The timing node receives a buffer, sums all the values in the buffer, and proceeds to do the
     same with the next buffer.  The timing node uses non-blocking receives for the buffers so that
     one buffer's contents may be summed in parallel with receipt of the next buffer.

     The timing node measures the time required to receive and process the contents of two buffers; this
     is one sample.  The statistical information below is based on 20 samples for a given message size.
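
     A corresponding sketch of the timing node, assuming MPI_Wtime is used for the per-sample timing
     (again, the names and types are assumptions rather than the actual program):

        /* Sketch of the timing node: receive and sum buffers in alternation, timing each
         * pair of buffers as one sample.  SUPPORT_RANK and NUM_SAMPLES are assumed names. */
        #include <mpi.h>
        #include <stdlib.h>

        #define SUPPORT_RANK 1
        #define NUM_SAMPLES  20

        void timing_node(int count, double sample[NUM_SAMPLES])
        {
            double      *buf[2];
            MPI_Request  req[2];

            buf[0] = malloc(count * sizeof(double));
            buf[1] = malloc(count * sizeof(double));

            /* pre-post receives for both buffers so summing overlaps the next receive */
            MPI_Irecv(buf[0], count, MPI_DOUBLE, SUPPORT_RANK, 0, MPI_COMM_WORLD, &req[0]);
            MPI_Irecv(buf[1], count, MPI_DOUBLE, SUPPORT_RANK, 0, MPI_COMM_WORLD, &req[1]);

            for (int s = 0; s < NUM_SAMPLES; s++) {
                double start = MPI_Wtime();
                for (int b = 0; b < 2; b++) {
                    MPI_Wait(&req[b], MPI_STATUS_IGNORE);   /* wait for this buffer to arrive */

                    double sum = 0.0;
                    for (int j = 0; j < count; j++)         /* the summing is the per-buffer work */
                        sum += buf[b][j];

                    /* re-post the receive so the next transfer overlaps the other buffer's sum */
                    if (s + 1 < NUM_SAMPLES)
                        MPI_Irecv(buf[b], count, MPI_DOUBLE, SUPPORT_RANK, 0,
                                  MPI_COMM_WORLD, &req[b]);
                }
                sample[s] = MPI_Wtime() - start;            /* time for two buffers = one sample */
            }
            free(buf[0]);
            free(buf[1]);
        }

     In this sketch at most two receives are outstanding at any time: a buffer's receive is re-posted as
     soon as its sum is finished, so the next transfer can proceed while the other buffer is summed.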

Details:

     Double buffer program.
     Pseudo code of the heart of the program.
     Testing was done on Jemez/Bulks 106 and 107 at the Scalable Systems Lab.
     500 MHz Pentium III with a LANai 7.2 Myrinet NIC.
     MPICH/GM data based on GM 1.4 and MPICH/GM 1.2..4;
     Portals data based on Cplant cluster running Portals 3.0 and RTS/CTS kernel modules with a Portals 3.0 port of MPICH 1.2.0.
     Each sample is the time to receive and sum two buffers.  Statistics are taken from 20 consecutive samples per message size (see the sketch after this list).
     MPICH/GM raw data.
     Portals raw data.
     Matlab plotting script.
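
     As a rough sketch of the per-message-size statistics, assuming the control limits are the
     conventional mean plus or minus three standard deviations over the 20 samples (the Matlab plotting
     script above defines the statistics actually used):

        /* Sketch of the statistics over one message size's 20 samples.  The choice of
         * mean +/- 3 standard deviations for the control limits is an assumption. */
        #include <math.h>
        #include <stdio.h>

        #define NUM_SAMPLES 20

        void sample_stats(const double sample[NUM_SAMPLES])
        {
            double mean = 0.0, var = 0.0;

            for (int s = 0; s < NUM_SAMPLES; s++)
                mean += sample[s];
            mean /= NUM_SAMPLES;

            for (int s = 0; s < NUM_SAMPLES; s++)
                var += (sample[s] - mean) * (sample[s] - mean);
            var /= (NUM_SAMPLES - 1);                 /* sample variance */

            double sigma = sqrt(var);
            printf("mean %g s  UCL %g s  LCL %g s\n",
                   mean, mean + 3.0 * sigma, mean - 3.0 * sigma);
        }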

Results:

     [Graph: sample-time statistics versus message size for MPICH/GM and Portals; image not included here.]
     A PostScript file of the graph is available.

Conclusions:

     Portals is about 14% faster than MPICH/GM for message sizes greater than 16 KB.

     This 14% speedup is more significant than its face value suggests because, on the test system, MPICH/GM's peak
     bandwidth is more than 125% of Portals' (i.e., MPICH/GM has at least a quarter again as much bandwidth).  The
     speedup despite this peak-bandwidth disparity is likely attributable to Portals' application bypass: unlike
     MPICH/GM, Portals does not require frequent MPI library calls to make progress on messages at or above 16 KB.

     While MPICH/GM is known to allow a greater overall degree of OS bypass than the above implementation of Portals, this
     advantage is outweighed by the application bypass in Portals and the lack of the same in MPICH/GM.

     The greater spread in the control limits for Portals is probably due to variability in Portals' communication
     overhead; MPICH/GM does not incur such overhead.



Bill Lawry
bill@cs.unm.edu