Double Buffer Experiment
This experiment is part of work within the Scalable Systems Lab, University of New Mexico, under the guidance of Arthur B. Maccabe.
Description:
The intent of the experiment is to assess application and system performance during repeated transfers of the contents of two buffers from one node to another.
The support node fills a buffer with random
numbers and sends the buffer contents to the
timing node. The support node uses a
non-blocking send of the buffer contents so that the
application may proceed with filling the second
buffer in parallel with the send of
the first buffer.
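A minimal sketch of how the support node's loop might be structured, assuming MPI; the constants (BUF_LEN, NXFERS, TIMING_NODE) and the helper fill_random() are illustrative names, not taken from the actual program:

#include <mpi.h>
#include <stdlib.h>

#define BUF_LEN      4096   /* doubles per buffer (varied per test); illustrative */
#define NXFERS       40     /* number of buffers sent in one run; illustrative    */
#define TIMING_NODE  1      /* rank of the receiving (timing) node; illustrative  */

static void fill_random(double *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = (double)rand() / RAND_MAX;
}

void support_node(void)
{
    double buf[2][BUF_LEN];
    MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };

    for (int i = 0; i < NXFERS; i++) {
        int cur = i % 2;

        /* Wait for the send posted from this buffer two iterations ago,
         * so it is safe to overwrite it with new random data. */
        MPI_Wait(&req[cur], MPI_STATUS_IGNORE);
        fill_random(buf[cur], BUF_LEN);

        /* Non-blocking send: filling the other buffer proceeds in
         * parallel with this buffer's transfer. */
        MPI_Isend(buf[cur], BUF_LEN, MPI_DOUBLE, TIMING_NODE, 0,
                  MPI_COMM_WORLD, &req[cur]);
    }
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
}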
The timing node receives a buffer, sums all
the values in the buffer, and proceeds to do the
same with the next buffer. The timing
node uses non-blocking receives for the buffers so that
one buffer's contents may be summed in parallel
with receipt of the next buffer.
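A complementary sketch of the timing node's loop, under the same assumptions; receives for both buffers are pre-posted so that summing one buffer overlaps receipt of the next (constants and names are again illustrative):

#include <mpi.h>

#define BUF_LEN       4096   /* doubles per buffer; illustrative          */
#define NXFERS        40     /* number of buffers received; illustrative  */
#define SUPPORT_NODE  0      /* rank of the sending node; illustrative    */

double timing_node(void)
{
    double buf[2][BUF_LEN];
    MPI_Request req[2];
    double total = 0.0;

    /* Pre-post receives for both buffers. */
    MPI_Irecv(buf[0], BUF_LEN, MPI_DOUBLE, SUPPORT_NODE, 0,
              MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(buf[1], BUF_LEN, MPI_DOUBLE, SUPPORT_NODE, 0,
              MPI_COMM_WORLD, &req[1]);

    for (int i = 0; i < NXFERS; i++) {
        int cur = i % 2;

        /* Wait for the current buffer to arrive. */
        MPI_Wait(&req[cur], MPI_STATUS_IGNORE);

        /* Sum its contents while the other buffer may still be
         * arriving through the outstanding receive. */
        for (int j = 0; j < BUF_LEN; j++)
            total += buf[cur][j];

        /* Re-post a receive into this buffer unless the remaining
         * transfers are already covered by the outstanding request. */
        if (i + 2 < NXFERS)
            MPI_Irecv(buf[cur], BUF_LEN, MPI_DOUBLE, SUPPORT_NODE, 0,
                      MPI_COMM_WORLD, &req[cur]);
    }
    return total;
}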
The timing node monitors the time required to receive and process the contents of two buffers; this is one sample. The statistical information below is based on 20 samples for each message size.
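A hedged sketch of how one sample might be collected and summarized; MPI_Wtime() times the receipt and summation of two buffers, and receive_and_sum_two_buffers() is an assumed stand-in for one pass of the loop sketched above:

#include <mpi.h>
#include <math.h>

#define NSAMPLES 20   /* samples per message size */

/* Assumed helper: receives and sums two consecutive buffers. */
extern void receive_and_sum_two_buffers(void);

void sample_message_size(double *mean, double *stddev)
{
    double t, sum = 0.0, sumsq = 0.0;

    for (int s = 0; s < NSAMPLES; s++) {
        double t0 = MPI_Wtime();
        receive_and_sum_two_buffers();
        t = MPI_Wtime() - t0;   /* one sample: two buffers received and summed */
        sum   += t;
        sumsq += t * t;
    }
    *mean   = sum / NSAMPLES;
    *stddev = sqrt(sumsq / NSAMPLES - (*mean) * (*mean));
}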
Details:
Double buffer program.
Pseudocode of the heart of the program.
Testing was done on Jemez/Bulks 106 and 107 at the Scalable Systems Lab; each node is a 500 MHz Pentium III with a LANai 7.2 Myrinet NIC.
MPICH/GM data are based on GM 1.4 and MPICH/GM 1.2..4; Portals data are based on a Cplant cluster running Portals 3.0 and the RTS/CTS kernel modules, with a Portals 3.0 port of MPICH 1.2.0.
Each sample is the time to receive and sum two buffers. Statistics were taken from 20 consecutive samples per message size.
MPICH/GM raw data.
Portals raw data.
Matlab plotting script.
Results:
Here is a PostScript file of the above graph.
Conclusions:
Portals is about 14% faster than MPICH/GM for message sizes greater than 16 KB.
This 14% speedup is more significant than its face value suggests, because on the test system MPICH/GM has a peak bandwidth greater than 125% of Portals' (i.e., MPICH/GM has at least a quarter again as much bandwidth). The speedup despite this peak-bandwidth disparity is likely attributable to application bypass in Portals: unlike MPICH/GM, Portals does not require frequent MPI library calls to make progress on messages at or above 16 KB.
While MPICH/GM is known to allow a greater overall degree of OS bypass than the above implementation of Portals, this advantage is overcome by the application bypass in Portals and the lack of it in MPICH/GM.
The greater spread in the control limits for
Portals is probably due to variability in Portals' communication overhead.
MPICH/GM does not have such overhead.