Last Modified: 12 March 2001 |
SOURCE: | Created by John McCalpin while he was a professor at the University of Delaware. |
PURPOSE: | CPU speeds are increasing much faster than the speeds of computer memory systems. The STREAM benchmark can help quantify the resulting gap. |
MEASURE: | Sustainable memory bandwidth in MB/s. |
PLATFORMS: | DOS, Windows95/98/NT, Linux, Power Mac; uniprocessors and multiprocessors. This tool evaluation focused on uniprocessors under the Linux OS. |
INTERNALS: | The C version of STREAM (see "NOTES") measures the time
taken to operate on doubles stored in three arrays ("a", "b", and "c").
There are four separate tests: COPY, SCALE, ADD, and TRIAD. Each test
is individually repeated 10 times, and the four tests are run in series
without repeating. The tests are defined as follows:
COPY: c[j] = a[j]
SCALE: b[j] = scalar * c[j]
ADD: c[j] = a[j] + b[j]
TRIAD: a[j] = b[j] + scalar * c[j] |
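The four tests above can be sketched in C as follows. This is a minimal illustration, not the STREAM source: the function names (`stream_copy` and so on) and the explicit length parameter `n` are assumptions made for this example, and timing is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative versions of the four STREAM tests. The function names
   and the length parameter n are hypothetical, not taken from STREAM. */

/* COPY: c[j] = a[j] */
static void stream_copy(double *c, const double *a, size_t n)
{
    assert(c != NULL && a != NULL);
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];
}

/* SCALE: b[j] = scalar * c[j] */
static void stream_scale(double *b, const double *c, double scalar, size_t n)
{
    assert(b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        b[j] = scalar * c[j];
}

/* ADD: c[j] = a[j] + b[j] */
static void stream_add(double *c, const double *a, const double *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

/* TRIAD: a[j] = b[j] + scalar * c[j] */
static void stream_triad(double *a, const double *b, const double *c,
                         double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Calling the four functions in the order COPY, SCALE, ADD, TRIAD on arrays initialized to a = 1, b = 2, c = 0 with scalar = 3 leaves c = 4 and a = 15 in every element, which gives a simple way to check the loops by hand.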
USE: | The array size determines the RAM used. The user sets a single array size in the source code, and it applies to all three arrays. The author recommends an array size of four times the last-level cache or 1,000,000, whichever is larger. The idea is to measure memory bandwidth while minimizing the effect of cache bandwidth. The source code should be compiled with the compiler's full optimization flag (Makefile). Omitting the optimization flag on jemez.cs.unm.edu reduced the measured bandwidths by roughly 20%. |
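The sizing rule above can be expressed as a small calculation. The sketch below assumes that "four times the last level cache" refers to the memory occupied by each array, so the cache size in bytes is converted to a count of doubles. The function names and that interpretation are assumptions for illustration, not part of STREAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_ARRAY_LEN 1000000u  /* lower bound quoted in the text */
#define NUM_ARRAYS    3u        /* arrays a, b, and c */

/* Array length (in doubles) such that one array occupies four times the
   last-level cache, but never fewer than MIN_ARRAY_LEN elements.
   Hypothetical helper: the bytes-to-elements reading is an assumption. */
static size_t recommended_array_len(size_t llc_bytes)
{
    assert(llc_bytes <= SIZE_MAX / 4u);  /* guard against overflow */
    size_t from_cache = 4u * llc_bytes / sizeof(double);
    return from_cache > MIN_ARRAY_LEN ? from_cache : MIN_ARRAY_LEN;
}

/* Total RAM in bytes used by the three arrays of the given length. */
static size_t total_array_bytes(size_t array_len)
{
    assert(array_len <= SIZE_MAX / (NUM_ARRAYS * sizeof(double)));
    return NUM_ARRAYS * array_len * sizeof(double);
}
```

With 8-byte doubles, a 512 KB cache falls back to the 1,000,000-element minimum, and three arrays of that length occupy 24,000,000 bytes.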
SAMPLE DATA: | See the formatted output from a STREAM run. Alternatively, the chart below depicts data gathered from multiple runs on a test machine. In the chart, the cache effect is evident at RAM usage below 5 MB; at the other extreme, performance degrades seriously at around 23 MB, probably due to page thrashing. Note that the "Add" test shows roughly 10% greater sustainable memory bandwidth than the other three tests. Although an explanation has not been sought, these results are typical for many processors, including other AMD K6-II processors (see results at STREAM web pages). |
NOTES: | There are two versions of the benchmark source code. The
version presented here is written in C and is intended for evaluating
computational kernels written in C. The benchmark is also available in
Fortran, which is intended specifically for computational kernels coded
in Fortran. See the relevant comments in the source code (local copies
are here: C code, C timer, Fortran code, and Fortran timer).
The Fortran benchmark is dated July 2000 and contains features not available in the C benchmark, which is dated October 1995. See the STREAM web pages for the original comments on these features. Some of the Fortran benchmark features might be useful in the C benchmark; for example, the Fortran benchmark excludes the first and last iterations of a test when looking for minimum times, and it has a subroutine to "validate" results. |
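The minimum-time selection mentioned above might look like the following if ported to C. This is a sketch under stated assumptions, not code from either benchmark: the function names are hypothetical, the caller supplies the number of bytes moved per iteration, and the conversion to MB/s assumes 1 MB = 10^6 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest time among iterations 1 .. ntimes-2, skipping the first and
   last iterations. Requires at least three recorded times.
   Hypothetical helper, not taken from the STREAM sources. */
static double min_time_excluding_ends(const double *times, size_t ntimes)
{
    assert(times != NULL && ntimes >= 3);
    double best = times[1];
    for (size_t k = 2; k + 1 < ntimes; k++) {
        if (times[k] < best)
            best = times[k];
    }
    return best;
}

/* Convert bytes moved in one iteration and its elapsed time in seconds
   to MB/s, assuming 1 MB = 1,000,000 bytes. */
static double bandwidth_mb_per_s(double bytes_moved, double seconds)
{
    assert(seconds > 0.0);
    return 1.0e-6 * bytes_moved / seconds;
}
```

For recorded times of 0.9, 0.5, 0.4, 0.6, and 0.1 seconds, `min_time_excluding_ends` returns 0.4, because the 0.1 in the last position is ignored.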