Sandia National Laboratories Cplant

The Portals 3.0 API

Requirements

The following are the requirements for the Cplant message passing system. These requirements are divided into three sections. The first section lists critical requirements that the system must meet in order to meet the goals of the Cplant project. The second section contains important requirements that the system will meet. These requirements address specific functionality that will be supported above and beneath the implementation layer. The third section lists requirements that the system should meet. These requirements provide a set of criteria that need to be considered when establishing the relationship of the system to other message passing technologies.

The "Must" Requirements

  1. [R1.1] Message Passing Protocols
    The system must support efficient implementations of commonly used message passing protocols.

    Discussion: Cplant applications will need to use a variety of message passing protocols, possibly including: MPI, sockets, and the protocols currently being used by the Cplant runtime environment.

  2. [R1.2] Portability
    The system must be implementable on a variety of existing network interfaces.

    Discussion: It is anticipated that Cplant systems will be constructed from different types of hardware and that access to low-level software may be limited. Also, implementations based on commonly available interfaces provide a layer of portability.

  3. [R1.3] Scalability
    The system must support efficient implementations for systems with thousands of nodes.

    Discussion: It is anticipated that Cplant systems with thousands of nodes will be constructed.

  4. [R1.4] Performance
    It must be possible to create high-performance implementations of the system on existing hardware.

    Discussion: Cplant applications need high performance message passing primitives. This is a corollary requirement to R1.3.

  5. [R1.5] Multiprocess
    The system must support access to the network from multiple processes per node.

    Discussion: The Cplant runtime environment requires multiple communicating processes per node.

  6. [R1.6] Communication between processes from different executables
    The system must support the ability to pass messages between processes instantiated from different executables.

    Discussion: Support for this client/server type of communication is a requirement for the Cplant runtime environment.

  7. [R1.7] Runtime independence
    The ability to perform message passing must not depend on the existence of an external runtime environment, scheduling mechanism, or other special utilities outside of normal UNIX process startup.

    Discussion: Support for runtime independence is a requirement of the Cplant runtime environment.

  8. [R1.8] Memory Protection
    The system must support memory protection by ensuring that a process can only access memory that it owns.

    Discussion: Memory protection prevents a process from accessing another process's memory, whether in the NIC or in host memory, by modifying programmable NICs.

  9. [R1.9] Reliable message delivery
    The system must support reliable delivery of messages.

    Discussion: Support for reliable message passing is a fundamental requirement.

  10. [R1.10] Pairwise message ordering
    The system must support pairwise ordering of messages.

    Discussion: Messages should be received in the order sent.
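
One way to picture the pairwise ordering requirement (R1.10) is a receiver that tracks a sequence number per sender and holds back messages that arrive early. The following is a minimal sketch under stated assumptions, not part of the Portals 3.0 API: the class `OrderedReceiver`, the method `arrive`, and the per-sender sequence numbers are all hypothetical, and the sketch assumes reliable delivery (R1.9) so that every message arrives exactly once.

```python
from collections import defaultdict


class OrderedReceiver:
    """Delivers messages from each sender in the order they were sent.

    Hypothetical illustration only; ordering is enforced per sender,
    never across different senders.
    """

    def __init__(self):
        # Next sequence number expected from each sender (assumed scheme).
        self.expected = defaultdict(int)
        # Messages that arrived early, keyed by sender, then sequence number.
        self.pending = defaultdict(dict)

    def arrive(self, sender, seq, body):
        """Accept one message; return the bodies now deliverable, in order."""
        self.pending[sender][seq] = body
        delivered = []
        # Release the longest in-order run starting at the expected number.
        while self.expected[sender] in self.pending[sender]:
            delivered.append(self.pending[sender].pop(self.expected[sender]))
            self.expected[sender] += 1
        return delivered
```

On a network that already preserves order between each pair of nodes, an implementation might not need any such buffering; the sketch only shows what the guarantee means to the receiving process.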

The "Will" Requirements

  1. [R2.1] Operational API
    The system API will be defined by operations, not modifications to data structures. This means that the interface will have explicit operations to send and receive messages. (It does not mean that the receive operation will involve a copy of the message body.)

    Discussion: The goal is to make sure that implementations will be given an opportunity to take action whenever the application code does something significant.

  2. [R2.2] Zero-copy MPI
    It will be possible to write a low-latency, zero-copy, scalable implementation of the point-to-point operations in MPI 1 using the system.

    Discussion: We anticipate that many high performance applications will be written using MPI. Moreover, we anticipate that these applications will primarily use the point-to-point operations (see R1.1).

  3. [R2.3] Myrinet
    It will be possible to write an efficient implementation using Linux as the host OS and Myrinet interface cards.

    Discussion: Current configurations of Cplant are based on Linux with Myrinet interface cards. The implementation should take advantage of the programming interface provided for these cards.

  4. [R2.4] Sockets Implementation
    It will be possible to write an implementation based on the sockets API.

    Discussion: The sockets API is available in most networking environments (including Unix and NT). This requirement is directly related to the portability requirement (R1.2).

  5. [R2.5] Message Size
    The system will not impose an arbitrary restriction on the size of message that can be sent.

    Discussion: This requirement is based on the "zero, one, infinity" rule, which holds that the only reasonable numbers in an interface requirement are zero, one, and infinity. Obviously, the first two would be unreasonable.

  6. [R2.6] OS Bypass
    The system will support an OS bypass message passing strategy. That is, high performance implementations of the message passing mechanisms will be able to bypass the OS and deliver messages directly to the application.

    Discussion: Involving the OS on every message transmission can degrade performance.

  7. [R2.7] Put/Get
    The system will support remote put/get operations.

    Discussion: These operations are becoming common in many APIs (including MPI-2) and support for them seems natural.

  8. [R2.8] Packets
    It will be possible to write efficient implementations that packetize message transmission.

    Discussion: This requirement is based on two considerations: underlying network requirements and fairness in network access. First, some networks, e.g., Ethernet, impose a restriction on the amount of data that can be sent in a single transmission. Because implementations cannot impose a limit on message size (R2.5), it will be necessary to packetize messages in some implementations. Second, many modern networks use wormhole routing to establish a point-to-point connection. In this scheme, the switch ports used in the connection are locked until the data is fully transmitted. To ensure fair use of the network, some implementations may need to packetize messages.

  9. [R2.9] Receive Operation
    The receive operation will use an address and length pair to specify where the message body should be placed.

    Discussion: The alternative is to have the receive operation return the location and length of the message; the application could then process the message "in place."

  10. [R2.10] Receiver managed communication
    The system will support receive-side management of message space and this management will be performed during message receipt.

    Discussion: This requirement is based on the need for scalability (R1.3), the need for performance (R1.4), and the need to support an efficient implementation of MPI (R2.2).

  11. [R2.11] Sender managed communication
    The system will support send-side management of message space.

    Discussion: Send-side management is needed to support parallel servers that stripe data.

  12. [R2.12] Gateways
    It will be possible to write gateway processes. A gateway process is a process that receives messages from one implementation of the system and transmits them to another implementation running on different networking hardware.

    Discussion: Gateways are needed to connect physically distributed portions of Cplant where the WAN may use different hardware than the SAN.

  13. [R2.13] Asynchronous operations
    The system will support the ability to overlap computation and communication with the use of asynchronous send and receive operations. These operations will give applications that use the system directly, as well as the MPI implementation, the ability to overlap computation and communication efficiently.

    Discussion: The ability to support asynchronous data movement operations is key to achieving high-performance communication while still allowing for compute cycles to be delivered to the application. In order for overlapping to be effective, multiple outstanding send and receive operations need to be supported.

  14. [R2.14] Threads
    The system will support multithreading.

    Discussion: The ability of the system to support multiple threads of execution within a single address space is both a portability requirement (R1.2) for symmetric multiprocessor nodes and a requirement for supporting multithreaded applications.
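
The interaction between unrestricted message size (R2.5) and packetized transmission (R2.8) can be sketched as a sender that splits a message of any length into bounded chunks and a receiver that places each chunk at its final offset. This is an illustrative sketch only: the function names `packetize` and `reassemble`, the `(offset, total_length, chunk)` packet layout, and the payload limit used in the usage note are assumptions, not anything defined by the system.

```python
def packetize(message: bytes, max_payload: int):
    """Split a message of any length into (offset, total_length, chunk) packets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    total = len(message)
    # A zero-length message still needs one packet so the receiver sees it.
    if total == 0:
        return [(0, 0, b"")]
    return [
        (offset, total, message[offset:offset + max_payload])
        for offset in range(0, total, max_payload)
    ]


def reassemble(packets):
    """Rebuild the message from packets that may arrive in any order."""
    packets = list(packets)
    total = packets[0][1]
    buffer = bytearray(total)
    received = 0
    for offset, _, chunk in packets:
        # Each chunk is written directly at its final position in the buffer.
        buffer[offset:offset + len(chunk)] = chunk
        received += len(chunk)
    if received != total:
        raise ValueError("missing or duplicated packets")
    return bytes(buffer)
```

For example, `packetize(data, 1500)` on a 5120-byte message yields four packets, and `reassemble` recovers the original bytes regardless of arrival order. Because every packet carries its own offset, the receiver can write into the buffer named by the address and length pair of R2.9 without an intermediate copy.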

The "Should" Requirements

  1. [R3.1] Message Alignment
    The system should not impose any restrictions regarding the alignment of the address(es) used to specify the contents of a message.

    Discussion: Imposing alignment restrictions complicates implementations of higher level interfaces, and handling unaligned data is fairly easy.

    1. Higher level interfaces (e.g., MPI and sockets) do not impose any alignment restrictions. If the system does impose alignment restrictions, implementations for each of these interfaces will need to include code that deals with unaligned data.
    2. If unaligned data cannot be handled easily during the transmission or receipt of messages (e.g., by the network interface cards), it can be handled by copying the message body in the explicit send or receive operation provided (R2.1).

  2. [R3.2] Striping
    The system should be able to take advantage of multiple interfaces on a single logical network to improve the bandwidth.

    Discussion: This is an efficiency issue.

  3. [R3.3] Socket API
    The system should support an efficient implementation of sockets (including UDP, TCP and IP).

    Discussion: Many Cplant applications will benefit from the ability to communicate with existing applications that use a socket interface.

  4. [R3.4] Scheduled Transfer
    It should be possible to write an efficient implementation based on Scheduled Transfer (ST).

    Discussion: ST is an emerging network-level protocol and API that supports OS bypass. We expect that some manufacturers will provide an OS/interface implementation of ST. This requirement is a result of the portability requirement (R1.2).

  5. [R3.5] Virtual Interface Architecture
    It should be possible to write an efficient implementation based on the Virtual Interface Architecture (VIA).

    Discussion: VIA is an emerging interface that supports OS bypass. We expect that some manufacturers will provide an OS/interface implementation of VIA. This requirement is a result of the portability requirement (R1.2).

  6. [R3.6] Internetwork Consistency
    The system should not impose any consistency requirements across multiple networks/interfaces. In particular, there will not be any memory consistency/coherency requirements when messages arrive on independent paths.

    Discussion: Consistency requirements would limit our ability to implement the API on the interface cards as these implementations would need to coordinate their activities to meet the consistency requirements.

  7. [R3.7] Ease of Use
    Programming the system should be no more complex than programming traditional message passing environments like UNIX sockets or MPI. An in-depth understanding of the implementation or access to implementation-level information should not be required.

    Discussion: Some previous generations of message passing systems for massively parallel machines were difficult to program and system-level debugging was usually required to ensure correctness.

  8. [R3.8] Topology information
    The system should provide a way for a process to discover how "close" another process is in the topology of the network.

    Discussion: This is needed to facilitate nearest neighbor communication.
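
To illustrate how an application might use the topology information of R3.8, the sketch below ranks peers by distance so that a process can pick its nearest neighbors. Everything here is assumed for illustration: the requirement does not say how closeness is measured, so the 2D mesh, the `(row, column)` coordinates, and the functions `hop_distance` and `nearest_peers` are hypothetical.

```python
def hop_distance(a, b):
    """Hops between two nodes of an assumed 2D mesh, given (row, column) pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_peers(me, peers, count):
    """Return the `count` peers closest to `me`, nearest first."""
    # sorted() is stable, so peers at equal distance keep their input order.
    return sorted(peers, key=lambda p: hop_distance(me, p))[:count]
```

A real implementation would obtain the distance from whatever query the system provides rather than from coordinates known to the application.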


Maintained by: Barney Maccabe       Last modified: Fri Jun 16 11:37:36 MDT 2000