The Portals 3.0 API
Requirements
The following are the requirements for the Cplant
message passing system. These requirements are divided
into three sections. The first section lists critical
requirements that the system must meet to achieve
the goals of the Cplant project. The second section
contains important requirements that the system will meet.
These requirements address specific functionality that
will be supported above and beneath the implementation
layer. The third section lists requirements that the
system should meet. These requirements provide a set of
criteria that need to be considered when establishing the
relationship of the system to other message passing
technologies.
The "Must" Requirements
- [R1.1] Message Passing Protocols
The system must support efficient implementations
of commonly used message passing protocols.
Discussion: Cplant applications will need
to use a variety of message passing protocols,
possibly including: MPI, sockets, and the protocols
currently being used by the Cplant runtime
environment.
- [R1.2] Portability
The system must be implementable on a variety of
existing network interfaces.
Discussion: It is anticipated that Cplant
systems will be constructed from different types of
hardware and that access to low-level software may be
limited. Also, implementations based on commonly
available interfaces provide a layer of portability.
- [R1.3] Scalability
The system must support efficient implementations
for systems with thousands of nodes.
Discussion: It is anticipated that Cplant
systems with thousands of nodes will be constructed.
- [R1.4] Performance
It must be possible to create high-performance
implementations of the system on existing hardware.
Discussion: Cplant applications need high
performance message passing primitives. This is a
corollary requirement to R1.3.
- [R1.5] Multiprocess
The system must support access to the network
from multiple processes per node.
Discussion: The Cplant runtime environment
requires multiple communicating processes per node.
- [R1.6] Communication between processes from
different executables
The system must support the ability to pass
messages between processes instantiated from different
executables.
Discussion: Support for this client/server
type of communication is a requirement for the Cplant
runtime environment.
- [R1.7] Runtime independence
The ability to perform message passing must not
depend on the existence of an external runtime
environment, scheduling mechanism, or other special
utilities outside of normal UNIX process startup.
Discussion: Support for runtime
independence is a requirement of the Cplant runtime
environment.
- [R1.8] Memory Protection
The system must support memory protection by
ensuring that a process can only access memory that it
owns.
Discussion: Memory protection prevents a
process from accessing another process's memory, in the
NIC or in host memory, by modifying programmable NICs.
- [R1.9] Reliable message delivery
The system must support reliable delivery of
messages.
Discussion: Support for reliable message
passing is a fundamental requirement.
- [R1.10] Pairwise message ordering
The system must support pairwise ordering of
messages.
Discussion: Messages sent from one process to
another should be received in the order sent.
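One way to picture pairwise ordering (R1.10) is a receiver that holds back messages arriving out of order. The sketch below is only an illustration, not part of the Cplant system: the class name, the per-sender sequence numbers starting at 0, and the `delivered` list are all assumptions made for the example.

```python
from collections import defaultdict


class PairwiseOrderedReceiver:
    """Delivers messages from each sender in the order they were sent.

    Hypothetical illustration: every sender is assumed to stamp its
    messages to this receiver with a sequence number starting at 0.
    """

    def __init__(self):
        self.next_seq = defaultdict(int)   # sender -> next expected sequence number
        self.pending = defaultdict(dict)   # sender -> {seq: payload} held back
        self.delivered = []                # (sender, payload) in delivery order

    def on_arrival(self, sender, seq, payload):
        """Accept a message from the network and deliver whatever is now in order."""
        self.pending[sender][seq] = payload
        # Drain every message from this sender that is next in sequence.
        while self.next_seq[sender] in self.pending[sender]:
            seq_now = self.next_seq[sender]
            self.delivered.append((sender, self.pending[sender].pop(seq_now)))
            self.next_seq[sender] = seq_now + 1
```

Ordering is enforced only per sender; messages from different senders may interleave freely, which is all that a pairwise guarantee promises.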
The "Will" Requirements
- [R2.1] Operational API
The system API will be defined by operations, not
modifications to data structures. This means that the
interface will have explicit operations to send and
receive messages. (It does not mean that the receive
operation will involve a copy of the message body.)
Discussion: The goal is to make sure that
implementations will be given an opportunity to take
action whenever the application code does something
significant.
- [R2.2] Zero-copy MPI
It will be possible to write a low-latency,
zero-copy, scalable implementation of the
point-to-point operations in MPI 1 using the system.
Discussion: We anticipate that many
high-performance applications will be written using MPI.
Moreover, we anticipate that these applications will
primarily use the point-to-point operations (see
R1.1).
- [R2.3] Myrinet
It will be possible to write an efficient
implementation using Linux as the host OS and Myrinet
interface cards.
Discussion: Current configurations of
Cplant are based on Linux with Myrinet interface
cards. The implementation should take advantage of
the programming interface provided for these cards.
- [R2.4] Sockets Implementation
It will be possible to write an implementation
based on the sockets API.
Discussion: The sockets API is available
in most networking environments (including Unix and
NT). This requirement is directly related to the
portability requirement (R1.2).
- [R2.5] Message Size
The system will not impose an arbitrary
restriction on the size of message that can be sent.
Discussion: This requirement is based on
the "zero, one, infinity" rule, which holds that the
only reasonable numbers in an interface requirement
are zero, one, and infinity. Obviously, the first two
would be unreasonable.
- [R2.6] OS Bypass
The system will support an OS bypass message
passing strategy. That is, high performance
implementations of the message passing mechanisms will
be able to bypass the OS and deliver messages directly
to the application.
Discussion: Involving the OS on every
message transmission can degrade performance.
- [R2.7] Put/Get
The system will support remote put/get
operations.
Discussion: These operations are becoming
common in many APIs (including MPI-2) and support for
them seems natural.
- [R2.8] Packets
It will be possible to write efficient
implementations that packetize message transmission.
Discussion: This requirement is based on
two considerations: underlying network requirements
and fairness in network access. First, some networks
(e.g., Ethernet) restrict the amount of data
that can be sent in a single transmission. Because
implementations cannot impose a limit on message size
(R2.5), it will be necessary to packetize messages in
some implementations. Second, many modern
networks use wormhole routing to establish a
point-to-point connection. In this scheme, the switch
ports used in the connection are locked until the data
is fully transmitted. To ensure fair use
of the network, some implementations may need to
packetize messages.
- [R2.9] Receive Operation
The receive operation will use an address and
length pair to specify where the message body should
be placed.
Discussion: The alternative is to have the
receive operation return the location and length of
the message; the application could then process the
message "in place."
- [R2.10] Receiver managed communication
The system will support receive-side management
of message space and this management will be performed
during message receipt.
Discussion: This requirement is based on
the need for scalability (R1.3), the need for
performance (R1.4), and the need to support an
efficient implementation of MPI (R2.2).
- [R2.11] Sender managed communication
The system will support send-side management of
message space.
Discussion: Send-side management is needed
to support parallel servers that stripe data.
- [R2.12] Gateways
It will be possible to write gateway processes.
A gateway process is a process that receives messages
from one implementation of the system and transmits them to
another implementation running on different networking
hardware.
Discussion: Gateways are needed to
connect physically distributed portions of Cplant
where the WAN may use different hardware than the SAN.
- [R2.13] Asynchronous operations
The system will support the ability to overlap
computation and communication with the use of
asynchronous send and receive operations. These
operations will give applications that use the system
directly, as well as the MPI implementation, the ability
to overlap computation and communication efficiently.
Discussion: The ability to support
asynchronous data movement operations is key to
achieving high-performance communication while still
allowing for compute cycles to be delivered to the
application. In order for overlapping to be
effective, multiple outstanding send and receive
operations need to be supported.
- [R2.14] Threads
The system will support multithreading.
Discussion: The ability of the system to
support multiple threads of execution within a single
address space is both a portability requirement (R1.2)
for symmetric multiprocessor nodes and a requirement
for supporting multithreaded applications.
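The interaction between packetized transmission (R2.8), unrestricted message size (R2.5), and a receive operation that names a destination by address and length (R2.9) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's API: the function names, the `(offset, chunk)` packet layout, and the choice to drop bytes that do not fit in the receive buffer are all hypothetical.

```python
def packetize(message: bytes, max_payload: int):
    """Split a message into (offset, chunk) packets of at most max_payload bytes.

    Hypothetical packet layout: each packet carries its byte offset within
    the original message so that packets can be placed in any arrival order.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [(off, message[off:off + max_payload])
            for off in range(0, len(message), max_payload)]


def receive_into(buffer: bytearray, packets) -> int:
    """Place packet payloads directly into a caller-supplied buffer.

    The bytearray stands in for the address/length pair of R2.9. As an
    assumption of this sketch, bytes that would land beyond len(buffer)
    are dropped. Returns the number of bytes stored.
    """
    stored = 0
    for off, chunk in packets:
        room = max(0, len(buffer) - off)   # space left at this offset
        n = min(room, len(chunk))
        if n == 0:
            continue                       # packet lies entirely past the buffer
        buffer[off:off + n] = chunk[:n]    # same-size slice write, in place
        stored += n
    return stored
```

Because each packet carries its own offset, the receiver can write payloads straight into the destination as they arrive, without first assembling the message in an intermediate buffer.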
The "Should" Requirements
- [R3.1] Message Alignment
The system should not impose any restrictions
regarding the alignment of the address(es) used to
specify the contents of a message.
Discussion: Alignment restrictions would
complicate implementations of higher-level interfaces,
and handling unaligned data is fairly easy.
- Higher level interfaces (e.g., MPI and sockets)
do not impose any alignment restrictions. If the
system does impose alignment restrictions,
implementations for each of these interfaces will
need to include code that deals with unaligned data.
- If unaligned data cannot be handled easily
during the transmission or receipt of messages
(e.g., by the network interface cards), it can be
handled by copying the message body in the explicit
send or receive operation provided (R2.1).
- [R3.2] Striping
The system should be able to take advantage of
multiple interfaces on a single logical network to
improve bandwidth.
Discussion: This is an efficiency issue.
- [R3.3] Socket API
The system should support an efficient
implementation of sockets (including UDP, TCP and IP).
Discussion: Many Cplant applications will
benefit from the ability to communicate with existing
applications that use a socket interface.
- [R3.4] Scheduled Transfer
It should be possible to write an efficient
implementation based on Scheduled Transfer (ST).
Discussion: ST is an emerging
network-level protocol and API that supports OS
bypass. We expect that some manufacturers will
provide an OS/interface implementation of ST. This
requirement is a result of the portability requirement
(R1.2).
- [R3.5] Virtual Interface Architecture
It should be possible to write an efficient
implementation based on the Virtual Interface
Architecture (VIA).
Discussion: VIA is an emerging interface
that supports OS bypass. We expect that some
manufacturers will provide an OS/interface
implementation of VIA. This requirement is a result
of the portability requirement (R1.2).
- [R3.6] Internetwork Consistency
The system should not impose any consistency
requirements across multiple networks/interfaces. In
particular, there will not be any memory
consistency/coherency requirements when messages
arrive on independent paths.
Discussion: Consistency requirements would
limit our ability to implement the API on the
interface cards as these implementations would need to
coordinate their activities to meet the consistency
requirements.
- [R3.7] Ease of Use
Programming the system should be no more complex
than programming traditional message passing
environments like UNIX sockets or MPI. An in-depth
understanding of the implementation or access to
implementation-level information should not be
required.
Discussion: Some previous generations of
message passing systems for massively parallel
machines were difficult to program, and system-level
debugging was usually required to ensure correctness.
- [R3.8] Topology information
The system should provide a way for a process to
discover how "close" another process is in the
topology of the network.
Discussion: This is needed to facilitate
nearest neighbor communication.
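As an illustration of what a "closeness" query (R3.8) might return, the sketch below ranks peers by hop count. Everything here is an assumption made for the example: the requirement does not name a topology, so a two-dimensional mesh with row-major node numbering is assumed, and the function names are hypothetical.

```python
def mesh_coords(node_id: int, width: int):
    """Map a node id to (x, y) on an assumed row-major 2D mesh of given width."""
    return (node_id % width, node_id // width)


def hop_distance(a: int, b: int, width: int) -> int:
    """Number of mesh hops between nodes a and b (Manhattan distance)."""
    ax, ay = mesh_coords(a, width)
    bx, by = mesh_coords(b, width)
    return abs(ax - bx) + abs(ay - by)


def nearest_neighbors(me: int, peers, width: int):
    """Return peers sorted from closest to farthest; ties break on node id."""
    return sorted(peers, key=lambda p: (hop_distance(me, p, width), p))
```

For example, on a mesh of width 4, `nearest_neighbors(5, [15, 0, 6], 4)` orders the peers as node 6 (one hop), node 0 (two hops), then node 15 (four hops).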