
Future Research

CPU utilization and memory bandwidth at the endpoints are now the bottlenecks in high-speed networks. Delays have moved from the network to the hosts and, as a result, so must the efficiency analysis. To attain high transfer rates with low CPU utilization, a reanalysis of the protocol stack is warranted. McGowan [McG99] suggests applying to the IP stack several techniques from the high-speed networking community (most notably, HIPPI): avoiding memory copies, sending large data segments, and performing fragmentation and reassembly in hardware. Together, these techniques reduce the number of function calls made when sending, and upcalls made when receiving, IP datagrams. More specifically, there are several changes to IP implementations that we believe will improve throughput on reliable networks without penalizing throughput on lossy networks:

1.
Send fragments in reverse order. As discussed above, sending fragments in reverse order lets the receiving end allocate a buffer of the correct size immediately and reassemble the fragments efficiently by overwriting fragment headers in place (a sketch of this receive path follows the list).

2.
Send larger datagrams. By allowing the TCP (and conceivably UDP) layers to determine the most efficient datagram size (up to, but not exceeding, the window size), we should be able to reduce the number of function calls on sending and upcalls on receiving (see figures 3a and 3b). On less-reliable networks, an algorithm should be devised to choose a smaller datagram size (as small as the path MTU).

3.
Optimize reassembly. In addition to optimizing the reverse-order case, reassembly algorithms should optimize the more common forward case and implement RFC 815, for both performance and reliability reasons (a sketch of RFC 815's hole-descriptor bookkeeping also follows the list).

4.
Drop datagrams whose fragments arrive out of order. Out-of-order fragments are extremely rare and likely indicate a datagram that will eventually be dropped anyway. Rejecting such datagrams immediately may improve end-to-end latency.

5.
Move fragmentation and reassembly to the NIC. By off-loading this processing onto the network card, the CPU is free to perform other useful work. In combination with larger datagrams, this technique has the potential to dramatically reduce CPU utilization and increase throughput.
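
The reverse-order idea in item 1 rests on a simple receive-side observation: if the tail fragment (the one with MF = 0) arrives first, the receiver knows the datagram's exact size before any other fragment appears and can copy every later fragment directly into its final position. The following C sketch illustrates only that path; the names (frag_in, reasm_ctx, reasm_reverse) are hypothetical rather than part of any existing stack, the context is assumed to be zero-initialized, and duplicate or overlapping fragments are ignored for brevity.

  /*
   * Sketch of receive-side reassembly when the sender emits fragments in
   * reverse order.  Hypothetical types and names; error handling, duplicate
   * fragments, and overlap checks are omitted.
   */
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  struct frag_in {                /* one received fragment, IP header already parsed */
      uint16_t offset;            /* byte offset of the payload within the datagram  */
      uint16_t len;               /* payload length in bytes                         */
      int      more_fragments;    /* the IP MF flag                                  */
      const uint8_t *payload;
  };

  struct reasm_ctx {              /* per-datagram state, zeroed before first use  */
      uint8_t *buf;               /* single buffer for the whole datagram         */
      size_t   total_len;         /* known as soon as the MF = 0 fragment arrives */
      size_t   bytes_left;        /* payload bytes still missing                  */
  };

  /* Returns 1 when the datagram is complete, 0 while fragments are pending,
   * and -1 if the first fragment seen is not the tail (fall back to the
   * normal reassembly path). */
  static int reasm_reverse(struct reasm_ctx *ctx, const struct frag_in *f)
  {
      if (ctx->buf == NULL) {
          /* With reverse-order sending, the first fragment to arrive should
           * be the tail, which fixes the exact datagram size. */
          if (f->more_fragments)
              return -1;
          ctx->total_len = (size_t)f->offset + f->len;
          ctx->buf = malloc(ctx->total_len);
          if (ctx->buf == NULL)
              return -1;
          ctx->bytes_left = ctx->total_len;
      }
      /* Copy the payload straight to its final position; the space a
       * fragment's header occupied on the wire never enters this buffer. */
      memcpy(ctx->buf + f->offset, f->payload, f->len);
      ctx->bytes_left -= f->len;
      return ctx->bytes_left == 0;
  }

Because each fragment is copied exactly once into its final location, there is no per-fragment queue to maintain and no coalescing pass at the end; a fragment that arrives before the tail (return value -1) would simply fall back to the ordinary reassembly path.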
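
Item 3 refers to RFC 815, which replaces per-fragment queues with a short list of "hole descriptors": the byte ranges of the datagram that have not yet arrived. The sketch below shows only that bookkeeping, again with hypothetical names (hole, hole_update); buffer management, timers, and sanity checks are omitted. The list starts as a single hole covering bytes 0 through "infinity", and reassembly is complete when the list becomes empty.

  /* Sketch of RFC 815 hole-descriptor bookkeeping (illustrative names only). */
  #include <stdlib.h>

  #define HOLE_INF 0xFFFFu        /* "infinity" until the MF = 0 fragment arrives */

  struct hole {
      unsigned first, last;       /* inclusive byte range still missing */
      struct hole *next;
  };

  static struct hole *hole_new(unsigned first, unsigned last, struct hole *next)
  {
      struct hole *h = malloc(sizeof(*h));
      h->first = first;
      h->last  = last;
      h->next  = next;
      return h;
  }

  /* Update the hole list for a fragment covering bytes [frag_first, frag_last].
   * last_frag is the complement of the MF flag: when set, nothing beyond
   * frag_last is expected.  Returns the new head; an empty (NULL) list means
   * the datagram is complete. */
  static struct hole *hole_update(struct hole *head, unsigned frag_first,
                                  unsigned frag_last, int last_frag)
  {
      struct hole **pp = &head;
      while (*pp != NULL) {
          struct hole *h = *pp;
          if (frag_first > h->last || frag_last < h->first) {
              pp = &h->next;      /* no overlap: keep this hole as-is */
              continue;
          }
          /* The fragment overlaps this hole: delete it, then re-insert the
           * uncovered pieces on either side (steps 4-6 of the RFC's algorithm). */
          *pp = h->next;
          if (frag_first > h->first)
              *pp = hole_new(h->first, frag_first - 1, *pp);
          if (frag_last < h->last && !last_frag)
              *pp = hole_new(frag_last + 1, h->last, *pp);
          free(h);
      }
      return head;
  }

A caller would initialize the list with hole_new(0, HOLE_INF, NULL) and, for each arriving fragment, call hole_update() with the fragment's first and last payload byte and its MF flag; a NULL return means no holes remain and the datagram can be delivered.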

Many of these changes represent significant departures from existing IP stack implementations, and they should be tested and carefully evaluated before being deployed. However, by reexamining assumptions about the role of fragmentation in the IP stack, we may be able to achieve significant efficiency gains.

``Fragmentation Considered Harmful'' drastically affected the role of fragmentation in IP as it is currently implemented. However, we must begin to think of network performance with respect to its new bottleneck: the endpoint. With this in mind, endpoint fragmentation continues to relieve intermediate network resources of responsibility for fragmentation while beginning to address the CPU utilization bottleneck at the hosts.

