An item is the basic element. Usually it will be a character. A sequence is an ordered set of items. Sequential items in a sequence are said to be logically contiguous. The sequence data structure will keep the items of the sequence in buffers. A buffer is a set of sequentially addressed memory locations. A buffer contains items from the sequence but not necessarily in the same order as they appear logically in the sequence. Sequentially addressed items in a buffer are physically contiguous. When a string of items is physically contiguous in a buffer and is also logically contiguous in the sequence we call them a span. A descriptor is a pointer to a span. In some cases the buffer is actually part of the descriptor and so no pointer is necessary. This variation is not important to the design of the data structures but is more a memory management issue.
Sequence data structures keep spans in buffers and keep enough information (in terms of descriptors and sequences of descriptors) to piece together the spans to form the sequence. Buffers can be kept in memory but most sequence data structures allow buffers to get as large as necessary or allow an unlimited number of buffers. Thus it is necessary to keep the buffers on disk in disk files. Many sequence data structures use buffers of unlimited size, that is, their size is determined by the file contents. This requires the buffer to be a disk file. With enough disk block caching this can be made as fast as necessary.
The concepts of buffers, spans and descriptors can be found in almost every sequence data structure. Sequence data structures vary in terms of how these concepts are used.
If a sequence data structures uses a variable number of descriptors it requires a recursive sequence data structure to keep track of the sequence of descriptors. In section 5 we will look at three sequence data structures that use a fixed number of descriptors and in section 6 we will look at three sequence data structures that use a variable number of descriptors. Section 7 will present a general model of a sequence data structure that encompasses all these data structures.