- <!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
- <html>
- <head>
- <meta http-equiv="Content-Language" content="en-us">
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- <title>Venti: a new approach to archival storage</title>
- </head>
- <body bgcolor="white">
- <h1>Venti: a new approach to archival storage</h1>
- <p>
- Sean Quinlan and Sean Dorward
- <br>
- Bell Labs, Lucent Technologies
- <p>
- <h1>Abstract</h1>
- <p>
- This paper describes a network storage system, called Venti, intended
- for archival data. In this system, a unique hash of a block's
- contents acts as the block identifier for read and write operations.
- This approach enforces a write-once policy, preventing accidental or
- malicious destruction of data. In addition, duplicate copies of a
- block can be coalesced, reducing the consumption of storage and
- simplifying the implementation of clients. Venti is a building block
- for constructing a variety of storage applications such as logical
- backup, physical backup, and snapshot file systems.
- <p>
- We have built a prototype of the system and present some preliminary
- performance results. The system uses magnetic disks as the storage
- technology, resulting in an access time for archival data that is
- comparable to non-archival data. The feasibility of the write-once
- model for storage is demonstrated using data from over a decade's use
- of two Plan 9 file systems.
- <p>
- <h1>1. Introduction</h1>
- <p>
- Archival storage is a second class citizen. Many computer
- environments provide access to a few recent versions of the
- information stored in file systems and databases, though this access
- can be tedious and may require the assistance of a system
- administrator. Less common is the ability for a user to examine data
- from last month or last year or last decade. Such a feature may not
- be needed frequently, but when it is needed it is often crucial.
- <p>
- The growth in capacity of storage technologies exceeds the ability of
- many users to generate data, making it practical to archive data in
- perpetuity. Plan 9, the computing environment that the authors use,
- includes a file system that stores archival data to an optical jukebox
- [16, 17]. Ken Thompson observed that, for our usage patterns, the
- capacity of the jukebox could be considered infinite. In the time it
- took for us to fill the jukebox, the improvement in technology would
- allow us to upgrade to a new jukebox with twice the capacity.
- <p>
- Abundant storage suggests that an archival system impose a write-once
- policy. Such a policy prohibits either a user or administrator from
- deleting or modifying data once it is stored. This approach greatly
- reduces the opportunities for accidental or malicious data loss and
- simplifies the system's implementation.
- <p>
- Moreover, our experience with Plan 9 is that a write-once policy
- changes the way one views storage. Obviously, some data is temporary,
- derivative, or so large that it is either undesirable or impractical
- to retain forever and should not be archived. However, once it is
- decided that the data is worth keeping, the resources needed to store
- the data have been consumed and cannot be reclaimed. This eliminates
- the task of periodically "cleaning up" and deciding whether the data
- is still worth keeping. More thought is required before storing the
- data to a write-once archive, but as the cost of storage continues to
- fall, this becomes an easy decision.
- <p>
- This paper describes the design and implementation of an archival
- server, called Venti. The goal of Venti is to provide a write-once
- archival repository that can be shared by multiple client machines and
- applications. In addition, by using magnetic disks as the primary
- storage technology, the performance of the system approaches that of
- non-archival storage.
- <p>
- <h1>2. Background</h1>
- <p>
- A prevalent form of archival storage is the regular backup of data to
- magnetic tape [15]. A typical scenario is to provide backup as a
- central service for a number of client machines. Client software
- interfaces with a database or file system and determines what data to
- back up. The data is copied from the client to the tape device, often
- over a network, and a record of what was copied is stored in a catalog
- database.
- <p>
- Restoring data from a tape backup system can be tedious and error
- prone. The backup system violates the access permissions of the file
- system, requiring a system administrator or privileged software to
- perform the task. Since they are tedious, restore operations are
- infrequent and problems with the process may go undetected. Potential
- sources of error abound: tapes are mislabeled or reused or lost,
- drives wander out of alignment and cannot read their old tapes,
- technology becomes obsolete.
- <p>
- For tape backup systems, a tradeoff exists between the performance of
- backup and restore operations [1]. A full backup simplifies the
- process of restoring data since all the data is copied to a continuous
- region on the tape media. For large file systems and databases,
- incremental backups are more efficient to generate, but such backups
- are not self-contained; the data for a restore operation is scattered
- across multiple incremental backups and perhaps multiple tapes. The
- conventional solution is to limit the extent of this scattering by
- performing a full backup followed by a small number of incremental
- backups.
- <p>
- File systems such as Plan 9 [16, 17], WAFL [5], and AFS [7] provide a
- more unified approach to the backup problem by implementing a snapshot
- feature. A snapshot is a consistent read-only view of the file system
- at some point in the past. The snapshot retains the file system
- permissions and can be accessed with standard tools (ls, cat, cp,
- grep, diff) without special privileges or assistance from an
- administrator. In our experience, snapshots are a relied-upon and
- frequently-used resource because they are always available and easy to
- access.
- <p>
- Snapshots avoid the tradeoff between full and incremental backups.
- Each snapshot is a complete file system tree, much like a full backup.
- The implementation, however, resembles an incremental backup because
- the snapshots and the active file system share any blocks that remain
- unmodified; a snapshot only requires additional storage for the blocks
- that have changed. To achieve reasonable performance, the device that
- stores the snapshots must efficiently support random access, limiting
- the suitability of tape storage for this approach.
- <p>
- In the WAFL and AFS systems, snapshots are ephemeral; only a small
- number of recent versions of the file system are retained. This
- policy is reasonable since the most recent versions of files are the
- most useful. For these systems, archival storage requires an
- additional mechanism such as tape backup.
- <p>
- The philosophy of the Plan 9 file system is that random access storage
- is sufficiently cheap that it is feasible to retain snapshots
- permanently. The storage required to retain all daily snapshots of a
- file system is surprisingly modest; later in the paper we present
- statistics for two file servers that have been in use over the last 10
- years.
- <p>
- Like Plan 9, the Elephant file system [18] retains many versions of
- data. This system allows a variety of storage reclamation policies
- that determine when a version of a file should be deleted. In
- particular, "landmark" versions of files are retained permanently and
- provide an archival record.
- <p>
- <h1>3. The Venti Archival Server</h1>
- <p>
- Venti is a block-level network storage system intended for archival
- data. The interface to the system is a simple protocol that enables
- client applications to read and write variable sized blocks of data.
- Venti itself does not provide the services of a file or backup system,
- but rather the backend archival storage for these types of
- applications.
- <p>
- Venti identifies data blocks by a hash of their contents. By using a
- collision-resistant hash function with a sufficiently large output, it
- is possible to consider the hash of a data block as unique. Such a
- unique hash is called the fingerprint of a block and can be used as
- the address for read and write operations. This approach results in a
- storage system with a number of interesting properties.
- <p>
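- For illustration only, such a block-level interface might be sketched
- in C as follows; the names and signatures are hypothetical and are not
- the actual Venti protocol or client library.
- <p>
- <pre>
- /* Hypothetical client view: blocks are addressed by the fingerprint
-  * (score) of their contents rather than by a location on a device. */
- enum { ScoreSize = 20 };                    /* Sha1 output: 160 bits */
- 
- typedef struct Score { unsigned char b[ScoreSize]; } Score;
- 
- /* Store a block; its fingerprint is its address. */
- Score write_block(const void *data, int nbytes);
- 
- /* Retrieve the block whose contents hash to the given score.
-  * Returns the number of bytes read, or -1 if the block is unknown. */
- int read_block(Score s, void *buf, int nbytes);
- </pre>
- <p>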
- As blocks are addressed by the fingerprint of their contents, a block
- cannot be modified without changing its address; the behavior is
- intrinsically write-once. This property distinguishes Venti from most
- other storage systems, in which the address of a block and its
- contents are independent.
- <p>
- Moreover, writes are idempotent. Multiple writes of the same data can
- be coalesced and do not require additional storage space. This
- property can greatly increase the effective storage capacity of the
- server since it does not rely on the behavior of client applications.
- For example, an incremental backup application may not be able to
- determine exactly which blocks have changed, resulting in unnecessary
- duplication of data. On Venti, such duplicate blocks will be
- discarded and only one copy of the data will be retained. In fact,
- replacing the incremental backup with a full backup will consume the
- same amount of storage. Even duplicate data from different
- applications and machines can be eliminated if the clients write the
- data using the same block size and alignment.
- <p>
- The hash function can be viewed as generating a universal name space
- for data blocks. Without cooperating or coordinating, multiple
- clients can share this name space and share a Venti server. Moreover,
- the block level interface places few restrictions on the structures
- and format that clients use to store their data. In contrast,
- traditional backup and archival systems require more centralized
- control. For example, backup systems include some form of job
- scheduler to serialize access to tape devices and may only support a
- small number of predetermined data formats so that the catalog system
- can extract pertinent meta-data.
- <p>
- Venti provides inherent integrity checking of data. When a block is
- retrieved, both the client and the server can compute the fingerprint
- of the data and compare it to the requested fingerprint. This
- operation allows the client to avoid errors from undetected data
- corruption and enables the server to identify when error recovery is
- necessary.
- <p>
- Using the fingerprint of a block as its identity facilitates features
- such as replication, caching, and load balancing. Since the contents
- of a particular block are immutable, the problem of data coherency is
- greatly reduced; a cache or a mirror cannot contain a stale or out of
- date version of a block.
- <p>
- <h2>3.1. Choice of Hash Function</h2>
- <p>
- The design of Venti requires a hash function that generates a unique
- fingerprint for every data block that a client may want to store.
- Obviously, if the size of the fingerprint is smaller than the size of
- the data blocks, such a hash function cannot exist since there are
- fewer possible fingerprints than blocks. If the fingerprint is large
- enough and randomly distributed, this problem does not arise in
- practice. For a server of a given capacity, the likelihood that two
- different blocks will have the same hash value, also known as a
- collision, can be determined. If the probability of a collision is
- vanishingly small, we can be confident that each fingerprint is
- unique.
- <p>
- It is desirable that Venti employ a cryptographic hash function. For
- such a function, it is computationally infeasible to find two distinct
- inputs that hash to the same value [10]. This property is important
- because it prevents a malicious client from intentionally creating
- blocks that violate the assumption that each block has a unique
- fingerprint. As an additional benefit, using a cryptographic hash
- function strengthens a client's integrity check, preventing a
- malicious server from fulfilling a read request with fraudulent data.
- If the fingerprint of the returned block matches the requested
- fingerprint, the client can be confident the server returned the
- original data.
- <p>
- Venti uses the Sha1 hash function [13] developed by the US National
- Institute for Standards and Technology (NIST). Sha1 is a popular hash
- algorithm for many security systems and, to date, there are no known
- collisions. The output of Sha1 is a 160 bit (20 byte) hash value.
- Software implementations of Sha1 are relatively efficient; for
- example, a 700 MHz Pentium III can compute the Sha1 hash of 8 Kbyte data
- blocks in about 130 microseconds, a rate of 60 Mbytes per second.
- <p>
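- As a concrete illustration, the fingerprint of a block can be computed
- with any Sha1 implementation; the sketch below uses the OpenSSL SHA1()
- routine purely as an example and is not the code used in the prototype.
- <p>
- <pre>
- /* Compute the fingerprint (score) of a data block. */
- #include &lt;stddef.h&gt;
- #include &lt;openssl/sha.h&gt;              /* illustration; any Sha1 library works */
- 
- enum { ScoreSize = SHA_DIGEST_LENGTH };     /* 20 bytes, 160 bits */
- 
- void
- fingerprint(const unsigned char *data, size_t n, unsigned char score[ScoreSize])
- {
-     SHA1(data, n, score);                   /* the block's address in Venti */
- }
- </pre>
- <p>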
- Are the 160 bit hash values generated by Sha1 large enough to ensure
- the fingerprint of every block is unique? Assuming random hash values
- with a uniform distribution, a collection of n different data blocks
- and a hash function that generates b bits, the probability p that
- there will be one or more collisions is bounded by the number of pairs
- of blocks multiplied by the probability that a given pair will
- collide, i.e.
- <p>
-
- <img src="probablity.gif" ALT="probablity">
- <p>
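- In symbols, the bound depicted above is p &le; n(n-1)/2 &times; 1/2^b,
- which for large n is approximately n^2/2^(b+1).
- <p>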
- Today, a large storage system may contain a petabyte (10^15 bytes) of data.
- Consider an even larger system that contains an exabyte (10^18 bytes)
- stored as 8 Kbyte blocks (~10^14 blocks). Using the Sha1 hash function, the
- probability of a collision is less than 10^-20. Such a scenario seems
- sufficiently unlikely that we ignore it and use the Sha1 hash as a
- unique identifier for a block. Obviously, as storage technology
- advances, it may become feasible to store much more than an exabyte,
- at which point it may be necessary to move to a larger hash function.
- NIST has already proposed variants of Sha1 that produce 256, 384, and
- 512 bit results [14]. For the immediate future, however, Sha1 is a
- suitable choice for generating the fingerprint of a block.
- <p>
- <h2>3.2. Choice of Storage Technology</h2>
- <p>
- When the Plan 9 file system was designed in 1989, optical jukeboxes
- offered high capacity with respectable random access performance and
- thus were an obvious candidate for archival storage. The last decade,
- however, has seen the capacity of magnetic disks increase at a far
- faster rate than optical technologies [20]. Today, a disk array costs
- less than the equivalent capacity optical jukebox and occupies less
- physical space. Disk technology is even approaching tape in cost per
- bit.
- <p>
- Magnetic disk storage is not as stable or permanent as optical media.
- Reliability can be improved with technology such as RAID, but unlike
- write-once optical disks, there is little protection from erasure due
- to failures of the storage server or RAID array firmware. This issue
- is discussed in Section 7.
- <p>
- Using magnetic disks for Venti has the benefit of reducing the
- disparity in performance between conventional and archival storage.
- Operations that previously required data to be restored to magnetic
- disk can be accomplished directly from the archive. Similarly, the
- archive can contain the primary copy of often-accessed read-only data.
- In effect, archival data need not be further down the storage
- hierarchy; it is differentiated by the write-once policy of the
- server.
- <p>
- <h1>4. Applications</h1>
- <p>
- Venti is a building block on which to construct a variety of storage
- applications. Venti provides a large repository for data that can be
- shared by many clients, much as tape libraries are currently the
- foundation of many centralized backup systems. Applications need to
- accommodate the unique properties of Venti, which are different from
- traditional block level storage devices, but these properties enable a
- number of interesting features.
- <p>
- Applications use the block level service provided by Venti to store
- more complex data structures. Data is divided into blocks and written
- to the server. To enable this data to be retrieved, the application
- must record the fingerprints of these blocks. One approach is to pack
- the fingerprints into additional blocks, called pointer blocks, that
- are also written to the server, a process that can be repeated
- recursively until a single fingerprint is obtained. This fingerprint
- represents the root of a tree of blocks and corresponds to a
- hierarchical hash of the original data.
- <p>
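- To make the recursion concrete, the sketch below packs fingerprints into
- pointer blocks level by level until a single root fingerprint remains.
- The write_block() primitive is hypothetical (a client call that stores a
- block and returns its fingerprint); this is not the code of any actual
- Venti application.
- <p>
- <pre>
- enum { ScoreSize = 20, BlockSize = 8192, NPTR = BlockSize / ScoreSize };
- 
- typedef struct Score { unsigned char b[ScoreSize]; } Score;
- Score write_block(const void *data, int nbytes);   /* assumed primitive */
- 
- /* Reduce n fingerprints (n >= 1) to a single root fingerprint by writing
-  * pointer blocks.  On entry, scores holds the fingerprints of data blocks
-  * that have already been written; the array is reused for each level. */
- Score
- write_tree(Score *scores, int n)
- {
-     int i, k, m;
- 
-     while(n > 1){
-         m = 0;
-         for(i = 0; n - i > 0; i += NPTR){
-             k = n - i;
-             if(k > NPTR)
-                 k = NPTR;
-             /* write one pointer block and keep its fingerprint */
-             scores[m++] = write_block(scores + i, k * ScoreSize);
-         }
-         n = m;
-     }
-     return scores[0];
- }
- </pre>
- <p>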
- A simple data structure for storing a linear sequence of data blocks
- is shown in Figure 1. The data blocks are located via a fixed depth
- tree of pointer blocks which itself is addressed by a root
- fingerprint. Applications can use such a structure to store a single
- file or to mimic the behavior of a physical device such as a tape or a
- disk drive. The write-once nature of Venti does not allow such a tree
- to be modified, but new versions of the tree can be generated
- efficiently by storing the new or modified data blocks and reusing the
- unchanged sections of the tree as depicted in Figure 2.
- <p>
-
- <img src="SimpleTree.gif" ALT="simple tree">
- <p>
- Figure 1. A tree structure for storing a linear sequence of blocks
- <p>
-
- <img src="ModifiedTree.gif" ALT="modified tree">
- <p>
- Figure 2. Build a new version of the tree.
- <p>
- By mixing data and fingerprints in a block, more complex data
- structures can be constructed. For example, a structure for storing a
- file system may include three types of blocks: directory, pointer, and
- data. A directory block combines the meta information for a file and
- the fingerprint to a tree of data blocks containing the file's
- contents. The depth of the tree can be determined from the size of
- the file, assuming the pointer and data blocks have a fixed size.
- Other structures are obviously possible. Venti's block-level
- interface leaves the choice of format to client applications and
- different data structures can coexist on a single server.
- <p>
- The following sections describe three applications that use Venti as
- an archival data repository: a user level archive utility called vac,
- a proposal for a physical level backup utility, and our preliminary
- work on a new version of the Plan 9 file system.
- <p>
- <h2>4.1. Vac</h2>
- <p>
- Vac is an application for storing a collection of files and
- directories as a single object, similar in functionality to the
- utilities tar and zip. With vac, the contents of the selected files
- are stored as a tree of blocks on a Venti server. The root
- fingerprint for this tree is written to a vac archive file specified
- by the user, which consists of an ASCII representation of the 20 byte
- root fingerprint plus a fixed header string, and is always 45 bytes
- long. A corresponding program, called unvac, enables the user to
- restore files from a vac archive. Naturally, unvac requires access to
- the Venti server that contains the actual data, but in most situations
- this is transparent. For a user, it appears that vac compresses any
- amount of data down to 45 bytes.
- <p>
- An important attribute of vac is that it writes each file as a
- separate collection of Venti blocks, thus ensuring that duplicate
- copies of a file will be coalesced on the server. If multiple users
- vac the same data, only one copy will be stored on the server.
- Similarly, a user may repeatedly vac a directory over time and even if
- the contents of the directory change, the additional storage consumed
- on the server will be related to the extent of the changes rather than
- the total size of the contents. Since Venti coalesces data at the
- block level, even files that change may share many blocks with
- previous versions and thus require little space on the server; log and
- database files are good examples of this scenario.
- <p>
- On many Unix systems, the dump utility is used to back up file
- systems. Dump has the ability to perform incremental backups of data;
- a user specifies a dump level, and only files that are new or have
- changed since the last dump at this level are written to the archive.
- To implement incremental backups, dump examines the modified time
- associated with each file, which is an efficient method of filtering
- out the unchanged files.
- <p>
- Vac also implements an incremental option based on the file
- modification times. The user specifies an existing vac file and this
- archive is used to reduce the number of blocks written to the Venti
- server. For each file, vac examines the modified time in both the
- file system and the vac archive. If they are the same, vac copies the
- fingerprint for the file from the old archive into the new archive.
- Copying just the 20-byte fingerprint enables the new archive to
- include the entire file without reading the data from the file system
- nor writing the data across the network to the Venti server. In
- addition, unlike an incremental dump, the resulting archive will be
- identical to an archive generated without the incremental option; it
- is only a performance improvement. This means there is no need to
- have multiple levels of backups, some incremental, some full, and so
- restore operations are greatly simplified.
- <p>
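- In outline, the file-level check is simple; the sketch below uses
- hypothetical helper routines (for reading the old archive, consulting the
- file system, and writing a file's blocks) and is not vac's actual code.
- <p>
- <pre>
- typedef struct Score { unsigned char b[20]; } Score;
- 
- typedef struct Entry {              /* one file recorded in a vac archive */
-     char  *name;
-     long  mtime;                    /* modification time */
-     Score score;                    /* root fingerprint of the file's blocks */
- } Entry;
- 
- Entry *lookup_old(const char *name);    /* assumed: entry in the old archive */
- long   file_mtime(const char *name);    /* assumed: mtime in the file system */
- Score  store_file(const char *name);    /* assumed: write the file's blocks */
- 
- Score
- archive_file(const char *name)
- {
-     Entry *old = lookup_old(name);
- 
-     if(old != 0)
-         if(old->mtime == file_mtime(name))
-             return old->score;    /* copy 20 bytes; no data read or written */
-     return store_file(name);      /* new or changed: write blocks to Venti */
- }
- </pre>
- <p>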
- A variant of the incremental option improves the backup of files
- without reference to modification times. As vac reads a file, it
- computes the fingerprint for each block. Concurrently, the pointer
- blocks of the old archive are examined to determine the fingerprint
- for the block at the same offset in the old version of the file. If
- the fingerprints are the same, the block does not need to be written
- to Venti. Instead, the fingerprint can simply be copied into the
- appropriate pointer block. This optimization reduces the number of
- writes to the Venti server, saving both network and disk bandwidth.
- Like the file level optimization above, the resulting vac file is no
- different from the one produced without this optimization. It does,
- however, require the data for the file to be read and is only
- effective if there are a significant number of unchanged blocks.
- <p>
- <h2>4.2. Physical backup</h2>
- <p>
- Utilities such as vac, tar, and dump archive data at the file or
- logical level: they walk the file hierarchy converting both data and
- meta-data into their own internal format. An alternative approach is
- block-level or physical backup, in which the disk blocks that make up
- the file system are directly copied without interpretation. Physical
- backup has a number of benefits including simplicity and potentially
- much higher throughput [8]. A physical backup utility for file
- systems that stores the resulting data on Venti appears attractive,
- though we have not yet implemented such an application.
- <p>
- The simplest form of physical backup is to copy the raw contents of
- one or more disk drives to Venti. The backup also includes a tree of
- pointer blocks, which enables access to the data blocks. Like vac,
- the end result is a single fingerprint representing the root of the
- tree; that fingerprint needs to be recorded outside of Venti.
- <p>
- Coalescing duplicate blocks is the main advantage of making a physical
- backup to Venti rather than copying the data to another storage medium
- such as tape. Since file systems are inherently block based, we
- expect coalescing to be effective. Not only will backups of a file
- system over time share many unchanged blocks, but even file systems
- for different machines that are running the same operating system may
- have many blocks in common. As with vac, the user sees a full backup
- of the device, while retaining the storage space advantages of an
- incremental backup.
- <p>
- One enhancement to physical backup is to copy only blocks that are
- actively in use in the file system. For most file system formats it
- is relatively easy to determine if a block is in use or free without
- walking the file system hierarchy. Free blocks generally contain the
- remnants of temporary files that were created and removed in the time
- between backups and it is advantageous not to store such blocks. This
- optimization requires that the backup format be able to represent
- missing blocks, which can easily be achieved on Venti by storing a
- null value for the appropriate entry in the pointer tree.
- <p>
- The random access performance of Venti is sufficiently good that it is
- possible to use a physical backup without first restoring it to disk.
- With operating system support, it is feasible to directly mount a
- backup file system image from Venti. Access to this file system is
- read only, but it provides a natural method of restoring a subset of
- files. For situations where a full restore is required, it might be
- possible to do this restore in a lazy fashion, copying blocks from
- Venti to the file system as needed, instead of copying the entire
- contents of the file system before resuming normal operation.
- <p>
- The time to perform a physical backup can be reduced using a variety
- of incremental techniques. Like vac, the backup utility can compute
- the fingerprint of each block and compare this fingerprint with the
- appropriate entry in the pointer tree of a previous backup. This
- optimization reduces the number of writes to the Venti server. If the
- file system provides information about which blocks have changed, as
- is the case with WAFL, the backup utility can avoid even reading the
- unchanged blocks. Again, a major advantage of using Venti is that the
- backup utility can implement these incremental techniques while still
- providing the user with a full backup. The backup utility writes the
- new blocks to the Venti server and constructs a pointer tree with the
- appropriate fingerprint for the unchanged blocks.
- <p>
- <h2>4.3. Plan 9 File system</h2>
- <p>
- When combined with a small amount of read/write storage, Venti can be
- used as the primary location for data rather than a place to store
- backups. A new version of the Plan 9 file system, which we are
- developing, exemplifies this approach.
- <p>
- Previously, the Plan 9 file system was stored on a combination of
- magnetic disks and a write-once optical jukebox. The jukebox
- furnishes the permanent storage for the system, while the magnetic
- disks act as a cache for the jukebox. The cache provides faster file
- access and, more importantly, accumulates the changes to the file
- system during the period between snapshots. When a snapshot is taken,
- new or modified blocks are written from the disk cache to the jukebox.
- <p>
- The disk cache can be smaller than the active file system, needing
- only to be big enough to contain the daily changes to the file system.
- However, accesses that miss the cache are significantly slower since
- changing platters in the jukebox takes several seconds. This
- performance penalty makes certain operations on old snapshots
- prohibitively expensive. Also, on the rare occasions when the disk
- cache has been reinitialized due to corruption, the file server spends
- several days filling the cache before performance returns to normal.
- <p>
- The new version of the Plan 9 file system uses Venti instead of an
- optical jukebox as its storage device. Since the performance of Venti
- is comparable to disk, this substitution equalizes access both to the
- active and to the archival view of the file system. It also allows
- the disk cache to be quite small; the cache accumulates changes to the
- file system between snapshots, but does not speed file access.
- <p>
- <h1>5. Implementation</h1>
- <p>
- We have implemented a prototype of Venti. The implementation uses an
- append-only log of data blocks and an index that maps fingerprints to
- locations in this log. It also includes a number of features that
- improve robustness and performance. This section gives a brief
- overview of the implementation. Figure 3 shows a block diagram of the
- server.
- <p>
-
- <img src="Block.gif" ALT="block diagram">
- <p>
- Figure 3. A block diagram of the Venti prototype.
- <p>
- Since Venti is intended for archival storage, one goal of our
- prototype is robustness. The approach we have taken is to separate
- the storage of data blocks from the index used to locate a block. In
- particular, blocks are stored in an append-only log on a RAID array of
- disk drives. The simplicity of the append-only log structure
- eliminates many possible software errors that might cause data
- corruption and facilitates a variety of additional integrity
- strategies. A separate index structure allows a block to be
- efficiently located in the log; however, the index can be regenerated
- from the data log if required and thus does not have the same
- reliability constraints as the log itself.
- <p>
- The structure of the data log is illustrated in Figure 4. To ease
- maintenance, the log is divided into self-contained sections called
- arenas. Each arena contains a large number of data blocks and is
- sized to facilitate operations such as copying to removable media.
- Within an arena is a section for data blocks that is filled in an
- append-only manner. In Venti, data blocks are variable sized, up to a
- current limit of 52 Kbytes, but since blocks are immutable they can be
- densely packed into an arena without fragmentation.
- <p>
-
- <img src="LogFormat.gif" ALT="log format">
- <p>
- Figure 4. The format of the data log.
- <p>
- Each block is prefixed by a header that describes the contents of the
- block. The primary purpose of the header is to provide integrity
- checking during normal operation and to assist in data recovery. The
- header includes a magic number, the fingerprint and size of the block,
- the time when the block was first written, and the identity of the user
- that wrote it. The header also includes a user-supplied type
- identifier, which is explained in Section 7. Note that only one copy of a
- given block is stored in the log; thus the user and wtime fields
- correspond to the first time the block was stored to the server.
- <p>
- Before storing a block in the log, an attempt is made to compress its
- contents. The inclusion of data compression increases the effective
- capacity of the archive and is simple to add given the log structure.
- Obviously, some blocks are incompressible. The encoding field in the
- block header indicates whether the data was compressed and, if so, the
- algorithm used. The esize field indicates the size of the data after
- compression, enabling the location of the next block in the arena to
- be determined. The downside of using compression is the computational
- cost, typically resulting in a decrease in the rate that blocks can be
- stored and retrieved. Our prototype uses a custom Lempel-Ziv '77 [21]
- algorithm that is optimized for speed. Compression is not a
- performance bottleneck for our existing server. Future
- implementations may benefit from hardware solutions.
- <p>
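- Collected together, the header fields just described might look roughly
- as follows; the field widths and ordering here are only a sketch, not
- the prototype's actual on-disk format.
- <p>
- <pre>
- enum { ScoreSize = 20 };
- 
- typedef struct LogHeader {
-     unsigned long  magic;              /* marks the start of a block */
-     unsigned char  score[ScoreSize];   /* fingerprint of the uncompressed data */
-     unsigned char  type;               /* user-supplied type identifier (Section 7) */
-     unsigned char  encoding;           /* uncompressed, or the algorithm used */
-     unsigned short size;               /* size of the data before compression */
-     unsigned short esize;              /* size as stored, after compression */
-     unsigned long  wtime;              /* time the block was first written */
-     unsigned short user;               /* identity of the user that wrote it */
- } LogHeader;
- </pre>
- <p>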
- In addition to a log of data blocks, an arena includes a header, a
- directory, and a trailer. The header identifies the arena. The
- directory contains a copy of the block header and offset for every
- block in the arena. By replicating the headers of all the blocks in
- one relatively small part of the arena, the server can rapidly check
- or rebuild the system's global block index. The directory also
- facilitates error recovery if part of the arena is destroyed or
- corrupted. The trailer summarizes the current state of the arena
- itself, including the number of blocks and the size of the log.
- Within the arena, the data log and the directory start at opposite
- ends and grow towards each other. When the arena is filled, it is
- marked as sealed, and a fingerprint is computed for the contents of
- the entire arena. Sealed arenas are never modified.
- <p>
- The basic operation of Venti is to store and retrieve blocks based on
- their fingerprints. A fingerprint is 160 bits long, and the number of
- possible fingerprints far exceeds the number of blocks stored on a
- server. The disparity between the number of fingerprints and blocks
- means it is impractical to map the fingerprint directly to a location
- on a storage device. Instead, we use an index to locate a block
- within the log.
- <p>
- We implement the index using a disk-resident hash table as illustrated
- in Figure 5. The index is divided into fixed-sized buckets, each of
- which is stored as a single disk block. Each bucket contains the
- index map for a small section of the fingerprint space. A hash
- function is used to map fingerprints to index buckets in a roughly
- uniform manner, and then the bucket is examined using binary search.
- If provisioned with sufficient buckets, the index hash table will be
- relatively empty and bucket overflows will be extremely rare. If a
- bucket does overflow, the extra entries are placed in an adjacent
- bucket. This structure is simple and efficient, requiring one disk
- access to locate a block in almost all cases.
- <p>
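- The lookup path can be sketched as follows; the structures and helper
- routines are illustrative and do not reflect the prototype's actual
- on-disk layout.
- <p>
- <pre>
- #include &lt;string.h&gt;
- 
- enum { ScoreSize = 20, MaxEntries = 200 };  /* entries per bucket: illustrative */
- 
- typedef struct IEntry {
-     unsigned char      score[ScoreSize];    /* fingerprint */
-     unsigned long long offset;              /* location of the block in the log */
- } IEntry;
- 
- typedef struct Bucket {
-     int    n;                               /* entries in use, sorted by score */
-     IEntry entry[MaxEntries];
- } Bucket;
- 
- extern unsigned long nbuckets;              /* total buckets in the index */
- Bucket *read_bucket(unsigned long bucketno);    /* assumed: one disk read */
- 
- /* Map a fingerprint to a bucket in a roughly uniform way. */
- static unsigned long
- bucket_of(const unsigned char *s)
- {
-     unsigned long h = s[0]*16777216UL + s[1]*65536UL + s[2]*256UL + s[3];
-     return h % nbuckets;
- }
- 
- /* Return the log offset of a block, or -1 if its fingerprint is not indexed. */
- long long
- index_lookup(const unsigned char *score)
- {
-     Bucket *b = read_bucket(bucket_of(score));
-     int lo = 0, hi = b->n;
- 
-     while(hi - lo > 0){                     /* binary search within the bucket */
-         int mid = (lo + hi) / 2;
-         int c = memcmp(score, b->entry[mid].score, ScoreSize);
- 
-         if(c == 0)
-             return b->entry[mid].offset;
-         if(c > 0)
-             lo = mid + 1;
-         else
-             hi = mid;
-     }
-     return -1;
- }
- </pre>
- <p>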
- <p>
-
- <img src="Index.gif" ALT="index format">
- <p>
- Figure 5. Format of the index.
- <p>
- The need to go through an index is the main performance penalty for
- Venti compared to a conventional block storage device. Our prototype
- uses three techniques to increase the performance: caching, striping,
- and write buffering.
- <p>
- The current implementation has two important caches of approximately
- equal size: a block cache and an index cache. A hit in the block
- cache returns the data for that fingerprint, bypassing both the
- index lookup and access to the data log. Hits in the index cache
- eliminate only the index lookup, but the entries are much smaller and
- the hit rate correspondingly higher.
- <p>
- Unfortunately, these caches do not speed the process of storing a new
- block to Venti. The server must check that the block is not a
- duplicate by examining the index. If the block is not contained on
- the server, it will obviously not be in any cache. Since the
- fingerprint of the block contains no internal structure, the location
- of a fingerprint in the index is essentially random. Furthermore, the
- archival nature of Venti means the entire index will not fit in memory
- because of the large number of blocks. Combining these factors means
- that the write performance of Venti will be limited to the random IO
- performance of the index disk, which for current technology is a few
- hundred accesses per second. By striping the index across multiple
- disks, however, we get a linear speedup. This requires a sufficient
- number of concurrent accesses, which we assure by buffering the writes
- before accessing the index.
- <p>
- The prototype Venti server is implemented for the Plan 9 operating
- system in about 10,000 lines of C. The server runs on a dedicated dual
- 550 MHz Pentium III processor system with 2 Gbytes of memory and is
- accessed over a 100 Mbit/s Ethernet network. The data log is stored on a
- 500 Gbyte MaxTronic IDE Raid 5 Array and the index resides on a string
- of 8 Seagate Cheetah 18XL 9 Gbyte SCSI drives.
- <p>
- <h1>6. Performance</h1>
- <p>
- Table 1 gives the preliminary performance results for read and write
- operations in a variety of situations. For comparison, we include the
- SCSI performance of the RAID array. Although the performance is still
- several times slower than directly accessing the disk, we believe the
- results are promising and will improve as the system matures.
- <p>
- Table 1. The performance of read and write operations in Mbytes/s for 8 Kbyte blocks.
- <p>
- <p>
- <table align=center>
- <tr>
- <th></th>
- <th width=150>sequential reads</th>
- <th width=150>random reads</th>
- <th width=150>virgin writes</th>
- <th width=150>duplicate writes</th>
- </tr>
- <tr>
- <td>uncached</td>
- <td align=center>0.9</td>
- <td align=center>0.4</td>
- <td align=center>3.7</td>
- <td align=center>5.6</td>
- </tr>
- <tr>
- <td>index cache</td>
- <td align=center>4.2</td>
- <td align=center>0.7</td>
- <td align=center>-</td>
- <td align=center>6.2</td>
- </tr>
- <tr>
- <td>block cache</td>
- <td align=center>6.8</td>
- <td align=center>-</td>
- <td align=center>-</td>
- <td align=center>6.5</td>
- </tr>
- <tr>
- <td>raw raid</td>
- <td align=center>14.8</td>
- <td align=center>1.0</td>
- <td align=center>12.4</td>
- <td align=center>12.4</td>
- </tr>
- </table>
- <p>
- The uncached sequential read performance is particularly bad. The
- problem is that these sequential reads require a random read of the
- index. Without assistance from the client, the read operations are
- not overlapped and do not benefit from the striping of the index. One
- possible solution is a form of read-ahead. When reading a block from
- the data log, it is feasible to also read several following blocks.
- These extra blocks can be added to the caches without referencing the
- index. If blocks are read in the same order they were written to the
- log, the latency of uncached index lookups will be avoided. This
- strategy should work well for streaming data such as multimedia files.
- <p>
- The basic assumption in Venti is that the growth in capacity of disks
- combined with the removal of duplicate blocks and compression of their
- contents enables a model in which it is not necessary to reclaim space
- by deleting archival data. To demonstrate why we believe this model
- is practical, we present some statistics derived from a decade's use
- of the Plan 9 file system.
- <p>
- The computing environment in which we work includes two Plan 9 file
- servers named bootes and emelie. Bootes was our primary file
- repository from 1990 until 1997 at which point it was superseded by
- emelie. Over the life of these two file servers there have been 522
- user accounts of which between 50 and 100 were active at any given
- time. The file servers have hosted numerous development projects and
- also contain several large data sets including chess end games,
- astronomical data, satellite imagery, and multimedia files.
- <p>
- Figure 6 depicts the size of the active file system as measured over
- time by du, the space consumed on the jukebox, and the size of the
- jukebox's data if it were to be stored on Venti. The ratio of the
- size of the archival data and the active file system is also given.
- As can be seen, even without using Venti, the storage required to
- implement the daily snapshots in Plan 9 is relatively modest, a result
- of the block level incremental approach to generating a snapshot.
- When the archival data is stored to Venti the cost of retaining the
- snapshots is reduced significantly. In the case of the emelie file
- system, the size on Venti is only slightly larger than the active file
- system; the cost of retaining the daily snapshots is almost zero.
- Note that the amount of storage that Venti uses for the snapshots
- would be the same even if more conventional methods were used to back
- up the file system. The Plan 9 approach to snapshots is not a
- necessity, since Venti will remove duplicate blocks.
- <p>
- <img src="bootes.gif" ALT="storage sizes for bootes">
- <img src="emelie.gif" ALT="storage sizes for emelie">
- <img src="bootes2.gif" ALT="ratio of sizes for bootes">
- <img src="emelie2.gif" ALT="ratio of sizes for emelie">
- <p>
- Figure 6. Graphs of the various sizes of two Plan 9 file servers.
- <p>
- When stored on Venti, the size of the jukebox data is reduced by three
- factors: elimination of duplicate blocks, elimination of block
- fragmentation, and compression of the block contents. Table 2
- presents the percent reduction for each of these factors. Note that
- bootes uses a 6 Kbyte block size while emelie uses 16 Kbyte, so the
- effect of removing fragmentation is more significant on emelie.
- <p>
- The 10 year history of the two Plan 9 file servers may be of interest
- to other researchers. We have made available per-block information
- including a hash of each block's contents, all the block pointers, and
- most of the directory information. The traces do not include the
- actual contents of files nor the file names. There is sufficient
- information to reconstruct the structure of the file system and to
- track the daily changes to this structure over time. The traces are
- available at http://www.cs.bell-labs.com/~seanq/p9trace.html.
- <p>
- Table 2. The percentage reduction in the size of data stored on
- Venti.
- <p>
- <table align=center>
- <tr>
- <th></th>
- <th width=150>bootes</th>
- <th width=150>emelie</th>
- </tr>
- <tr>
- <td>Elimination of duplicates</td>
- <td align=center>27.8%</td>
- <td align=center>31.3%</td>
- </tr>
- <tr>
- <td>Elimination of fragments</td>
- <td align=center>10.2%</td>
- <td align=center>25.4%</td>
- </tr>
- <tr>
- <td>Data Compression</td>
- <td align=center>33.8%</td>
- <td align=center>54.1%</td>
- </tr>
- <tr>
- <td>Total Reduction</td>
- <td align=center>59.7%</td>
- <td align=center>76.5%</td>
- </tr>
- </table>
- <p>
- <p>
- <h1>7. Reliability and Recovery</h1>
- <p>
- In concert with the development of the Venti prototype, we have built
- a collection of tools for integrity checking and error recovery.
- Example uses of these tools include: verifying the structure of an
- arena, checking there is an index entry for every block in the data
- log and vice versa, rebuilding the index from the data log, and
- copying an arena to removable media. These tools directly access the
- storage devices containing the data log and index and are executed on
- the server.
- <p>
- The directory structure at the end of each arena enhances the
- efficiency of many integrity and recovery operations, since it is
- typically two orders of magnitude smaller than the arena, yet contains
- most of the needed information. The index checking utility, for
- example, is implemented as a disk based sort of all the arena
- directories, followed by a comparison between this sorted list and the
- index. Our prototype currently contains approximately 150 million
- blocks using 250 Gbytes of storage. An index check takes 2.2 hours,
- which is significantly less than the 6 hours it takes to read all the
- log data.
- <p>
- An additional integrity and recovery feature is the association of a
- type identifier with every block. This 8 bit identifier is included
- with all client read and write operations and has the effect of
- partitioning the server into multiple independent domains. The idea
- is that type indicates the interpretation of the data contained in the
- block. A client can use this feature, for example, to indicate that a
- block is the root node for a tree of blocks. Currently, the data
- format associated with a type is left entirely to the client; the
- server does not interpret the type other than to use it in conjunction
- with a fingerprint as the key with which to index a block.
- <p>
- One use of the type identifier is to assist the administrator in
- locating blocks for which a user has accidentally lost the
- fingerprint. Using a tool on the server, the data log can be scanned
- for blocks that match specified criteria, including the block type,
- the write time, and user identifier. The type makes it relatively
- simple to locate forgotten root blocks. Future uses for the type
- might include the ability for the server to determine the location of
- fingerprints within a block, enabling the server to traverse the data
- structures that have been stored.
- <p>
- By storing the data log on a RAID 5 disk array, our server is
- protected against single drive failures. Obviously, there are many
- scenarios where this is not sufficient: multiple drives may fail,
- there may be a fire in the machine room, the RAID firmware may contain
- bugs, or the device may be stolen.
- <p>
- Additional protection could be obtained by using one or more off-site
- mirrors for the server. We have not yet implemented this strategy,
- but the architecture of Venti makes this relatively simple. A
- background process on the server copies new blocks from the data log
- to the mirrors. This copying can be achieved using the Venti
- protocol; the server is simply another client to the mirror.
- <p>
- Even mirroring may not be sufficient. The implementation of Venti may
- contain bugs that can be exploited to compromise the server. An
- automated attack may delete data on many servers simultaneously.
- Storage devices that provide low level enforcement of a write-once
- policy would provide protection against such an attack. Write-once
- read-many optical jukeboxes often provide such protection, but this is
- not yet common for magnetic disk based storage systems. We have thus
- resorted to copying the sealed arenas onto removable media.
- <p>
- <h1>8. Related Work</h1>
- <p>
- The Stanford Archival Vault [2] is a prototype archival repository
- intended for digital libraries. The archive consists of a write-once
- log of digital objects (files) and several auxiliary indexes for
- locating objects within the log. Objects are identified by the hash
- of their contents using a cyclic redundancy check (CRC). Unlike
- Venti, this system has no way to share data between objects that are
- partially the same, or to build up complex data structures such as a
- file system hierarchy. Rather, the archive consists of a collection
- of separate objects with a limited ability to group objects into sets.
- <p>
- On Venti, blocks are organized into more complex data structures by
- creating hash-trees, an idea originally proposed by Merkle [11] for an
- efficient digital signature scheme.
- <p>
- The approach to block retrieval in the Read-Only Secure File System
- (SFSRO) [3] is comparable to Venti. Blocks are identified by the Sha1
- hash of their contents and this idea is applied recursively to build
- up more complex structures. The focus of this system is security, not
- archival storage. An administrator creates a digitally signed
- database offline. The database contains a public read-only file
- system that can be published on multiple servers and efficiently and
- securely accessed by clients. SFSRO outperforms traditional methods
- for providing data integrity between a client and a file server,
- demonstrating an attractive property of hash-based addressing.
- <p>
- Given their similarities, it would be simple to implement SFSRO on top
- of Venti. The goal of Venti is to provide a flexible location for
- archival storage and SFSRO is a good example of an application that
- could use this capability. In fact, using Venti would provide a
- trivial solution to SFSRO's problem with stale NFS handles since data
- is never deleted from Venti and thus a stale handle will never be
- encountered.
- <p>
- Content-Derived Names [6] are another example of naming objects based
- on a secure hash of their contents. This work addresses the issue of
- naming and managing the various binary software components, in
- particular shared libraries, that make up an application.
- <p>
- The philosophy of the Elephant file system [18] is similar to Venti;
- large, cheap disks make it feasible to retain many versions of data.
- A feature of the Elephant system is the ability to specify a variety
- of data retention policies, which can be applied to individual files
- or directories. These policies attempt to strike a balance between
- the costs and benefits of storing every version of a file. In
- contrast, Venti focuses on the problem of how to store information
- after deciding that it should be retained in perpetuity. A system
- such as the Elephant file system could incorporate Venti as the
- storage device for the permanent "landmark" versions of files, much as
- the Plan 9 file system will use Venti to archive snapshots.
- <p>
- Self-Securing Storage [19] retains all versions of file system data in
- order to provide diagnosis and recovery from security breaches. The
- system is implemented as a self-contained network service that exports
- an object-based disk interface, providing protection from compromise
- of the client operating system. Old data is retained for a window of
- time and then deleted to reclaim storage.
- <p>
- Venti provides many of the features of self-securing storage: the
- server is self-contained and accessed through a simple low-level
- protocol, malicious users cannot corrupt or delete existing data on
- the server, and old versions of data are available for inspection. It
- is unlikely that a system would write every file system operation to
- Venti since storage is never reclaimed, but not deleting data removes
- the constraint that an intrusion must be detected within a limited
- window of time. A hybrid approach might retain every version for some
- time and some versions for all time. Venti could provide the
- long-term storage for such a hybrid.
- <p>
- <h1>9. Future Work</h1>
- <p>
- Venti could be distributed across multiple machines; the approach of
- identifying data by a hash of its contents simplifies such an
- extension. For example, the IO performance could be improved by
- replicating the server and using a simple load balancing algorithm.
- When storing or retrieving a block, clients direct the operation to a
- server based on a few bits of the fingerprint. Such load balancing
- could even be hidden from the client application by interposing a
- proxy server that performs this operation on behalf of the client.
- <p>
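- For instance, with a handful of replicated servers, a client or proxy
- could route each operation using nothing more than a byte of the
- fingerprint; the rule below is purely illustrative.
- <p>
- <pre>
- /* Hypothetical routing rule: pick a server from the first byte of the score. */
- int
- server_for(const unsigned char *score, int nservers)
- {
-     return score[0] % nservers;
- }
- </pre>
- <p>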
- Today, Venti provides little security. After authenticating to the
- server, clients can read any block for which they know the
- fingerprint. A fingerprint does act as a capability since the space
- of fingerprints is large and the Venti protocol does not include a
- means of enumerating the blocks on the server. However, this
- protection is weak as a single root fingerprint enables access to an
- entire file tree and once a fingerprint is known, there is no way to
- restrict access to a particular user. We are exploring ways of
- providing better access control.
- <p>
- To date, the structures we have used for storing data on Venti break
- files into a series of fixed sized blocks. Identical blocks are
- consolidated on Venti, but this consolidation will not occur if the
- data is shifted within the file or an application uses a different
- block size. This limitation can be overcome using an adaptation of
- Manber's algorithm for finding similarities in files [9]. The idea is
- to break files into variable sized blocks based on the identification
- of anchor or break points, increasing the occurrence of duplicate
- blocks [12]. Such a strategy can be implemented in client
- applications with no change to the Venti server.
- <p>
- A more detailed analysis of the decade of daily snapshots of the Plan
- 9 file systems might be interesting. The trace data we have made
- publicly available contains approximately the same information used
- for other studies of long term file activity [4].
- <p>
- <h1>10. Conclusion</h1>
- <p>
- The approach of identifying a block by the Sha1 hash of its contents
- is well suited to archival storage. The write-once model and the
- ability to coalesce duplicate copies of a block make Venti a useful
- building block for a number of interesting storage applications.
- <p>
- The large capacity of magnetic disks allows archival data to be
- retained and available on-line with performance that is comparable to
- conventional disks. Stored on our prototype server is over a decade
- of daily snapshots of two major departmental file servers. These
- snapshots are stored in a little over 200 Gbytes of disk space.
- Today, 100 Gbyte drives cost less than $300, and IDE RAID controllers
- are included on many motherboards. A scaled down version of our
- server could provide archival storage for a home user at an attractive
- price. Tomorrow, when terabyte disks can be had for the same price,
- it seems unlikely that archival data will be deleted to reclaim space.
- Venti provides an attractive approach to storing that data.
- <p>
- <h1>11. Acknowledgments</h1>
- <p>
- This paper was improved by comments and suggestions from Peter Bosch,
- Eric Grosse, Lorenz Huelsbergen, Rob Pike, Ross Quinlan, and Cliff
- Young and six anonymous reviewers. The paper's shepherd was Ethan L.
- Miller. We thank them all for their help.
- <p>
- <h1>12. References</h1>
- <p>
- [1] Ann Chervenak, Vivekenand Vellanki, and Zachary Kurmas.
- Protecting file systems: A survey of backup techniques. In
- Proceedings Joint NASA and IEEE Mass Storage Conference, March 1998.
- <p>
- [2] Arturo Crespo and Hector Garcia-Molina. Archival storage for
- digital libraries. In Proceedings of the 3rd ACM International
- Conference on Digital Libraries, 1998.
- <p>
- [3] Kevin Fu, Frans Kaashoek, and David Mazières. Fast and secure
- distributed read-only file system. In Proceedings of the 4th
- Symposium on Operating Systems Design and Implementation, 2000.
- <p>
- [4] Timothy J. Gibson, Ethan L. Miller, and Darrell D. E. Long.
- Long-term file activity and inter-reference patterns. In Proceedings,
- 24th International Conference on Technology Management and Performance
- Evaluation of Enterprise-Wide Information Systems, Computer
- Measurement Group, December 1998.
- <p>
- [5] Dave Hitz, James Lau, and Michael Malcolm, File system design for
- an NFS file server appliance, In Proceedings of the Winter 1994 USENIX
- Conference, San Francisco, CA, January 1994.
- <p>
- [6] J. K. Hollingsworth and E. L. Miller. Using content-derived names
- for configuration management. In Proceeding of the 1997 ACM Symposium
- on Software Reusability, Boston, May 1997.
- <p>
- [7] John Howard, Michael Kazar, Sherri Menees, David Nichols, Mahadev
- Satyanarayanan, Robert Sidebotham, and Michael West. Scale and
- performance in a distributed file system. ACM Transactions on
- Computer Systems, 6(1):51-81, February 1988.
- <p>
- [8] Norman C. Hutchinson, Stephen Manley, Mike Federwisch, Guy Harris,
- Dave Hitz, Steven Kleiman, and Sean O'Malley. Logical vs. physical
- file system backup. In Proceedings of the 3rd USENIX Symposium on
- Operating Systems Design and Implementation (OSDI), 1999.
- <p>
- [9] Udi Manber. Finding similar files in a large file system. In
- Proceedings of the Winter 1994 USENIX Conference, San Francisco, CA,
- January 1994.
- <p>
- [10] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone.
- Handbook of Applied Cryptography. CRC Press, 1996.
- <p>
- [11] Ralph C. Merkle. Protocols for public-key cryptosystems. In
- Proceedings of the IEEE Symposium on Security and Privacy, pp.
- 122-133, April 1980.
- <p>
- [12] Athicha Muthitacharoen, Benjie Chen, and David Mazières. A
- low-bandwidth network file system. In Proceedings of the 18th
- Symposium on Operating Systems Principles, October 2001.
- <p>
- [13] National Institute of Standards and Technology, FIPS 180-1.
- Secure Hash Standard. US Department of Commerce, April 1995.
- <p>
- [14] National Institute of Standards and Technology, Draft FIPS 180-2.
- Secure Hash Standard. US Department of Commerce, May 2001.
- <p>
- [15] Evi Nemeth, Garth Snyder, Scott Seebass, and Trent R. Hein. UNIX
- System Administration Handbook 3rd Edition, Prentice Hall, 2001.
- <p>
- [16] Rob Pike, Dave Presotto, Sean Dorward, Bob Flandrena, Ken
- Thompson, Howard Trickey, and Phil Winterbottom. Plan 9 from Bell
- Labs, Computing Systems, Vol. 8, 3, pp. 221-254, Summer 1995.
- <p>
- [17] Sean Quinlan. A cache worm file system. Software-Practice and
- Experience, Vol 21, 12, pp 1289-1299, December 1991.
- <p>
- [18] Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson,
- Alistair C. Veitch, Ross W. Carton and Jacob Ofir. Deciding when to
- forget in the Elephant file system. In Proceedings of the 17th
- Symposium on Operating Systems Principles, December 12-15, 1999.
- <p>
- [19] John. D. Strunk, Garth R. Goodson, Michael L. Scheinholtz, Craig
- A.N. Soules, and Gregory R. Ganger. Self-securing storage: protecting
- data in compromised systems. In Proceedings of the 4th Symposium on
- Operating Systems Design and Implementation, October 2000.
- <p>
- [20] D. A. Thompson and J. S. Best. The future of magnetic data
- storage technology, IBM Journal of Research and Development, Vol 44,
- 3, pp. 311-322, May 2000.
- <p>
- [21] J. Ziv and A. Lempel. A universal algorithm for sequential data
- compression, IEEE Trans. Inform. Theory, vol. IT-23, pp. 337-343,
- May 1977.
- <p>
|