- .TH VENTI 6
- .SH NAME
- venti \- archival storage server
- .SH DESCRIPTION
- Venti is a block storage server intended for archival data.
- In a Venti server, the SHA1 hash of a block's contents acts
- as the block identifier for read and write operations.
- This approach enforces a write-once policy, preventing
- accidental or malicious destruction of data. In addition,
- duplicate copies of a block are coalesced, reducing the
- consumption of storage and simplifying the implementation
- of clients.
- .PP
- This manual page documents the basic concepts of
- block storage using Venti as well as the Venti network protocol.
- .PP
- .IR Venti (1)
- documents some simple clients.
- .IR Vac (1)
- and
- .IR vacfs (4)
- are more complex clients.
- .PP
- .IR Venti (2)
- describes a C library interface for accessing
- Venti servers and manipulating Venti data structures.
- .PP
- .IR Venti (8)
- describes the programs used to run a Venti server.
- .PP
- .SS "Scores
- The SHA1 hash that identifies a block is called its
- .IR score .
- The score of the zero-length block is called the
- .IR "zero score" .
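- .PP
- As a minimal illustration in Python (not part of Venti itself), a score
- can be computed with the standard hashlib module. The names score and
- ZERO_SCORE are chosen here for the sketch only.

```python
import hashlib

def score(block: bytes) -> str:
    """Return the score of a block: the SHA1 hash of its contents, in hex."""
    return hashlib.sha1(block).hexdigest()

# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")

print(ZERO_SCORE)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```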
- .PP
- Scores may have an optional
- .IB label :
- prefix, typically used to
- describe the format of the data.
- For example,
- .IR vac (1)
- uses a
- .B vac:
- prefix.
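- .PP
- A program that accepts scores on its command line might separate the
- optional label from the hash roughly as follows. This is a sketch; the
- function name split_score is hypothetical, and it assumes the score is
- written as 40 hexadecimal digits.

```python
def split_score(text: str):
    """Split 'label:hex' into (label, 20 raw bytes); label is None if absent."""
    label, sep, hexpart = text.rpartition(":")
    raw = bytes.fromhex(hexpart)          # raises ValueError on non-hex input
    if len(raw) != 20:
        raise ValueError("a SHA1 score is 20 bytes (40 hex digits)")
    return (label if sep else None), raw
```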
- .SS "Files and Directories
- Venti accepts blocks up to 56 kilobytes in size.
- By convention, Venti clients use hash trees of blocks to
- represent arbitrary-size data
- .IR files .
- The data to be stored is split into fixed-size
- blocks and written to the server, producing a list
- of scores.
- The resulting list of scores is split into fixed-size pointer
- blocks (using only an integral number of scores per block)
- and written to the server, producing a smaller list
- of scores.
- The process continues, eventually ending with the
- score for the hash tree's top-most block.
- Each file stored this way is summarized by
- a
- .B VtEntry
- structure recording the top-most score, the depth
- of the tree, the data block size, and the pointer block size.
- One or more
- .B VtEntry
- structures can be concatenated
- and stored as a special file called a
- .IR directory .
- In this
- manner, arbitrary trees of files can be constructed
- and stored.
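- .PP
- The construction above can be sketched in Python against an in-memory
- dictionary standing in for the server. The names write_block and
- write_file are hypothetical, the sketch omits zero truncation, and it
- returns only the top-most score and depth rather than a full VtEntry.

```python
import hashlib

SCORE_SIZE = 20  # bytes in a SHA1 score

def write_block(store: dict, data: bytes) -> bytes:
    """Toy stand-in for a server write: index the block by its SHA1 score."""
    s = hashlib.sha1(data).digest()
    store[s] = data
    return s

def write_file(store: dict, data: bytes, dsize: int, psize: int):
    """Store data as a hash tree; return (top-most score, depth)."""
    per_ptr = psize // SCORE_SIZE   # integral number of scores per pointer block
    if per_ptr < 2:
        raise ValueError("pointer blocks must hold at least two scores")
    # Level 0: split the data into fixed-size blocks and write each one.
    scores = [write_block(store, data[i:i + dsize])
              for i in range(0, len(data), dsize)] or [write_block(store, b"")]
    depth = 0
    # Pack each list of scores into pointer blocks until one score remains.
    while len(scores) > 1:
        scores = [write_block(store, b"".join(scores[i:i + per_ptr]))
                  for i in range(0, len(scores), per_ptr)]
        depth += 1
    return scores[0], depth
```

- Because identical blocks have identical scores, repeated data occupies
- a single entry in the store no matter how often it appears in the file.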
- .PP
- Scores passed between programs conventionally refer
- to
- .B VtRoot
- blocks, which contain descriptive information
- as well as the score of a directory block containing a small number
- of directory entries.
- .PP
- Conventionally, programs do not mix data and directory entries
- in the same file. Instead, they keep two separate files, one with
- directory entries and one with metadata referencing those
- entries by position.
- Keeping this parallel representation is a minor annoyance
- but makes it possible for general programs like
- .I venti/copy
- (see
- .IR venti (1))
- to traverse the block tree without knowing the specific details
- of any particular program's data.
- .SS "Block Types
- To allow programs to traverse these structures without
- needing to understand their higher-level meanings,
- Venti tags each block with a type. The types are:
- .PP
- .nf
- .ft L
- VtDataType     000  \fRdata\fL
- VtDataType+1   001  \fRscores of \fLVtDataType\fR blocks\fL
- VtDataType+2   002  \fRscores of \fLVtDataType+1\fR blocks\fL
- \fR\&...\fL
- VtDirType      010  VtEntry\fR structures\fL
- VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL
- VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL
- \fR\&...\fL
- VtRootType     020  VtRoot\fR structure\fL
- .fi
- .PP
- The octal numbers listed are the type numbers used
- by the commands below.
- (For historical reasons, the type numbers used on
- disk and on the wire are different from the above.
- They do not distinguish
- .BI VtDataType+ n
- blocks from
- .BI VtDirType+ n
- blocks.)
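- .PP
- Read as octal, the table suggests that the low digit of a type number
- counts pointer levels and the next digit selects data, directory, or
- root. A Python sketch under that assumption follows; describe_type is a
- hypothetical helper, and it deals only with the type numbers listed
- above, not the on-disk or on-wire encoding.

```python
VT_DATA_TYPE = 0o00
VT_DIR_TYPE = 0o10
VT_ROOT_TYPE = 0o20

def describe_type(t: int) -> str:
    """Describe a block type number in the terms of the table above."""
    base, depth = t & ~0o7, t & 0o7   # assumed split: kind digit, pointer depth
    if base == VT_ROOT_TYPE and depth == 0:
        return "VtRoot structure"
    if base == VT_DATA_TYPE:
        name, leaf = "VtDataType", "data"
    elif base == VT_DIR_TYPE:
        name, leaf = "VtDirType", "VtEntry structures"
    else:
        raise ValueError(f"unknown block type {t:#o}")
    if depth == 0:
        return leaf
    below = name if depth == 1 else f"{name}+{depth - 1}"
    return f"scores of {below} blocks"
```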
- .SS "Zero Truncation
- To avoid storing the same short data blocks padded with
- differing numbers of zeros, Venti clients working with fixed-size
- blocks conventionally
- `zero truncate' the blocks before writing them to the server.
- For example, if a 1024-byte data block contains the
- 11-byte string
- .RB ` hello " " world '
- followed by 1013 zero bytes,
- a client would store only the 11-byte block.
- When the client later read the block from the server,
- it would append zero bytes to the end as necessary to
- reach the expected size.
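- .PP
- For data blocks, the convention amounts to stripping trailing zero
- bytes before a write and padding after a read. A minimal Python sketch,
- with hypothetical function names:

```python
def zero_truncate(block: bytes) -> bytes:
    """Remove trailing zero bytes from a data block before writing it."""
    return block.rstrip(b"\x00")

def zero_extend(block: bytes, size: int) -> bytes:
    """Pad a block read from the server back to its expected size."""
    if len(block) > size:
        raise ValueError("block is larger than the expected size")
    return block.ljust(size, b"\x00")

padded = b"hello world" + bytes(1013)     # the 1024-byte example block
print(len(zero_truncate(padded)))         # 11
```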
- .PP
- When truncating pointer blocks
- .RB ( VtDataType+ \fIn
- and
- .BI VtDirType+ n
- blocks),
- trailing zero scores are removed
- instead of trailing zero bytes.
- .PP
- Because of the truncation convention,
- any file consisting entirely of zero bytes,
- no matter what its length, will be represented by the zero score:
- the data blocks contain all zeros and are thus truncated
- to the empty block, and the pointer blocks contain all zero scores
- and are thus also truncated to the empty block,
- and so on up the hash tree.
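- .PP
- The pointer-block case can be sketched the same way. This Python
- fragment assumes that a zero score means the score of the empty block,
- as defined under Scores; truncate_pointer_block is a hypothetical name.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()   # score of the zero-length block

def truncate_pointer_block(block: bytes) -> bytes:
    """Remove trailing zero scores from a pointer block."""
    if len(block) % SCORE_SIZE != 0:
        raise ValueError("pointer block must hold a whole number of scores")
    n = len(block)
    while n >= SCORE_SIZE and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]

# A pointer block made only of zero scores truncates to the empty block,
# whose own score is again the zero score.
all_zero = truncate_pointer_block(ZERO_SCORE * 4)
print(hashlib.sha1(all_zero).digest() == ZERO_SCORE)  # True
```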
- .SS "Network Protocol
- A Venti session begins when a
- .I client
- connects to the network address served by a Venti
- .IR server ;
- the conventional address is
- .BI tcp! server !venti
- (the
- .B venti
- port is 17034).
- Both client and server begin by sending a version
- string of the form
- .BI venti- versions - comment \en \fR.
- The
- .I versions
- field is a list of acceptable versions separated by
- colons.
- The protocol described here is version
- .BR 02 .
- The client is responsible for choosing a common
- version and sending it in the
- .B VtThello
- message, described below.
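- .PP
- A client might handle the version exchange roughly as in the Python
- sketch below. The function names are hypothetical, and choosing the
- first mutually acceptable version in the client's own preference order
- is an assumption; the text says only that the client chooses.

```python
def parse_version_line(line: str):
    """Parse 'venti-versions-comment' plus newline into (versions, comment)."""
    if not line.startswith("venti-") or not line.endswith("\n"):
        raise ValueError("not a Venti version string")
    versions, sep, comment = line[len("venti-"):-1].partition("-")
    if not sep:
        raise ValueError("missing comment field")
    return versions.split(":"), comment

def choose_version(ours, theirs):
    """Return the first version in ours that also appears in theirs, or None."""
    for v in ours:
        if v in theirs:
            return v
    return None
```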
- .PP
- After the initial version exchange, the client transmits
- .I requests
- .RI ( T-messages )
- to the server, which subsequently returns
- .I replies
- .RI ( R-messages )
- to the client.
- The combined act of transmitting (receiving) a request
- of a particular type, and receiving (transmitting) its reply
- is called a
- .I transaction
- of that type.
- .PP
- Each message consists of a sequence of bytes.
- Two-byte fields hold unsigned integers represented
- in big-endian order (most significant byte first).
- Data items of variable lengths are represented by
- a one-byte field specifying a count,
- .IR n ,
- followed by
- .I n
- bytes of data.
- Text strings are represented similarly,
- using a two-byte count with
- the text itself stored as a UTF-encoded sequence
- of Unicode characters (see
- .IR utf (6)).
- Text strings are not
- .SM NUL\c
- -terminated:
- .I n
- counts the bytes of UTF data, which include no final
- zero byte.
- The
- .SM NUL
- character is illegal in text strings in the Venti protocol.
- The maximum string length in Venti is 1024 bytes.
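- .PP
- These encoding rules can be sketched in Python with the standard
- struct module. The helper names are hypothetical; the limits are the
- ones stated above.

```python
import struct

MAX_STRING = 1024  # maximum string length in bytes

def pack_string(text: str) -> bytes:
    """Encode a text string: two-byte big-endian count, then UTF-8 bytes."""
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("NUL is illegal in Venti strings")
    if len(raw) > MAX_STRING:
        raise ValueError("string longer than 1024 bytes")
    return struct.pack(">H", len(raw)) + raw

def unpack_string(buf: bytes, off: int = 0):
    """Decode a string at off; return (text, offset just past the string)."""
    (n,) = struct.unpack_from(">H", buf, off)
    raw = buf[off + 2:off + 2 + n]
    if len(raw) != n:
        raise ValueError("truncated string")
    return raw.decode("utf-8"), off + 2 + n

def pack_var(data: bytes) -> bytes:
    """Encode a variable-length item: one-byte count, then the data."""
    if len(data) > 255:
        raise ValueError("item does not fit a one-byte count")
    return bytes([len(data)]) + data
```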
- .PP
- Each Venti message begins with a two-byte size field
- specifying the length in bytes of the message,
- not including the length field itself.
- The next byte is the message type, one of the constants
- in the enumeration in the include file
- .BR <venti.h> .
- The next byte is an identifying
- .IR tag ,
- used to match responses to requests.
- The remaining bytes are parameters of different sizes.
- In the message descriptions, the number of bytes in a field
- is given in brackets after the field name.
- The notation
- .IR parameter [ n ]
- where
- .I n
- is not a constant represents a variable-length parameter:
- .IR n [1]
- followed by
- .I n
- bytes of data forming the
- .IR parameter .
- The notation
- .IR string [ s ]
- (using a literal
- .I s
- character)
- is shorthand for
- .IR s [2]
- followed by
- .I s
- bytes of UTF-8 text.
- The notation
- .IR parameter []
- where
- .I parameter
- is the last field in the message represents a
- variable-length field that comprises all remaining
- bytes in the message.
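- .PP
- The framing just described can be sketched in Python as follows. The
- helper names are hypothetical, and the message type is passed in as a
- plain number because the constants are defined in the include file and
- are not reproduced here.

```python
import struct

def frame(msg_type: int, tag: int, params: bytes = b"") -> bytes:
    """Build size[2] type[1] tag[1] params; size excludes the size field."""
    body = bytes([msg_type, tag]) + params
    return struct.pack(">H", len(body)) + body

def unframe(buf: bytes):
    """Split one message off buf; return (type, tag, params, remaining bytes)."""
    (size,) = struct.unpack_from(">H", buf, 0)
    body = buf[2:2 + size]
    if size < 2 or len(body) != size:
        raise ValueError("short or truncated message")
    return body[0], body[1], body[2:], buf[2 + size:]
```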
- .PP
- All Venti RPC messages are prefixed with a field
- .IR size [2]
- giving the length of the message that follows
- (not including the
- .I size
- field itself).
- The message bodies are:
- .ta \w'\fLVtTgoodbye 'u
- .IP
- .ne 2v
- .B VtThello
- .IR tag [1]
- .IR version [ s ]
- .IR uid [ s ]
- .IR strength [1]
- .IR crypto [ n ]
- .IR codec [ n ]
- .br
- .B VtRhello
- .IR tag [1]
- .IR sid [ s ]
- .IR rcrypto [1]
- .IR rcodec [1]
- .IP
- .ne 2v
- .B VtTping
- .IR tag [1]
- .br
- .B VtRping
- .IR tag [1]
- .IP
- .ne 2v
- .B VtTread
- .IR tag [1]
- .IR score [20]
- .IR type [1]
- .IR pad [1]
- .IR count [2]
- .br
- .B VtRread
- .IR tag [1]
- .IR data []
- .IP
- .ne 2v
- .B VtTwrite
- .IR tag [1]
- .IR type [1]
- .IR pad [3]
- .IR data []
- .br
- .B VtRwrite
- .IR tag [1]
- .IR score [20]
- .IP
- .ne 2v
- .B VtTsync
- .IR tag [1]
- .br
- .B VtRsync
- .IR tag [1]
- .IP
- .ne 2v
- .B VtRerror
- .IR tag [1]
- .IR error [ s ]
- .IP
- .ne 2v
- .B VtTgoodbye
- .IR tag [1]
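- .PP
- As an illustration of the layouts above, the Python sketch below
- builds the parameters that follow the tag in the read and write
- requests. The function names are hypothetical, and filling the pad
- bytes with zeros is an assumption; the type argument is the value used
- in the protocol.

```python
import struct

def tread_params(score: bytes, wire_type: int, count: int) -> bytes:
    """Parameters of a read request: score[20] type[1] pad[1] count[2]."""
    if len(score) != 20:
        raise ValueError("score must be 20 bytes")
    return score + bytes([wire_type, 0]) + struct.pack(">H", count)

def twrite_params(wire_type: int, data: bytes) -> bytes:
    """Parameters of a write request: type[1] pad[3] data[]."""
    return bytes([wire_type, 0, 0, 0]) + data
```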
- .PP
- Each T-message has a one-byte
- .I tag
- field, chosen and used by the client to identify the message.
- The server will echo the request's
- .I tag
- field in the reply.
- Clients should arrange that no two outstanding
- messages have the same tag field so that responses
- can be distinguished.
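- .PP
- One way for a client to keep outstanding tags distinct is a small
- pool, sketched here in Python. TagPool is a hypothetical class, and the
- sketch assumes all 256 tag values are usable, which the text does not
- state.

```python
class TagPool:
    """Hand out one-byte tags so no two outstanding requests share one."""

    def __init__(self):
        self.free = list(range(256))
        self.pending = {}            # tag -> request awaiting its reply

    def acquire(self, request):
        """Reserve a tag for a request that is about to be sent."""
        if not self.free:
            raise RuntimeError("all 256 tags are outstanding")
        tag = self.free.pop()
        self.pending[tag] = request
        return tag

    def release(self, tag):
        """Match a reply's tag to its request and make the tag reusable."""
        request = self.pending.pop(tag)   # KeyError if tag is not outstanding
        self.free.append(tag)
        return request
```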
- .PP
- The type of an R-message will either be one greater than
- the type of the corresponding T-message or
- .BR VtRerror ,
- indicating that the request failed.
- In the latter case, the
- .I error
- field contains a string describing the reason for failure.
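- .PP
- A client could check a reply against its request as in the Python
- sketch below. VentiError and check_reply are hypothetical names, and
- the numeric value of the error type is passed in rather than assumed.

```python
import struct

class VentiError(Exception):
    """Raised when the server answers a request with an error reply."""

def check_reply(t_type: int, r_type: int, r_params: bytes, rerror_type: int) -> bytes:
    """Return the reply parameters, or raise if the request failed."""
    if r_type == rerror_type:
        # The error reply carries a string: two-byte count, then UTF-8 text.
        (n,) = struct.unpack_from(">H", r_params, 0)
        raise VentiError(r_params[2:2 + n].decode("utf-8"))
    if r_type != t_type + 1:
        raise ValueError("reply type does not match the request")
    return r_params
```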
- .PP
- Venti connections must begin with a
- .B hello
- transaction.
- The
- .B VtThello
- message contains the protocol
- .I version
- that the client has chosen to use.
- The fields
- .IR strength ,
- .IR crypto ,
- and
- .IR codec
- could be used to add authentication, encryption,
- and compression to the Venti session
- but are currently ignored.
- The
- .I rcrypto
- and
- .I rcodec
- fields in the
- .B VtRhello
- response are similarly ignored.
- The
- .IR uid
- and
- .IR sid
- fields are intended to be the identity
- of the client and server but, given the lack of
- authentication, should be treated only as advisory.
- The initial
- .B hello
- should be the only
- .B hello
- transaction during the session.
- .PP
- The
- .B ping
- message has no effect and
- is used mainly for debugging.
- Servers should respond immediately to pings.
- .PP
- The
- .B read
- message requests a block with the given
- .I score
- and
- .IR type .
- Use
- .I vttodisktype
- and
- .I vtfromdisktype
- (see
- .IR venti (2))
- to convert between a block type enumeration value
- .RB ( VtDataType ,
- etc.)
- and the
- .I type
- used on disk and in the protocol.
- The
- .I count
- field specifies the maximum expected size
- of the block.
- The
- .I data
- in the reply is the block's contents.
- .PP
- The
- .B write
- message writes a new block of the given
- .I type
- with contents
- .I data
- to the server.
- The response includes the
- .I score
- to use to read the block,
- which should be the SHA1 hash of
- .IR data .
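- .PP
- Since a score is the SHA1 hash of the block it names, a cautious
- client can check what the server returns for both reads and writes. The
- text does not require this check; the Python sketch below, with the
- hypothetical name verify_block, is one way it might be done.

```python
import hashlib

def verify_block(score: bytes, data: bytes) -> bytes:
    """Return data if its SHA1 hash equals score; otherwise raise ValueError."""
    if hashlib.sha1(data).digest() != score:
        raise ValueError("score does not match block contents")
    return data
```

- After a write, score is the value from the reply and data is what was
- sent; after a read, score is the value requested and data is the reply.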
- .PP
- The Venti server may buffer written blocks in memory,
- waiting until after responding to the
- .B write
- message before writing them to
- permanent storage.
- The server will delay the response to a
- .B sync
- message until after all blocks in earlier
- .B write
- messages have been written to permanent storage.
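- .PP
- The relationship between write and sync can be modelled with a toy
- in-memory server, sketched in Python below. ToyServer is purely
- illustrative: two dictionaries stand in for memory and permanent
- storage, and nothing here reflects how a real server is implemented.

```python
import hashlib

class ToyServer:
    """Model of buffered writes that become durable on sync."""

    def __init__(self):
        self.buffered = {}   # blocks acknowledged but not yet durable
        self.durable = {}    # blocks on (simulated) permanent storage

    def write(self, data: bytes) -> bytes:
        """Acknowledge a write at once; the block may still be in memory."""
        s = hashlib.sha1(data).digest()
        if s not in self.durable:
            self.buffered[s] = data
        return s

    def sync(self) -> None:
        """Return only after every earlier write is on permanent storage."""
        self.durable.update(self.buffered)
        self.buffered.clear()
```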
- .PP
- The
- .B goodbye
- message ends a session. There is no
- .BR VtRgoodbye :
- upon receiving the
- .BR VtTgoodbye
- message, the server terminates the connection.
- .SH SEE ALSO
- .IR venti (1),
- .IR venti (2),
- .IR venti (8)
- .br
- Sean Quinlan and Sean Dorward,
- ``Venti: a new approach to archival storage'',
- .I "Usenix Conference on File and Storage Technologies" ,
- 2002.