.TH VENTI 6
.SH NAME
venti \- archival storage server
.SH DESCRIPTION
Venti is a block storage server intended for archival data.
In a Venti server, the SHA1 hash of a block's contents acts
as the block identifier for read and write operations.
This approach enforces a write-once policy, preventing
accidental or malicious destruction of data. In addition,
duplicate copies of a block are coalesced, reducing the
consumption of storage and simplifying the implementation
of clients.
.PP
This manual page documents the basic concepts of
block storage using Venti as well as the Venti network protocol.
.PP
.IR Venti (1)
documents some simple clients.
.IR Vac (1)
and
.IR vacfs (4)
are more complex clients.
.PP
.IR Venti (2)
describes a C library interface for accessing
Venti servers and manipulating Venti data structures.
.PP
.IR Venti (8)
describes the programs used to run a Venti server.
.PP
.SS "Scores
The SHA1 hash that identifies a block is called its
.IR score .
The score of the zero-length block is called the
.IR "zero score" .
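.PP
As a rough illustration of content addressing, the following Python sketch
computes scores and models a server as a dictionary.
The names score, MemoryStore and ZERO_SCORE are invented for this example
and are not part of any Venti library.

```python
import hashlib

def score(block: bytes) -> str:
    # A score is the SHA1 hash of the block's contents, shown here in hex.
    return hashlib.sha1(block).hexdigest()

class MemoryStore:
    # Hypothetical in-memory stand-in for a Venti server.
    def __init__(self):
        self.blocks = {}

    def write(self, block: bytes) -> str:
        s = score(block)
        # Identical contents hash to the same score, so only one copy is kept.
        self.blocks.setdefault(s, block)
        return s

    def read(self, s: str) -> bytes:
        return self.blocks[s]

# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")
```

Writing the same contents twice returns the same score and leaves a single
entry in the store, which is the coalescing behaviour described above.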
.PP
Scores may have an optional
.IB label :
prefix, typically used to
describe the format of the data.
For example,
.IR vac (1)
uses a
.B vac:
prefix, while
.IR vbackup (8)
uses prefixes corresponding to the file system
types:
.BR ext2: ,
.BR ffs: ,
and so on.
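.PP
A minimal sketch of separating the optional label from a textual score,
assuming the score is written as 40 hexadecimal digits.
The function name split_score is hypothetical.

```python
import string

def split_score(text: str):
    # Split an optional "label:" prefix from a 40-digit hexadecimal score.
    label, sep, digits = text.rpartition(":")
    if not sep:
        label = None  # no prefix present
    if len(digits) != 40 or not all(c in string.hexdigits for c in digits):
        raise ValueError("expected 40 hexadecimal digits")
    return label, digits.lower()
```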
.SS "Files and Directories
Venti accepts blocks up to 56 kilobytes in size.
By convention, Venti clients use hash trees of blocks to
represent arbitrary-size data
.IR files .
The data to be stored is split into fixed-size
blocks and written to the server, producing a list
of scores.
The resulting list of scores is split into fixed-size pointer
blocks (using only an integral number of scores per block)
and written to the server, producing a smaller list
of scores.
The process continues, eventually ending with the
score for the hash tree's top-most block.
Each file stored this way is summarized by
a
.B VtEntry
structure recording the top-most score, the depth
of the tree, the data block size, and the pointer block size.
One or more
.B VtEntry
structures can be concatenated
and stored as a special file called a
.IR directory .
In this
manner, arbitrary trees of files can be constructed
and stored.
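.PP
The hash tree construction can be sketched in Python as follows.
This is a simplified model, not the layout used by real clients:
the store is a plain dictionary, the returned dictionary merely stands in
for a VtEntry, zero truncation is omitted, and the names write_file and
read_file are invented for the example.

```python
import hashlib

SCORE_SIZE = 20  # bytes in a raw SHA1 score

def write_file(store, data, dsize=8192, psize=8192):
    # store maps raw 20-byte scores to block contents.
    def put(block):
        s = hashlib.sha1(block).digest()
        store[s] = block
        return s

    # Split the data into fixed-size data blocks and write each one.
    scores = [put(data[i:i + dsize]) for i in range(0, len(data), dsize)]
    if not scores:
        scores = [put(b"")]
    depth = 0
    per_block = psize // SCORE_SIZE  # integral number of scores per block
    # Pack the scores into pointer blocks until one score remains.
    while len(scores) > 1:
        groups = [scores[i:i + per_block]
                  for i in range(0, len(scores), per_block)]
        scores = [put(b"".join(g)) for g in groups]
        depth += 1
    return {"score": scores[0], "depth": depth,
            "dsize": dsize, "psize": psize}

def read_file(store, entry):
    # Walk down from the top-most block, one tree level per iteration.
    blocks = [store[entry["score"]]]
    for _ in range(entry["depth"]):
        blocks = [store[b[i:i + SCORE_SIZE]]
                  for b in blocks
                  for i in range(0, len(b), SCORE_SIZE)]
    return b"".join(blocks)
```

The pointer block size must hold at least two scores,
otherwise the loop in write_file would never shrink the list.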
.PP
Scores passed between programs conventionally refer
to
.B VtRoot
blocks, which contain descriptive information
as well as the score of a directory block containing a small number
of directory entries.
.PP
Conventionally, programs do not mix data and directory entries
in the same file. Instead, they keep two separate files, one with
directory entries and one with metadata referencing those
entries by position.
Keeping this parallel representation is a minor annoyance
but makes it possible for general programs like
.I venti/copy
(see
.IR venti (1))
to traverse the block tree without knowing the specific details
of any particular program's data.
.SS "Block Types
To allow programs to traverse these structures without
needing to understand their higher-level meanings,
Venti tags each block with a type. The types are:
.PP
.nf
.ft L
VtDataType     000  \fRdata\fL
VtDataType+1   001  \fRscores of \fLVtDataType\fR blocks\fL
VtDataType+2   002  \fRscores of \fLVtDataType+1\fR blocks\fL
\fR\&...\fL
VtDirType      010  VtEntry\fR structures\fL
VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL
VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL
\fR\&...\fL
VtRootType     020  VtRoot\fR structure\fL
.fi
.PP
The octal numbers listed are the type numbers used
by the commands below.
(For historical reasons, the type numbers used on
disk and on the wire are different from the above.
They do not distinguish
.BI VtDataType+ n
blocks from
.BI VtDirType+ n
blocks.)
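.PP
A small sketch of how a traversal program might interpret the octal type
numbers in the table.
The constants mirror the table; the helper names describe and child_type
are hypothetical, and the different on-disk and on-wire numbering is not
modelled.

```python
VtDataType = 0o00
VtDirType = 0o10
VtRootType = 0o20

def describe(t):
    # Return (kind, depth): depth 0 holds data, VtEntry or VtRoot contents,
    # depth n > 0 holds scores of blocks at depth n - 1.
    if t == VtRootType:
        return ("root", 0)
    if VtDataType <= t < VtDirType:
        return ("data", t - VtDataType)
    if VtDirType <= t < VtRootType:
        return ("dir", t - VtDirType)
    raise ValueError("unknown block type %o" % t)

def child_type(t):
    # Type of the blocks whose scores a pointer block of type t contains.
    kind, depth = describe(t)
    if depth == 0:
        raise ValueError("type %o is not a pointer block" % t)
    return t - 1
```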
.SS "Zero Truncation
To avoid storing the same short data blocks padded with
differing numbers of zeros, Venti clients working with fixed-size
blocks conventionally
`zero truncate' the blocks before writing them to the server.
For example, if a 1024-byte data block contains the
11-byte string
.RB ` hello " " world '
followed by 1013 zero bytes,
a client would store only the 11-byte block.
When the client later read the block from the server,
it would append zero bytes to the end as necessary to
reach the expected size.
.PP
When truncating pointer blocks
.RB ( VtDataType+ \fIn
and
.BI VtDirType+ n
blocks),
trailing zero scores are removed
instead of trailing zero bytes.
.PP
Because of the truncation convention,
any file consisting entirely of zero bytes,
no matter what its length, will be represented by the zero score:
the data blocks contain all zeros and are thus truncated
to the empty block, and the pointer blocks contain all zero scores
and are thus also truncated to the empty block,
and so on up the hash tree.
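.PP
The truncation rules can be illustrated with a few Python helpers.
The function names are invented for this sketch; the zero score is taken
to be the SHA1 hash of the empty block, as defined under Scores.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()

def truncate_data(block):
    # Data blocks: drop trailing zero bytes.
    return block.rstrip(b"\x00")

def extend_data(block, size):
    # Reverse of truncate_data: pad with zero bytes to the expected size.
    return block + bytes(size - len(block))

def truncate_pointers(block):
    # Pointer blocks: drop trailing zero scores rather than zero bytes.
    n = len(block)
    while n >= SCORE_SIZE and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]
```

A pointer block made up only of zero scores truncates to the empty block,
whose score is again the zero score, so an all-zero file collapses to the
zero score at every level of the tree.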
.SS "Network Protocol
A Venti session begins when a
.I client
connects to the network address served by a Venti
.IR server ;
the conventional address is
.BI tcp! server !venti
(the
.B venti
port is 17034).
Both client and server begin by sending a version
string of the form
.BI venti- versions - comment \en \fR.
The
.I versions
field is a list of acceptable versions separated by
colons.
The protocol described here is version
.BR 02 .
The client is responsible for choosing a common
version and sending it in the
.B VtThello
message, described below.
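.PP
A sketch of building and parsing the version string and picking a common
version.
The helper names, the comment text and the version list used in the usage
note are made up for the example; only version 02 is documented here.

```python
SUPPORTED = ["02"]  # the protocol version described in this page

def make_version_line(versions, comment):
    # Build "venti-<versions>-<comment>\n" with colon-separated versions.
    text = "venti-%s-%s\n" % (":".join(versions), comment)
    return text.encode("ascii")

def parse_version_line(line):
    text = line.decode("ascii")
    if not text.endswith("\n"):
        raise ValueError("version string must end with a newline")
    # Split on the first two hyphens only; the comment may contain more.
    prefix, versions, comment = text[:-1].split("-", 2)
    if prefix != "venti":
        raise ValueError("not a venti version string")
    return versions.split(":"), comment

def choose_version(peer_versions, mine=SUPPORTED):
    # The client picks a version that both sides list.
    for v in mine:
        if v in peer_versions:
            return v
    raise ValueError("no common protocol version")
```

For instance, parsing the bytes for "venti-04:02-example" followed by a
newline yields the versions 04 and 02, from which this client would
choose 02.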
.PP
After the initial version exchange, the client transmits
.I requests
.RI ( T-messages )
to the server, which subsequently returns
.I replies
.RI ( R-messages )
to the client.
The combined act of transmitting (receiving) a request
of a particular type, and receiving (transmitting) its reply
is called a
.I transaction
of that type.
.PP
Each message consists of a sequence of bytes.
Two-byte fields hold unsigned integers represented
in big-endian order (most significant byte first).
Data items of variable lengths are represented by
a one-byte field specifying a count,
.IR n ,
followed by
.I n
bytes of data.
Text strings are represented similarly,
using a two-byte count with
the text itself stored as a UTF-encoded sequence
of Unicode characters (see
.IR utf (6)).
Text strings are not
.SM NUL\c
-terminated:
.I n
counts the bytes of UTF data, which include no final
zero byte.
The
.SM NUL
character is illegal in text strings in the Venti protocol.
The maximum string length in Venti is 1024 bytes.
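.PP
The field encodings above might be implemented along these lines.
The function names are hypothetical; the limits and byte order follow
the text.

```python
import struct

MAX_STRING = 1024  # maximum string length in bytes

def pack_var(data):
    # Variable-length item: one-byte count followed by that many bytes.
    if len(data) > 255:
        raise ValueError("does not fit a one-byte count")
    return bytes([len(data)]) + data

def pack_string(text):
    # Text string: two-byte big-endian count, then UTF-8 with no final NUL.
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("NUL is illegal in Venti strings")
    if len(raw) > MAX_STRING:
        raise ValueError("string longer than 1024 bytes")
    return struct.pack(">H", len(raw)) + raw

def unpack_string(buf, offset=0):
    # Return the decoded string and the offset just past it.
    (n,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    raw = buf[start:start + n]
    if len(raw) != n:
        raise ValueError("truncated string")
    return raw.decode("utf-8"), start + n
```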
.PP
Each Venti message begins with a two-byte size field
specifying the length in bytes of the message,
not including the length field itself.
The next byte is the message type, one of the constants
in the enumeration in the include file
.BR <venti.h> .
The next byte is an identifying
.IR tag ,
used to match responses to requests.
The remaining bytes are parameters of different sizes.
In the message descriptions, the number of bytes in a field
is given in brackets after the field name.
The notation
.IR parameter [ n ]
where
.I n
is not a constant represents a variable-length parameter:
.IR n [1]
followed by
.I n
bytes of data forming the
.IR parameter .
The notation
.IR string [ s ]
(using a literal
.I s
character)
is shorthand for
.IR s [2]
followed by
.I s
bytes of UTF-8 text.
The notation
.IR parameter []
where
.I parameter
is the last field in the message represents a
variable-length field that comprises all remaining
bytes in the message.
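.PP
A minimal framing sketch based on the layout just described.
The names frame and unframe are invented, and the message type is passed
in as a number because the actual constants live in <venti.h> and are not
listed in this page.

```python
import struct

def frame(msg_type, tag, params=b""):
    # size[2] counts the type byte, the tag byte and the parameters,
    # but not the size field itself.
    body = bytes([msg_type, tag]) + params
    return struct.pack(">H", len(body)) + body

def unframe(buf):
    # Return (msg_type, tag, params, rest) for the first complete message
    # in buf, or None if more bytes are needed.
    if len(buf) < 2:
        return None
    (size,) = struct.unpack_from(">H", buf)
    if size < 2:
        raise ValueError("message too short for type and tag")
    if len(buf) < 2 + size:
        return None
    body = buf[2:2 + size]
    return body[0], body[1], body[2:], buf[2 + size:]
```

Returning the unread remainder lets a caller feed unframe from a stream
buffer that may hold a partial message or several messages at once.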
.PP
All Venti RPC messages are prefixed with a field
.IR size [2]
giving the length of the message that follows
(not including the
.I size
field itself).
The message bodies are:
.ta \w'\fLVtTgoodbye 'u
.IP
.ne 2v
.B VtThello
.IR tag [1]
.IR version [ s ]
.IR uid [ s ]
.IR strength [1]
.IR crypto [ n ]
.IR codec [ n ]
.br
.B VtRhello
.IR tag [1]
.IR sid [ s ]
.IR rcrypto [1]
.IR rcodec [1]
.IP
.ne 2v
.B VtTping
.IR tag [1]
.br
.B VtRping
.IR tag [1]
.IP
.ne 2v
.B VtTread
.IR tag [1]
.IR score [20]
.IR type [1]
.IR pad [1]
.IR count [2]
.br
.B VtRread
.IR tag [1]
.IR data []
.IP
.ne 2v
.B VtTwrite
.IR tag [1]
.IR type [1]
.IR pad [3]
.IR data []
.br
.B VtRwrite
.IR tag [1]
.IR score [20]
.IP
.ne 2v
.B VtTsync
.IR tag [1]
.br
.B VtRsync
.IR tag [1]
.IP
.ne 2v
.B VtRerror
.IR tag [1]
.IR error [ s ]
.IP
.ne 2v
.B VtTgoodbye
.IR tag [1]
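.PP
As an illustration of the field lists, the read and write bodies could be
packed with Python's struct module as below.
The bodies start at the tag; the size and message type bytes are left to
a separate framing step.
The function names are hypothetical, and the type argument is the on-wire
block type, which differs from the octal numbers listed under Block Types.

```python
import struct

def pack_tread(tag, score, btype, count):
    # VtTread: tag[1] score[20] type[1] pad[1] count[2]
    if len(score) != 20:
        raise ValueError("score must be 20 bytes")
    return struct.pack(">B20sBxH", tag, score, btype, count)

def unpack_tread(body):
    # The pad byte is skipped by the "x" format code.
    return struct.unpack(">B20sBxH", body)

def pack_twrite(tag, btype, data):
    # VtTwrite: tag[1] type[1] pad[3] data[]
    return struct.pack(">BB3x", tag, btype) + data

def unpack_rwrite(body):
    # VtRwrite: tag[1] score[20]
    return struct.unpack(">B20s", body)
```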
.PP
Each T-message has a one-byte
.I tag
field, chosen and used by the client to identify the message.
The server will echo the request's
.I tag
field in the reply.
Clients should arrange that no two outstanding
messages have the same tag field so that responses
can be distinguished.
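.PP
One possible way for a client to keep tags unique is a small pool of free
tag values.
The class TagPool is invented for this sketch, and it assumes that all
256 values of the one-byte field are usable, which this page does not
state.

```python
class TagPool:
    def __init__(self):
        # Assumption: every one-byte value may be used as a tag.
        self.free = list(range(256))
        self.pending = {}

    def start(self, request):
        # Reserve a tag that no outstanding request is using.
        if not self.free:
            raise RuntimeError("all tags are outstanding")
        tag = self.free.pop()
        self.pending[tag] = request
        return tag

    def finish(self, tag):
        # Match a reply to its request and release the tag for reuse.
        request = self.pending.pop(tag)  # KeyError if tag is not outstanding
        self.free.append(tag)
        return request
```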
.PP
The type of an R-message will either be one greater than
the type of the corresponding T-message or
.BR VtRerror ,
indicating that the request failed.
In the latter case, the
.I error
field contains a string describing the reason for failure.
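.PP
A sketch of how a client might check a reply against this rule.
The function and exception names are hypothetical, and the numeric value
of the error type is a parameter because it is defined in <venti.h>
rather than in this page.

```python
class VentiError(Exception):
    pass

def check_reply(t_type, r_type, params, rerror_type):
    # params holds the reply bytes that follow the tag.
    if r_type == rerror_type:
        # error[s]: two-byte big-endian count, then UTF-8 text.
        n = int.from_bytes(params[:2], "big")
        raise VentiError(params[2:2 + n].decode("utf-8"))
    if r_type != t_type + 1:
        raise VentiError("unexpected reply type %d" % r_type)
    return params
```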
.PP
Venti connections must begin with a
.B hello
transaction.
The
.B VtThello
message contains the protocol
.I version
that the client has chosen to use.
The fields
.IR strength ,
.IR crypto ,
and
.IR codec
could be used to add authentication, encryption,
and compression to the Venti session
but are currently ignored.
The
.I rcrypto
and
.I rcodec
fields in the
.B VtRhello
response are similarly ignored.
The
.IR uid
and
.IR sid
fields are intended to be the identity
of the client and server but, given the lack of
authentication, should be treated only as advisory.
The initial
.B hello
should be the only
.B hello
transaction during the session.
.PP
The
.B ping
message has no effect and
is used mainly for debugging.
Servers should respond immediately to pings.
.PP
The
.B read
message requests a block with the given
.I score
and
.IR type .
Use
.I vttodisktype
and
.I vtfromdisktype
(see
.IR venti (2))
to convert a block type enumeration value
.RB ( VtDataType ,
etc.)
to the
.I type
used on disk and in the protocol.
The
.I count
field specifies the maximum expected size
of the block.
The
.I data
in the reply is the block's contents.
.PP
The
.B write
message writes a new block of the given
.I type
with contents
.I data
to the server.
The response includes the
.I score
to use to read the block,
which should be the SHA1 hash of
.IR data .
.PP
The Venti server may buffer written blocks in memory,
waiting until after responding to the
.B write
message before writing them to
permanent storage.
The server will delay the response to a
.B sync
message until after all blocks in earlier
.B write
messages have been written to permanent storage.
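.PP
The relationship between write and sync can be pictured with a toy model.
This is not how a real server stores data; the class BufferingServer and
its two dictionaries are invented to show only the ordering guarantee.

```python
import hashlib

class BufferingServer:
    # Toy model: "durable" stands in for permanent storage.
    def __init__(self):
        self.buffered = {}
        self.durable = {}

    def write(self, data):
        s = hashlib.sha1(data).digest()
        if s not in self.durable:
            self.buffered[s] = data
        # The reply may be sent while the block is still only in memory.
        return s

    def sync(self):
        # Returns only after every earlier write has reached storage.
        self.durable.update(self.buffered)
        self.buffered.clear()

    def read(self, s):
        if s in self.buffered:
            return self.buffered[s]
        return self.durable[s]
```

A client that needs its data to survive a server crash would therefore
issue a sync after its last write and wait for the reply.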
.PP
The
.B goodbye
message ends a session. There is no
.BR VtRgoodbye :
upon receiving the
.BR VtTgoodbye
message, the server terminates the connection.
.SH SEE ALSO
.IR venti (1),
.IR venti (2),
.IR venti (8)
.br
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.I "Usenix Conference on File and Storage Technologies" ,
2002.