.TH VENTI 6
.SH NAME
venti \- archival storage server
.SH DESCRIPTION
Venti is a block storage server intended for archival data.
In a Venti server, the SHA1 hash of a block's contents acts
as the block identifier for read and write operations.
This approach enforces a write-once policy, preventing
accidental or malicious destruction of data. In addition,
duplicate copies of a block are coalesced, reducing the
consumption of storage and simplifying the implementation
of clients.
.PP
This manual page documents the basic concepts of
block storage using Venti as well as the Venti network protocol.
.PP
.IR Venti (1)
documents some simple clients.
.IR Vac (1)
and
.IR vacfs (4)
are more complex clients.
.PP
.IR Venti (2)
describes a C library interface for accessing
Venti servers and manipulating Venti data structures.
.PP
.IR Venti (8)
describes the programs used to run a Venti server.
.PP
.SS "Scores"
The SHA1 hash that identifies a block is called its
.IR score .
The score of the zero-length block is called the
.IR "zero score" .
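.PP
As a minimal sketch of how scores behave, the following Python fragment
computes scores with the standard hashlib module and models a server as a
dictionary keyed by score.
The function name score and the dictionary store are illustrative
assumptions, not part of Venti.

```python
import hashlib

def score(block: bytes) -> bytes:
    """Return the 20-byte SHA1 score of a block's contents."""
    return hashlib.sha1(block).digest()

# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")

# A toy content-addressed store: writing the same block twice
# yields the same score, so duplicates coalesce into one entry.
store = {}
for block in (b"hello world", b"hello world", b"other data"):
    store[score(block)] = block

print(ZERO_SCORE.hex())   # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(len(store))         # 2
```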
.PP
Scores may have an optional
.IB label :
prefix, typically used to
describe the format of the data.
For example,
.IR vac (1)
uses a
.B vac:
prefix.
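.PP
A labeled score can be split into its parts as sketched below.
The helper name parse_score is hypothetical, and the sketch assumes the
score itself is written as 40 hexadecimal digits.

```python
def parse_score(text: str):
    """Split an optional 'label:' prefix from a 40-digit hex score."""
    label, sep, hexpart = text.rpartition(":")
    if not sep:
        label = None                  # no prefix present
    raw = bytes.fromhex(hexpart)      # raises ValueError on non-hex input
    if len(raw) != 20:
        raise ValueError("a SHA1 score is 20 bytes (40 hex digits)")
    return label, raw

label, raw = parse_score("vac:da39a3ee5e6b4b0d3255bfef95601890afd80709")
print(label, len(raw))    # vac 20
```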
.SS "Files and Directories"
Venti accepts blocks up to 56 kilobytes in size.
By convention, Venti clients use hash trees of blocks to
represent arbitrary-size data
.IR files .
The data to be stored is split into fixed-size
blocks and written to the server, producing a list
of scores.
The resulting list of scores is split into fixed-size pointer
blocks (using only an integral number of scores per block)
and written to the server, producing a smaller list
of scores.
The process continues, eventually ending with the
score for the hash tree's top-most block.
Each file stored this way is summarized by
a
.B VtEntry
structure recording the top-most score, the depth
of the tree, the data block size, and the pointer block size.
One or more
.B VtEntry
structures can be concatenated
and stored as a special file called a
.IR directory .
In this
manner, arbitrary trees of files can be constructed
and stored.
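.PP
The hash tree construction can be sketched as follows.
This is a toy model under stated assumptions: a dictionary stands in for
the server, the names write_block, write_file, and read_file are
hypothetical, and the sketch omits zero truncation and the encoding of the
VtEntry structure.

```python
import hashlib

SCORE_SIZE = 20  # bytes in a SHA1 score

def write_block(store: dict, block: bytes) -> bytes:
    """Toy stand-in for a server write: keep the block, return its score."""
    s = hashlib.sha1(block).digest()
    store[s] = block
    return s

def chunks(data: bytes, size: int):
    """Split data into pieces of at most size bytes (at least one piece)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]

def write_file(store: dict, data: bytes, dsize: int, psize: int):
    """Write data as a hash tree; return (top-most score, depth)."""
    per_block = psize // SCORE_SIZE       # integral number of scores
    if per_block < 2:
        raise ValueError("pointer blocks must hold at least two scores")
    scores = [write_block(store, b) for b in chunks(data, dsize)]
    depth = 0
    while len(scores) > 1:
        # Pack this level's scores into pointer blocks and write those.
        packed = b"".join(scores)
        scores = [write_block(store, b)
                  for b in chunks(packed, per_block * SCORE_SIZE)]
        depth += 1
    return scores[0], depth

def read_file(store: dict, top: bytes, depth: int) -> bytes:
    """Walk the tree from the top-most score back down to the data."""
    blocks = [store[top]]
    for _ in range(depth):
        blocks = [store[b[i:i + SCORE_SIZE]]
                  for b in blocks
                  for i in range(0, len(b), SCORE_SIZE)]
    return b"".join(blocks)

store = {}
data = b"".join(i.to_bytes(4, "big") for i in range(2560))  # 10240 bytes
top, depth = write_file(store, data, dsize=1024, psize=100)
print(depth, len(store))   # 2 13
```
.PP
With these sizes the ten data blocks need two pointer blocks of five
scores each, and one more pointer block on top, giving a tree of depth 2.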
.PP
Scores passed between programs conventionally refer
to
.B VtRoot
blocks, which contain descriptive information
as well as the score of a directory block containing a small number
of directory entries.
.PP
Conventionally, programs do not mix data and directory entries
in the same file. Instead, they keep two separate files, one with
directory entries and one with metadata referencing those
entries by position.
Keeping this parallel representation is a minor annoyance
but makes it possible for general programs like
.I venti/copy
(see
.IR venti (1))
to traverse the block tree without knowing the specific details
of any particular program's data.
.SS "Block Types"
To allow programs to traverse these structures without
needing to understand their higher-level meanings,
Venti tags each block with a type. The types are:
.PP
.nf
.ft L
VtDataType     000  \f1data\fL
VtDataType+1   001  \fRscores of \fPVtDataType\fR blocks\fL
VtDataType+2   002  \fRscores of \fPVtDataType+1\fR blocks\fL
\fR\&...\fL
VtDirType      010  VtEntry\fR structures\fL
VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL
VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL
\fR\&...\fL
VtRootType     020  VtRoot\fR structure\fL
.fi
.PP
The octal numbers listed are the type numbers used
by the commands below.
(For historical reasons, the type numbers used on
disk and on the wire are different from the above.
They do not distinguish
.BI VtDataType+ n
blocks from
.BI VtDirType+ n
blocks.)
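.PP
The numbering in the table can be decoded mechanically, as the sketch
below suggests.
It assumes that the low octal digit is the pointer depth and the remaining
bits select the base type, which matches the listed values but is an
inference from the table rather than something stated here.
The function name describe is hypothetical, and the numbers are the ones
from the table, not the on-disk or wire values.

```python
# Type numbers as listed in the table (octal).
VT_DATA_TYPE = 0o000
VT_DIR_TYPE = 0o010
VT_ROOT_TYPE = 0o020

def describe(t: int) -> str:
    """Describe a block type number in the terms used by the table."""
    if t == VT_ROOT_TYPE:
        return "VtRoot structure"
    base, depth = t & ~0o7, t & 0o7   # assumed split: base type, depth
    if base == VT_DATA_TYPE:
        name, leaf = "VtDataType", "data"
    elif base == VT_DIR_TYPE:
        name, leaf = "VtDirType", "VtEntry structures"
    else:
        raise ValueError("unknown block type %#o" % t)
    if depth == 0:
        return leaf
    below = name if depth == 1 else "%s+%d" % (name, depth - 1)
    return "scores of %s blocks" % below

print(describe(0o002))   # scores of VtDataType+1 blocks
print(describe(0o011))   # scores of VtDirType blocks
```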
.SS "Zero Truncation"
To avoid storing the same short data blocks padded with
differing numbers of zeros, Venti clients working with fixed-size
blocks conventionally
`zero truncate' the blocks before writing them to the server.
For example, if a 1024-byte data block contains the
11-byte string
.RB ` hello " " world '
followed by 1013 zero bytes,
a client would store only the 11-byte block.
When the client later read the block from the server,
it would append zero bytes to the end as necessary to
reach the expected size.
.PP
When truncating pointer blocks
.RB ( VtDataType+ \fIn
and
.BI VtDirType+ n
blocks),
trailing zero scores are removed
instead of trailing zero bytes.
.PP
Because of the truncation convention,
any file consisting entirely of zero bytes,
no matter what its length, will be represented by the zero score:
the data blocks contain all zeros and are thus truncated
to the empty block, and the pointer blocks contain all zero scores
and are thus also truncated to the empty block,
and so on up the hash tree.
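.PP
The truncation and padding rules can be sketched in a few lines of Python.
The function names are hypothetical.
The sketch takes a zero score to be the score of the zero-length block, as
defined under Scores, and assumes pointer blocks are a whole number of
scores long.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()   # score of the empty block

def truncate_data(block: bytes) -> bytes:
    """Drop trailing zero bytes from a data block."""
    return block.rstrip(bytes(1))         # bytes(1) is a single zero byte

def pad_data(block: bytes, size: int) -> bytes:
    """Restore a data block to its expected size after reading."""
    return block.ljust(size, bytes(1))

def truncate_pointers(block: bytes) -> bytes:
    """Drop trailing zero scores from a pointer block."""
    n = len(block) // SCORE_SIZE * SCORE_SIZE
    while n > 0 and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]

def pad_pointers(block: bytes, size: int) -> bytes:
    """Restore a pointer block by appending zero scores."""
    missing = (size - len(block)) // SCORE_SIZE
    return block + ZERO_SCORE * missing

# The example from the text: 11 bytes of text, 1013 zero bytes.
block = b"hello world" + bytes(1013)
stored = truncate_data(block)
print(len(stored))                         # 11

# An all-zero file: every data block truncates to the empty block,
# so every pointer block holds only zero scores and truncates too.
leaf = hashlib.sha1(truncate_data(bytes(1024))).digest()
pointer = truncate_pointers(leaf * 4)
print(leaf == ZERO_SCORE, pointer == b"")  # True True
```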
.SS "Network Protocol"
A Venti session begins when a
.I client
connects to the network address served by a Venti
.IR server ;
the conventional address is
.BI tcp! server !venti
(the
.B venti
port is 17034).
Both client and server begin by sending a version
string of the form
.BI venti- versions - comment \en \fR.
The
.I versions
field is a list of acceptable versions separated by
colons.
The protocol described here is version
.BR 02 .
The client is responsible for choosing a common
version and sending it in the
.B VtThello
message, described below.
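.PP
A client might handle the version exchange roughly as sketched here.
The function names are hypothetical, and the example version list and
comment in the sample line are made up for illustration.

```python
def make_version_line(versions, comment: str) -> bytes:
    """Build a line of the form venti-versions-comment plus newline."""
    return ("venti-%s-%s\n" % (":".join(versions), comment)).encode("ascii")

def parse_version_line(line: bytes):
    """Return (list of versions, comment) from a peer's version line."""
    text = line.decode("ascii")
    if not text.endswith("\n"):
        raise ValueError("version line must end with a newline")
    prefix, versions, comment = text[:-1].split("-", 2)
    if prefix != "venti":
        raise ValueError("not a Venti version line")
    return versions.split(":"), comment

def choose_version(ours, theirs) -> str:
    """Pick the first of our versions that the peer also accepts."""
    for v in ours:
        if v in theirs:
            return v
    raise ValueError("no common protocol version")

# Hypothetical server line offering two versions.
theirs, comment = parse_version_line(b"venti-02:04-example\n")
print(choose_version(["02"], theirs))   # 02
```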
.PP
After the initial version exchange, the client transmits
.I requests
.RI ( T-messages )
to the server, which subsequently returns
.I replies
.RI ( R-messages )
to the client.
The combined act of transmitting (receiving) a request
of a particular type, and receiving (transmitting) its reply
is called a
.I transaction
of that type.
.PP
Each message consists of a sequence of bytes.
Two-byte fields hold unsigned integers represented
in big-endian order (most significant byte first).
Data items of variable lengths are represented by
a one-byte field specifying a count,
.IR n ,
followed by
.I n
bytes of data.
Text strings are represented similarly,
using a two-byte count with
the text itself stored as a UTF-encoded sequence
of Unicode characters (see
.IR utf (6)).
Text strings are not
.SM NUL\c
-terminated:
.I n
counts the bytes of UTF data, which include no final
zero byte.
The
.SM NUL
character is illegal in text strings in the Venti protocol.
The maximum string length in Venti is 1024 bytes.
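.PP
These encoding rules translate directly into code.
The sketch below uses Python's struct module for the big-endian counts.
The helper names are hypothetical.

```python
import struct

MAX_STRING = 1024   # maximum string length in bytes

def pack_string(s: str) -> bytes:
    """Two-byte big-endian count followed by UTF-8 text, no NUL."""
    data = s.encode("utf-8")
    if len(data) > MAX_STRING:
        raise ValueError("string longer than 1024 bytes")
    if 0 in data:
        raise ValueError("NUL is illegal in Venti strings")
    return struct.pack(">H", len(data)) + data

def unpack_string(buf: bytes, off: int = 0):
    """Return (string, offset just past it)."""
    (n,) = struct.unpack_from(">H", buf, off)
    end = off + 2 + n
    if n > MAX_STRING or end > len(buf):
        raise ValueError("bad string length")
    return buf[off + 2:end].decode("utf-8"), end

def pack_var(data: bytes) -> bytes:
    """One-byte count followed by that many bytes of data."""
    if len(data) > 255:
        raise ValueError("variable-length item longer than 255 bytes")
    return bytes([len(data)]) + data

print(pack_string("venti"))   # b'\x00\x05venti'
```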
.PP
Each Venti message begins with a two-byte size field
specifying the length in bytes of the message,
not including the length field itself.
The next byte is the message type, one of the constants
in the enumeration in the include file
.BR <venti.h> .
The next byte is an identifying
.IR tag ,
used to match responses to requests.
The remaining bytes are parameters of different sizes.
In the message descriptions, the number of bytes in a field
is given in brackets after the field name.
The notation
.IR parameter [ n ]
where
.I n
is not a constant represents a variable-length parameter:
.IR n [1]
followed by
.I n
bytes of data forming the
.IR parameter .
The notation
.IR string [ s ]
(using a literal
.I s
character)
is shorthand for
.IR s [2]
followed by
.I s
bytes of UTF-8 text.
The notation
.IR parameter []
where
.I parameter
is the last field in the message represents a
variable-length field that comprises all remaining
bytes in the message.
.PP
All Venti RPC messages are prefixed with a field
.IR size [2]
giving the length of the message that follows
(not including the
.I size
field itself).
The message bodies are:
.ta \w'\fLVtTgoodbye 'u
.IP
.ne 2v
.B VtThello
.IR tag [1]
.IR version [ s ]
.IR uid [ s ]
.IR strength [1]
.IR crypto [ n ]
.IR codec [ n ]
.br
.B VtRhello
.IR tag [1]
.IR sid [ s ]
.IR rcrypto [1]
.IR rcodec [1]
.IP
.ne 2v
.B VtTping
.IR tag [1]
.br
.B VtRping
.IR tag [1]
.IP
.ne 2v
.B VtTread
.IR tag [1]
.IR score [20]
.IR type [1]
.IR pad [1]
.IR count [2]
.br
.B VtRread
.IR tag [1]
.IR data []
.IP
.ne 2v
.B VtTwrite
.IR tag [1]
.IR type [1]
.IR pad [3]
.IR data []
.br
.B VtRwrite
.IR tag [1]
.IR score [20]
.IP
.ne 2v
.B VtTsync
.IR tag [1]
.br
.B VtRsync
.IR tag [1]
.IP
.ne 2v
.B VtRerror
.IR tag [1]
.IR error [ s ]
.IP
.ne 2v
.B VtTgoodbye
.IR tag [1]
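.PP
The framing and two of the message bodies above can be sketched as
follows.
The numeric message types are assumptions made for illustration; the
authoritative values are the constants in <venti.h>.
The function names are hypothetical, and blocktype is taken to be the
type as used in the protocol, not the enumeration value.

```python
import struct

# Assumed message type numbers; check them against <venti.h>.
VT_TPING = 2
VT_TREAD = 12
VT_TWRITE = 14

def frame(msgtype: int, tag: int, body: bytes = b"") -> bytes:
    """Prefix type[1] tag[1] body with size[2], excluding size itself."""
    payload = bytes([msgtype, tag]) + body
    return struct.pack(">H", len(payload)) + payload

def unframe(buf: bytes):
    """Return (type, tag, body) from one complete framed message."""
    (size,) = struct.unpack_from(">H", buf)
    msg = buf[2:2 + size]
    if size < 2 or len(msg) != size:
        raise ValueError("short or malformed message")
    return msg[0], msg[1], msg[2:]

def pack_tread(tag: int, score: bytes, blocktype: int, count: int) -> bytes:
    """VtTread: tag[1] score[20] type[1] pad[1] count[2]."""
    if len(score) != 20:
        raise ValueError("score must be 20 bytes")
    body = score + bytes([blocktype, 0]) + struct.pack(">H", count)
    return frame(VT_TREAD, tag, body)

def pack_twrite(tag: int, blocktype: int, data: bytes) -> bytes:
    """VtTwrite: tag[1] type[1] pad[3] data[]."""
    return frame(VT_TWRITE, tag, bytes([blocktype, 0, 0, 0]) + data)

print(frame(VT_TPING, 7))                          # b'\x00\x02\x02\x07'
print(len(pack_tread(1, bytes(20), 0, 1024)))      # 28
```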
.PP
Each T-message has a one-byte
.I tag
field, chosen and used by the client to identify the message.
The server will echo the request's
.I tag
field in the reply.
Clients should arrange that no two outstanding
messages have the same tag field so that responses
can be distinguished.
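.PP
One way for a client to keep outstanding tags distinct is a small pool,
sketched below.
The class is hypothetical, and it assumes that all 256 one-byte values
are usable as tags, which this page does not state.

```python
class TagPool:
    """Hand out one-byte tags so no two outstanding requests share one."""

    def __init__(self):
        self.free = list(range(256))   # assumption: every value is usable
        self.pending = {}              # tag -> request awaiting a reply

    def start(self, request):
        """Reserve a tag for a request that is about to be sent."""
        if not self.free:
            raise RuntimeError("all 256 tags are outstanding")
        tag = self.free.pop(0)
        self.pending[tag] = request
        return tag

    def finish(self, tag):
        """Match a reply's tag to its request and release the tag."""
        request = self.pending.pop(tag)   # KeyError on an unexpected tag
        self.free.append(tag)
        return request

pool = TagPool()
a = pool.start("read A")
b = pool.start("read B")
print(a != b)              # True
print(pool.finish(b))      # read B
```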
.PP
The type of an R-message will either be one greater than
the type of the corresponding T-message or
.BR VtRerror ,
indicating that the request failed.
In the latter case, the
.I error
field contains a string describing the reason for failure.
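.PP
A client can apply this rule when a reply arrives, as in the sketch
below.
The value given for the error message type is an assumption to be checked
against <venti.h>, and the names check_reply and VentiError are
hypothetical.
Here body means the bytes that follow the tag.

```python
import struct

VT_RERROR = 1   # assumed value; see <venti.h>

class VentiError(Exception):
    """Raised when the server reports that a request failed."""

def check_reply(sent_type: int, reply_type: int, body: bytes) -> bytes:
    """Return the reply body, or raise if the reply signals failure."""
    if reply_type == VT_RERROR:
        # error[s]: two-byte count followed by UTF-8 text
        (n,) = struct.unpack_from(">H", body)
        raise VentiError(body[2:2 + n].decode("utf-8"))
    if reply_type != sent_type + 1:
        raise VentiError("unexpected reply type %d" % reply_type)
    return body

print(check_reply(12, 13, b"block contents"))   # b'block contents'
```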
.PP
Venti connections must begin with a
.B hello
transaction.
The
.B VtThello
message contains the protocol
.I version
that the client has chosen to use.
The fields
.IR strength ,
.IR crypto ,
and
.I codec
could be used to add authentication, encryption,
and compression to the Venti session
but are currently ignored.
The
.I rcrypto
and
.I rcodec
fields in the
.B VtRhello
response are similarly ignored.
The
.I uid
and
.I sid
fields are intended to be the identity
of the client and server but, given the lack of
authentication, should be treated only as advisory.
The initial
.B hello
should be the only
.B hello
transaction during the session.
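.PP
Putting the field notations together, a
VtThello
message might be assembled as sketched here.
The message type number is an assumption to be checked against <venti.h>,
the uid value is a made-up example, and the ignored fields are sent empty
or zero as one plausible choice.

```python
import struct

VT_THELLO = 4   # assumed value; see <venti.h>

def pstring(s: str) -> bytes:
    """string[s]: two-byte count, then UTF-8 text."""
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data

def pvar(data: bytes) -> bytes:
    """parameter[n]: one-byte count, then the bytes."""
    return bytes([len(data)]) + data

def pack_thello(tag: int, version: str, uid: str,
                strength: int = 0, crypto: bytes = b"",
                codec: bytes = b"") -> bytes:
    """tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]."""
    body = (pstring(version) + pstring(uid) + bytes([strength])
            + pvar(crypto) + pvar(codec))
    payload = bytes([VT_THELLO, tag]) + body
    return struct.pack(">H", len(payload)) + payload

msg = pack_thello(0, "02", "anonymous")
print(len(msg))   # 22
```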
.PP
The
.B ping
message has no effect and
is used mainly for debugging.
Servers should respond immediately to pings.
.PP
The
.B read
message requests a block with the given
.I score
and
.IR type .
Use
.I vttodisktype
and
.I vtfromdisktype
(see
.IR venti (2))
to convert between a block type enumeration value
.RB ( VtDataType ,
etc.)
and the
.I type
used on disk and in the protocol.
The
.I count
field specifies the maximum expected size
of the block.
The
.I data
in the reply is the block's contents.
.PP
The
.B write
message writes a new block of the given
.I type
with contents
.I data
to the server.
The response includes the
.I score
to use to read the block,
which should be the SHA1 hash of
.IR data .
.PP
The Venti server may buffer written blocks in memory,
waiting until after responding to the
.B write
message before writing them to
permanent storage.
The server will delay the response to a
.B sync
message until after all blocks in earlier
.B write
messages have been written to permanent storage.
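.PP
The ordering guarantee can be illustrated with a toy model.
It is not a description of how any real server is built: the class
ToyServer is hypothetical, and two dictionaries stand in for memory
buffers and permanent storage.

```python
import hashlib

class ToyServer:
    """Toy model of write buffering and sync."""

    def __init__(self):
        self.buffered = {}   # acknowledged but not yet durable
        self.durable = {}    # stands in for permanent storage

    def write(self, data: bytes) -> bytes:
        s = hashlib.sha1(data).digest()
        if s not in self.durable:
            self.buffered[s] = data
        return s             # the reply may precede durability

    def sync(self) -> None:
        # Flush every earlier write; only then would the reply be sent.
        self.durable.update(self.buffered)
        self.buffered.clear()

    def read(self, s: bytes) -> bytes:
        if s in self.buffered:
            return self.buffered[s]
        return self.durable[s]

srv = ToyServer()
s = srv.write(b"archival data")
print(s in srv.durable)    # False
srv.sync()
print(s in srv.durable)    # True
```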
.PP
The
.B goodbye
message ends a session. There is no
.BR VtRgoodbye :
upon receiving the
.B VtTgoodbye
message, the server terminates the connection.
.SH SEE ALSO
.IR venti (1),
.IR venti (2),
.IR venti (8)
.br
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.I "Usenix Conference on File and Storage Technologies" ,
2002.