
  1. all data is big-endian on disk.
  2. arena layout:
  3. ArenaPart (first at offset PartBlank = 256kB in the disk file)
  4. magic[4] 0xA9E4A5E7
  5. version[4] 3
  6. blockSize[4]
  7. arenaBase[4] offset of first ArenaHead structure in the disk file
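
a minimal sketch of decoding the ArenaPart header above, assuming python's struct module. the helper name parse_arena_part and the sample values are made up for illustration; only the field order, widths, magic and big-endian encoding come from these notes.

```python
import struct

PART_BLANK = 256 * 1024           # offset of the first ArenaPart in the disk file
ARENA_PART_MAGIC = 0xA9E4A5E7

def parse_arena_part(buf):
    """decode the four big-endian u32 fields (hypothetical helper)."""
    magic, version, block_size, arena_base = struct.unpack_from(">IIII", buf, 0)
    if magic != ARENA_PART_MAGIC:
        raise ValueError("bad ArenaPart magic: %#x" % magic)
    return {"version": version, "blockSize": block_size, "arenaBase": arena_base}

# round trip on a synthetic header; a real caller would read 16 bytes
# at PART_BLANK from the partition instead.
sample = struct.pack(">IIII", ARENA_PART_MAGIC, 3, 8192, 1024 * 1024)
header = parse_arena_part(sample)
```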
  8. the ArenaMap starts at the first block at offset >= PartBlank+512 bytes.
  9. it is a sequence of text lines
  10. /*
  11. * amap: n '\n' amapelem * n
  12. * n: u32int
  13. * amapelem: name '\t' astart '\t' astop '\n'
  14. * astart, astop: u64int
  15. */
  16. the astart and astop are byte offsets in the disk file.
  17. they are the offsets to the ArenaHead and the end of the Arena block.
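
a sketch of locating and parsing the ArenaMap, under two assumptions that these notes do not confirm: blocks sit at multiples of blockSize from the start of the file, and the numbers are written in decimal. the second number on each line is treated as the end offset, as the prose above describes. amap_offset and parse_amap are hypothetical names.

```python
PART_BLANK = 256 * 1024

def amap_offset(block_size):
    """first block boundary at or after PartBlank+512 (assumes aligned blocks)."""
    want = PART_BLANK + 512
    return (want + block_size - 1) // block_size * block_size

def parse_amap(text):
    """parse the count line followed by that many name/start/stop lines."""
    lines = text.split("\n")
    n = int(lines[0])
    arenas = []
    for line in lines[1:1 + n]:
        name, start, stop = line.split("\t")
        arenas.append((name, int(start), int(stop)))   # assumes decimal text
    if len(arenas) != n:
        raise ValueError("arena map is truncated")
    return arenas

# synthetic map with two arenas; the offsets are made up.
sample = "2\narena0\t1048576\t2097152\narena1\t2097152\t3145728\n"
arenas = parse_amap(sample)
```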
  18. ArenaHead
  19. [base points here in the C code]
  20. size bytes
  21. Clumps
  22. ClumpInfo blocks
  23. Arena
  24. Arena
  25. magic[4] 0xF2A14EAD
  26. version[4] 4
  27. name[64]
  28. clumps[4]
  29. cclumps[4]
  30. ctime[4]
  31. wtime[4]
  32. used[8]
  33. uncsize[8]
  34. sealed[1]
  35. optional score[20]
  36. once sealed, the score is the sha1 hash of every block from the
  37. ArenaHead to the Arena, computed as though
  38. the final score in Arena were the zeroScore. strangely,
  39. the tail of the Arena block (the last one) is not included in the checksum
  40. (i.e., the unused data after the score).
  41. clumpMax = blocksize/ClumpInfoSize = blocksize/25
  42. dirsize = ((clumps/clumpMax)+1) * blocksize
  43. want used+dirsize <= size
  44. want cclumps <= clumps
  45. want uncsize+clumps*ClumpSize+blocksize >= used
  46. want ctime <= wtime
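
the directory size formula and three of the checks above can be sketched as follows. the function names and the dict-of-fields argument are hypothetical; the uncsize bound is left out to keep the sketch short.

```python
CLUMP_INFO_SIZE = 25   # type[1] + size[2] + uncsize[2] + score[20]

def dir_size(clumps, block_size):
    """bytes reserved for ClumpInfo blocks, per the dirsize formula."""
    clump_max = block_size // CLUMP_INFO_SIZE
    return (clumps // clump_max + 1) * block_size

def check_arena(a, size, block_size):
    """return the violated invariants; a is a dict of Arena trailer fields."""
    problems = []
    if a["used"] + dir_size(a["clumps"], block_size) > size:
        problems.append("used+dirsize > size")
    if a["cclumps"] > a["clumps"]:
        problems.append("cclumps > clumps")
    if a["ctime"] > a["wtime"]:
        problems.append("ctime > wtime")
    return problems

# made-up trailer values for illustration.
ok = check_arena({"used": 1000, "clumps": 10, "cclumps": 3,
                  "ctime": 5, "wtime": 9}, size=100000, block_size=8192)
```

with 8192-byte blocks clumpMax is 327, so the directory grows by one block every 327 clumps.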
  47. clump info is stored packed into blocks in order.
  48. clump info moves forward through a block but the
  49. blocks themselves move backwards. so if cm=clumpMax
  50. and there are two blocks worth of clumpinfo, the blocks
  51. look like:
  52. [cm..2*cm-1] [0..cm-1] [Arena]
  53. with the blocks pushed right up against the Arena trailer.
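
a sketch of the address arithmetic this layout implies: entries advance within a block, blocks step backwards from the trailer. clump_info_offset is a hypothetical helper and trailer_offset stands for the byte offset of the Arena trailer block.

```python
CLUMP_INFO_SIZE = 25

def clump_info_offset(i, trailer_offset, block_size):
    """byte offset of the i'th ClumpInfo entry."""
    cm = block_size // CLUMP_INFO_SIZE          # clumpMax
    block, slot = divmod(i, cm)
    # block 0 is the one right before the trailer, block 1 the one before that.
    block_start = trailer_offset - (block + 1) * block_size
    return block_start + slot * CLUMP_INFO_SIZE

# made-up trailer position: block 100 of an 8192-byte-block file.
trailer = 100 * 8192
first = clump_info_offset(0, trailer, 8192)
spill = clump_info_offset(327, trailer, 8192)   # first entry of the second block
```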
  54. ArenaHead
  55. magic[4] 0xD15C4EAD
  56. version[4] = Arena.version
  57. name[64]
  58. blockSize[4]
  59. size[8]
  60. Clump
  61. magic[4] 0xD15CB10C (0 for an unused clump)
  62. type[1]
  63. size[2]
  64. uncsize[2]
  65. score[20]
  66. encoding[1] raw=1, compress=2
  67. creator[4]
  68. time[4]
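
a sketch of decoding one Clump header with python's struct module. the format string follows the field list above; parse_clump, the dict layout and the sample values are assumptions for illustration.

```python
import struct

CLUMP_MAGIC = 0xD15CB10C
# magic, type, size, uncsize, score, encoding, creator, time
CLUMP_FORMAT = ">IBHH20sBII"
CLUMP_SIZE = struct.calcsize(CLUMP_FORMAT)      # 4+1+2+2+20+1+4+4 bytes
ENCODINGS = {1: "raw", 2: "compress"}

def parse_clump(buf, offset=0):
    """decode one clump header; returns None for an unused clump (magic 0)."""
    (magic, typ, size, uncsize, score,
     encoding, creator, time) = struct.unpack_from(CLUMP_FORMAT, buf, offset)
    if magic == 0:
        return None
    if magic != CLUMP_MAGIC:
        raise ValueError("bad clump magic: %#x" % magic)
    return {
        "type": typ, "size": size, "uncsize": uncsize, "score": score,
        "encoding": ENCODINGS.get(encoding, encoding),
        "creator": creator, "time": time,
    }

# synthetic header; the field values are made up.
sample = struct.pack(CLUMP_FORMAT, CLUMP_MAGIC, 13, 100, 300,
                     b"\x11" * 20, 2, 7, 1000)
clump = parse_clump(sample)
```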
  69. ClumpInfo
  70. type[1]
  71. size[2]
  72. uncsize[2]
  73. score[20]
  74. the arenas are mapped into a single address space corresponding
  75. to the index that brings them together. if each arena has 100M bytes
  76. excluding the headers and there are 4 arenas, then there's 400M of
  77. index address space between them. index address space starts at 1M
  78. instead of 0, so the index addresses assigned to the first arena are
  79. 1M up to 101M, then 101M to 201M, etc.
  80. of course, the assignment of addresses has nothing to do with the index,
  81. but that's what they're called.
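
the example above can be worked through in code. this sketch assumes M means 1024*1024 bytes and that the returned offset is relative to the arena's data excluding headers; build_map and locate are hypothetical names.

```python
from bisect import bisect_right

M = 1024 * 1024        # assumption: "M" in these notes is 2**20 bytes
INDEX_BASE = 1 * M     # index address space starts at 1M, not 0

def build_map(arena_sizes):
    """assign each arena a start address; returns (starts, end)."""
    starts = []
    addr = INDEX_BASE
    for size in arena_sizes:
        starts.append(addr)
        addr += size
    return starts, addr

def locate(index_addr, starts, end):
    """map an index address to (arena number, offset within that arena)."""
    if not starts or not (starts[0] <= index_addr < end):
        raise ValueError("index address out of range")
    i = bisect_right(starts, index_addr) - 1
    return i, index_addr - starts[i]

starts, end = build_map([100 * M] * 4)
where = locate(150 * M, starts, end)   # falls in the second arena
```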
  82. the index is split into index sections, which are put on different disks
  83. to get parallelism of disk heads. each index section holds some number
  84. of hash buckets, each in its own disk block. collectively the index sections
  85. hold ix->buckets between them.
  86. the top 32 bits of the score are used to assign scores to buckets.
  87. div = ceil(2³² / ix->buckets) is the amount of 32-bit score space per bucket.
  88. to look up a block, take the top 32 bits of score and divide by div
  89. to get the bucket number. then look through the index section headers
  90. to figure out which index section has that bucket.
  91. then load that block from the index section. it's an IBucket.
  92. the IBucket has ib.n IEntry structures in it, sorted by score and then by type.
  93. do the lookup and get an IEntry. the ia.addr will be a logical address
  94. that you then use to get the clump from the arenas (see the index address note above).
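
the bucket arithmetic above, sketched in python. the function names and the (name, start, stop) tuples are hypothetical; the half-open [start,stop) range matches how the ISect fields are described in these notes.

```python
def bucket_div(buckets):
    """div = ceil(2**32 / buckets), in integer arithmetic."""
    return (2**32 + buckets - 1) // buckets

def bucket_for(score, buckets):
    """bucket number for a 20-byte score: top 32 bits divided by div."""
    top = int.from_bytes(score[:4], "big")
    return top // bucket_div(buckets)

def section_for(bucket, sections):
    """find the index section whose [start, stop) range holds the bucket."""
    for name, start, stop in sections:
        if start <= bucket < stop:
            return name
    raise LookupError("no index section holds bucket %d" % bucket)

# made-up index: 1000 buckets split across two sections.
sections = [("isect0", 0, 500), ("isect1", 500, 1000)]
last = bucket_for(b"\xff" * 20, 1000)
home = section_for(last, sections)
```

because div is rounded up, the largest possible score still lands in bucket buckets-1 rather than one past the end.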
  95. ISect
  96. magic[4] 0xD15C5EC7
  97. version[4]
  98. name[64]
  99. index[64]
  100. blockSize[4]
  101. blockBase[4] address in partition where bucket blocks start
  102. blocks[4]
  103. start[4]
  104. stop[4] stop - start <= blocks, but not necessarily ==
  105. IEntry
  106. score[20]
  107. wtime[4]
  108. train[2]
  109. ia.addr[8] index address (see note above)
  110. ia.size[2] size of uncompressed block data
  111. ia.type[1]
  112. ia.blocks[1] number of blocks of clump on disk
  113. IBucket
  114. n[2]
  115. next[4] not sure; either 0 or inside [start,stop) for the ISect
  116. data[n*IEntrySize]
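
a sketch of unpacking an IBucket block and finding an entry in it. the format strings follow the IEntry and IBucket field lists above; parse_bucket and find_entry are hypothetical names, and the linear scan ignores the sort order for simplicity.

```python
import struct

# score, wtime, train, ia.addr, ia.size, ia.type, ia.blocks
IENTRY_FORMAT = ">20sIHQHBB"
IENTRY_SIZE = struct.calcsize(IENTRY_FORMAT)
IBUCKET_HEADER = ">HI"                          # n, next
IBUCKET_HEADER_SIZE = struct.calcsize(IBUCKET_HEADER)

def parse_bucket(block):
    """return (next, entries) for one bucket block."""
    n, nxt = struct.unpack_from(IBUCKET_HEADER, block, 0)
    entries = []
    for i in range(n):
        off = IBUCKET_HEADER_SIZE + i * IENTRY_SIZE
        score, wtime, train, addr, size, typ, blocks = struct.unpack_from(
            IENTRY_FORMAT, block, off)
        entries.append({"score": score, "wtime": wtime, "train": train,
                        "addr": addr, "size": size, "type": typ,
                        "blocks": blocks})
    return nxt, entries

def find_entry(entries, score, typ):
    """linear scan for a matching score and type; None if absent."""
    for e in entries:
        if e["score"] == score and e["type"] == typ:
            return e
    return None

# synthetic bucket with two entries; all values are made up.
e1 = struct.pack(IENTRY_FORMAT, b"\x01" * 20, 10, 0, 1048576, 300, 13, 1)
e2 = struct.pack(IENTRY_FORMAT, b"\x02" * 20, 11, 0, 2097152, 500, 13, 1)
block = struct.pack(IBUCKET_HEADER, 2, 0) + e1 + e2
nxt, entries = parse_bucket(block)
hit = find_entry(entries, b"\x02" * 20, 13)
```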
  117. final piece: all the disk partitions start with PartBlank=256kB of unused disk
  118. (presumably to avoid problems with boot sectors and layout tables
  119. and the like).
  120. actually the last 8k of the 256k (that is, at offset 248kB) can hold
  121. a venti config file to help during bootstrap of the venti file server.