p6 6.1 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247
  1. .SH
  2. Cache/WORM Driver
  3. .PP
  4. The cache/WORM (cw) driver is by far the
  5. largest and most complicated device driver in the file server.
  6. There are four devices involved in the cw driver.
  7. It implements a read/write pseudo-device (the cw-device)
  8. and a read-only pseudo-device (the dump device)
  9. by performing operations on its two constituent devices
  10. the read-write c-device and the write-once-read-many
  11. w-device.
  12. The block numbers on the four devices are distinct,
  13. although the cw addresses,
  14. dump addresses,
  15. and the w addresses are
  16. highly correlated.
  17. .PP
  18. The cw-driver uses the w-device as the
  19. stable storage of the file system at the time of the
  20. last dump.
  21. All newly written and a large number of recently used
  22. exact copies of blocks of the w-device are kept on the c-device.
  23. The c-device is much smaller than the w-device and
  24. so the subset of w-blocks that are kept on the c-device are
  25. mapped through a hash table kept on a partition of the c-device.
  26. .PP
  27. The map portion of the c-device consists of blocks of buckets of entries.
  28. The declarations follow.
  29. .P1
  30. enum
  31. {
  32. BKPERBLK = 10,
  33. CEPERBK = (BUFSIZE - BKPERBLK*sizeof(long)) /
  34. (sizeof(Centry)*BKPERBLK),
  35. };
  36. .P2
  37. .P1
  38. typedef
  39. struct
  40. {
  41. ushort age;
  42. short state;
  43. long waddr;
  44. } Centry;
  45. .P2
  46. .P1
  47. typedef
  48. struct
  49. {
  50. long agegen;
  51. Centry entry[CEPERBK];
  52. } Bucket;
  53. .P2
  54. .P1
  55. Bucket bucket[BKPERBLK];
  56. .P2
  57. There is exactly one entry structure for each block in the
  58. data partition of the c-device.
  59. A bucket contains all of the w-addresses that have
  60. the same hash code.
  61. There are as many buckets as will fit
  62. in a block and enough blocks to have the required
  63. number of entries.
  64. The entries in the bucket are maintained
  65. in FIFO order with an age variable and an incrementing age generator.
  66. When the age generator is about to overflow,
  67. all of the ages in the bucket are rescaled
  68. from zero.
  69. .PP
  70. The following steps go into converting a w-address into a c-address.
  71. The bucket is found by
  72. .P1
  73. bucket_number = w-address % total_buckets
  74. getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
  75. .P2
  76. After the desired bucket is found,
  77. the desired entry is found by a linear search within the bucket for the
  78. entry with the desired
  79. .CW waddr .
  80. .PP
  81. The state variable in the entry is
  82. one of the following.
  83. .P1
  84. enum
  85. {
  86. Cnone = 0,
  87. Cdirty,
  88. Cdump,
  89. Cread,
  90. Cwrite,
  91. Cdump1,
  92. };
  93. .P2
  94. Every w-address has a state.
  95. Blocks that are not in the
  96. c-device have the implied
  97. state
  98. .CW Cnone .
  99. The
  100. .CW Cread
  101. state is for blocks that have the
  102. same data as the corresponding block in
  103. the w-device.
  104. Since the c-device is much faster than the
  105. w-device,
  106. .CW Cread
  107. blocks are kept as long as possible and
  108. used in preference to reading the w-device.
  109. .CW Cread
  110. blocks may be discarded from the c-device
  111. when the space is needed for newer data.
  112. The
  113. .CW Cwrite
  114. state is when the c-device contains newer data
  115. than the corresponding block on the w-device.
  116. This happens when a
  117. .CW Cnone ,
  118. .CW Cread ,
  119. or
  120. .CW Cwrite
  121. block is written.
  122. The
  123. .CW Cdirty
  124. state
  125. is when the c-device contains
  126. new data and the corresponding block
  127. on the w-device has never been written.
  128. This happens when a new block has been
  129. allocated from the free space on the w-device.
  130. .PP
  131. The
  132. .CW Cwrite
  133. and
  134. .CW Cdirty
  135. blocks are created and never removed.
  136. Unless something is done to
  137. convert these blocks,
  138. the c-device will gradually
  139. fill up and stop functioning.
  140. Once a day,
  141. or by command,
  142. a
  143. .I dump
  144. of the cw-device
  145. is taken.
  146. The purpose of
  147. a dump is to queue the writes that
  148. have been shunted to the c-device
  149. to be written to the w-device.
  150. Since the w-device is a WORM,
  151. blocks cannot be rewritten.
  152. Blocks that have already been written to the WORM must be
  153. relocated to the unused portion of the w-device.
  154. These are precisely the
  155. blocks with
  156. .CW Cwrite
  157. state.
  158. .PP
  159. The dump algorithm is as follows:
  160. a) The tree on the cw-device is walked
  161. as long as the blocks visited have been
  162. modified since the last dump.
  163. These are the blocks with state
  164. .CW Cwrite
  165. and
  166. .CW Cdirty .
  167. It is possible to restrict the search
  168. to within these blocks
  169. since the directory containing a modified
  170. file must have been accessed to modify the
  171. file and accessing a directory will set its
  172. modified time thus causing the block containing it
  173. to be written.
  174. The directory containing that directory must be
  175. modified for the same reason.
  176. The tree walk is thus drastically restrained and the
  177. tree walk does not take much time.
  178. b) All
  179. .CW Cwrite
  180. blocks found in the tree search
  181. are relocated to new blank blocks on the w-device
  182. and converted to
  183. .CW Cdump
  184. state.
  185. All
  186. .CW Cdirty
  187. blocks are converted to
  188. .CW Cdump
  189. state without relocation.
  190. At this point,
  191. all modified blocks in the cw-device
  192. have w-addresses that point to unwritten
  193. WORM blocks.
  194. These blocks are marked for later
  195. writing to the w-device
  196. with the state
  197. .CW Cdump .
  198. c) All open files that were pointing to modified
  199. blocks are reopened to point at the corresponding
  200. reallocated blocks.
  201. This causes the directories leading to the
  202. open files to be modified.
  203. Thus the invariant discussed in a) is maintained.
  204. d) The background dumping process will slowly
  205. go through the map of the c-device and write out
  206. all blocks with
  207. .CW Cdump
  208. state.
  209. .PP
  210. The dump takes a few minutes to walk the tree
  211. and mark the blocks.
  212. It can take hours to write the marked blocks
  213. to the WORM.
  214. If a marked block is rewritten before the old
  215. copy has been written to the WORM,
  216. it must be forced to the WORM before it is rewritten.
  217. There is no problem if another dump is taken before the first one
  218. is finished.
  219. The newly marked blocks are just added to the marked blocks
  220. left from the first dump.
  221. .PP
  222. If there is an error writing a marked block
  223. to the WORM
  224. then the
  225. .CW dump
  226. state is converted to
  227. .CW Cdump1
  228. and manual intervention is needed.
  229. (See the
  230. .CW cwcmd
  231. .CW mvstate
  232. command in
  233. .I fs (8)).
  234. These blocks can be disposed of by converting
  235. their state back to
  236. .CW Cdump
  237. so that they will be written again.
  238. They can also be converted to
  239. .CW Cwrite
  240. state so that they will be allocated new
  241. addresses at the next dump.
  242. In most other respects,
  243. a
  244. .CW Cdump1
  245. block behaves like a
  246. .CW Cwrite
  247. block.