p6 5.9 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255
  1. .SH
  2. Cache/WORM Driver
  3. .PP
  4. The cache/WORM (cw) driver is by far the
  5. largest and most complicated device driver in the file server.
  6. There are four devices involved in the cw driver.
  7. It implements a read/write pseudo-device (the cw-device)
  8. and a read-only pseudo-device (the dump device)
  9. by performing operations on its two constituent devices
  10. the read-write c-device and the write-once-read-many
  11. w-device.
  12. The block numbers on the four devices are distinct,
  13. although the
  14. .I cw
  15. addresses,
  16. dump addresses,
  17. and the
  18. .I w
  19. addresses are
  20. highly correlated.
  21. .PP
  22. The cw-driver uses the w-device as the
  23. stable storage of the file system at the time of the
  24. last dump.
  25. All newly written and a large number of recently used
  26. exact copies of blocks of the w-device are kept on the c-device.
  27. The c-device is much smaller than the w-device and
  28. so the subset of w-blocks that are kept on the c-device are
  29. mapped through a hash table kept on a partition of the c-device.
  30. .PP
  31. The map portion of the c-device consists of blocks of buckets of entries.
  32. The declarations follow.
  33. .Ex
  34. enum
  35. {
  36. BKPERBLK = 10,
  37. CEPERBK = (BUFSIZE - BKPERBLK*sizeof(Off)) /
  38. (sizeof(Centry)*BKPERBLK),
  39. };
  40. .Ee
  41. .Ex
  42. typedef
  43. struct
  44. {
  45. ushort age;
  46. short state;
  47. Off waddr;
  48. } Centry;
  49. .Ee
  50. .Ex
  51. typedef
  52. struct
  53. {
  54. long agegen;
  55. Centry entry[CEPERBK];
  56. } Bucket;
  57. .Ee
  58. .Ex
  59. Bucket bucket[BKPERBLK];
  60. .Ee
  61. There is exactly one entry structure for each block in the
  62. data partition of the c-device.
  63. A bucket contains all of the w-addresses that have
  64. the same hash code.
  65. There are as many buckets as will fit
  66. in a block and enough blocks to have the required
  67. number of entries.
  68. The entries in the bucket are maintained
  69. in FIFO order with an age variable and an incrementing age generator.
  70. When the age generator is about to overflow,
  71. all of the ages in the bucket are rescaled
  72. from zero.
  73. .PP
  74. The following steps go into converting a w-address into a c-address.
  75. The bucket is found by
  76. .Ex
  77. bucket_number = w-address % total_buckets;
  78. getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
  79. .Ee
  80. After the desired bucket is found,
  81. the desired entry is found by a linear search within the bucket for the
  82. entry with the desired
  83. .CW waddr .
  84. .PP
  85. The state variable in the entry is
  86. one of the following.
  87. .Ex
  88. enum
  89. {
  90. Cnone = 0,
  91. Cdirty,
  92. Cdump,
  93. Cread,
  94. Cwrite,
  95. Cdump1,
  96. };
  97. .Ee
  98. Every w-address has a state.
  99. Blocks that are not in the
  100. c-device have the implied
  101. state
  102. .CW Cnone .
  103. The
  104. .CW Cread
  105. state is for blocks that have the
  106. same data as the corresponding block in
  107. the w-device.
  108. Since the c-device is much faster than the
  109. w-device,
  110. .CW Cread
  111. blocks are kept as long as possible and
  112. used in preference to reading the w-device.
  113. .CW Cread
  114. blocks may be discarded from the c-device
  115. when the space is needed for newer data.
  116. The
  117. .CW Cwrite
  118. state is when the c-device contains newer data
  119. than the corresponding block on the w-device.
  120. This happens when a
  121. .CW Cnone ,
  122. .CW Cread ,
  123. or
  124. .CW Cwrite
  125. block is written.
  126. The
  127. .CW Cdirty
  128. state
  129. is when the c-device contains
  130. new data and the corresponding block
  131. on the w-device has never been written.
  132. This happens when a new block has been
  133. allocated from the free space on the w-device.
  134. .PP
  135. The
  136. .CW Cwrite
  137. and
  138. .CW Cdirty
  139. blocks are created and never removed.
  140. Unless something is done to
  141. convert these blocks,
  142. the c-device will gradually
  143. fill up and stop functioning.
  144. Once a day,
  145. or by command,
  146. a
  147. .I dump
  148. of the cw-device
  149. is taken.
  150. The purpose of
  151. a dump is to queue the writes that
  152. have been shunted to the c-device
  153. to be written to the w-device.
  154. Since the w-device is a WORM,
  155. blocks cannot be rewritten.
  156. Blocks that have already been written to the WORM must be
  157. relocated to the unused portion of the w-device.
  158. These are precisely the
  159. blocks with
  160. .CW Cwrite
  161. state.
  162. .PP
  163. The dump algorithm is as follows:
  164. .IP a)
  165. The tree on the cw-device is walked
  166. as long as the blocks visited have been
  167. modified since the last dump.
  168. These are the blocks with state
  169. .CW Cwrite
  170. and
  171. .CW Cdirty .
  172. It is possible to restrict the search
  173. to within these blocks
  174. since the directory containing a modified
  175. file must have been accessed to modify the
  176. file and accessing a directory will set its
  177. modified time thus causing the block containing it
  178. to be written.
  179. The directory containing that directory must be
  180. modified for the same reason.
  181. The tree walk is thus drastically restrained and the
  182. tree walk does not take much time.
  183. .IP b)
  184. All
  185. .CW Cwrite
  186. blocks found in the tree search
  187. are relocated to new blank blocks on the w-device
  188. and converted to
  189. .CW Cdump
  190. state.
  191. All
  192. .CW Cdirty
  193. blocks are converted to
  194. .CW Cdump
  195. state without relocation.
  196. At this point,
  197. all modified blocks in the cw-device
  198. have w-addresses that point to unwritten
  199. WORM blocks.
  200. These blocks are marked for later
  201. writing to the w-device
  202. with the state
  203. .CW Cdump .
  204. .IP c)
  205. All open files that were pointing to modified
  206. blocks are reopened to point at the corresponding
  207. reallocated blocks.
  208. This causes the directories leading to the
  209. open files to be modified.
  210. Thus the invariant discussed in a) is maintained.
  211. .IP d)
  212. The background dumping process will slowly
  213. go through the map of the c-device and write out
  214. all blocks with
  215. .CW Cdump
  216. state.
  217. .PP
  218. The dump takes a few minutes to walk the tree
  219. and mark the blocks.
  220. It can take hours to write the marked blocks
  221. to the WORM.
  222. If a marked block is rewritten before the old
  223. copy has been written to the WORM,
  224. it must be forced to the WORM before it is rewritten.
  225. There is no problem if another dump is taken before the first one
  226. is finished.
  227. The newly marked blocks are just added to the marked blocks
  228. left from the first dump.
  229. .PP
  230. If there is an error writing a marked block
  231. to the WORM
  232. then the
  233. .CW dump
  234. state is converted to
  235. .CW Cdump1
  236. and manual intervention is needed.
  237. (See the
  238. .CW cwcmd
  239. .CW mvstate
  240. command in
  241. .I fs (8)).
  242. These blocks can be disposed of by converting
  243. their state back to
  244. .CW Cdump
  245. so that they will be written again.
  246. They can also be converted to
  247. .CW Cwrite
  248. state so that they will be allocated new
  249. addresses at the next dump.
  250. In most other respects,
  251. a
  252. .CW Cdump1
  253. block behaves like a
  254. .CW Cwrite
  255. block.