  1. <html>
  2. <title>
  3. data
  4. </title>
  5. <body BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#330088" ALINK="#FF0044">
  6. <H1>The Plan 9 File Server
  7. </H1>
  8. <DL><DD><I>Ken Thompson<br>
  9. ken@plan9.bell-labs.com<br>
  10. </I></DL>
  11. <DL><DD><H4>ABSTRACT</H4>
  12. This paper describes the structure
  13. and the operation of Plan 9 file servers.
  14. The specifics apply to
  15. our main Plan 9 file server
  16. Emelie,
  17. but
  18. the code is also the basis for
  19. the user level file server
  20. <TT>kfs</TT>.
  21. </DL>
  22. <H4>Introduction
  23. </H4>
  24. <P>
  25. The Plan 9 file server
  26. Emelie
  27. is the oldest piece of system software
  28. still in use on Plan 9.
  29. It evolved from a user-level program that served
  30. serial lines on a Sequent multi-processor.
  31. The current implementation is neither clean nor
  32. portable,
  33. but it has slowly come to terms with
  34. its particular set of cranky computers
  35. and devices.
  36. </P>
  37. <H4>Process Structure
  38. </H4>
  39. <P>
  40. The Plan 9 file system server is made from
  41. an ancient version of the Plan 9 kernel.
  42. The kernel contains process control,
  43. synchronization,
  44. locks,
  45. and some memory
  46. allocation.
  47. The kernel has no user processes or
  48. virtual memory.
  49. </P>
  50. <P>
  51. The structure of the file system server
  52. is a set of kernel processes
  53. synchronizing mostly through message passing.
  54. In Emelie there are 26 processes of 10 types:
  55. <DL><DT><DD><TT><PRE>
  56. number  name   function
  57.     15  <TT>srv</TT>    Main file system server processes
  58.      1  <TT>rah</TT>    Block read-ahead processes
  59.      1  <TT>scp</TT>    Sync process
  60.      1  <TT>wcp</TT>    WORM copy process
  61.      1  <TT>con</TT>    Console process
  62.      1  <TT>ilo</TT>    IL protocol process
  63.      1  <TT>ilt</TT>    IL timer process
  64.      2  <TT>ethi</TT>   Ethernet input process
  65.      2  <TT>etho</TT>   Ethernet output process
  66.      1  <TT>flo</TT>    Floppy disk process
  67. </PRE></TT></DL>
  68. </P>
  69. <H4>The server processes
  70. </H4>
  71. <P>
  72. The main file system algorithm is a set
  73. of identical processes
  74. named
  75. <TT>srv</TT>
  76. that honor the
  77. 9P protocol.
  78. Each file system process waits on
  79. a message queue for an incoming request.
  80. The request contains a 9P message and
  81. the address of a reply queue.
  82. A
  83. <TT>srv</TT>
  84. process parses the message,
  85. performs pseudo-disk I/O
  86. to the corresponding file system block device,
  87. formulates a response,
  88. and sends the
  89. response back to the reply queue.
  90. </P>
  91. <P>
  92. The unit of storage is a
  93. block of data on a device:
  94. <DL><DT><DD><TT><PRE>
  95. enum
  96. {
  97. RBUFSIZE = 16*1024
  98. };
  99. typedef
  100. struct
  101. {
  102. short pad;
  103. short tag;
  104. long path;
  105. } Tag;
  106. enum
  107. {
  108. BUFSIZE = RBUFSIZE - sizeof(Tag)
  109. };
  110. typedef
  111. struct
  112. {
  113. uchar data[BUFSIZE];
  114. Tag tag;
  115. } Block;
  116. </PRE></TT></DL>
  117. All devices are idealized as a perfect disk
  118. of contiguously numbered blocks each of size
  119. <TT>RBUFSIZE</TT>.
  120. Each block has a tag that identifies what type
  121. of block it is and a unique id of the file or directory
  122. where this block resides.
  123. The remaining data in the block depends on
  124. what type of block it is.
  125. </P>
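<P>
The layout above can be checked mechanically.
The following is a minimal sketch, not the server's own code:
it assumes a 2-byte short and a 4-byte long, as on Emelie,
and models them with fixed-width types so the sizes hold on other machines.
The function
<TT>check_block_layout</TT>
is a hypothetical name introduced for illustration.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RBUFSIZE = 16*1024 };

/* Fixed-width stand-ins for the paper's short and long (assumption). */
typedef struct {
	int16_t pad;
	int16_t tag;    /* type of block */
	int32_t path;   /* id of the owning file or directory */
} Tag;

enum { BUFSIZE = RBUFSIZE - sizeof(Tag) };

typedef struct {
	uint8_t data[BUFSIZE];
	Tag tag;
} Block;

/* Hypothetical check: the tag occupies the last bytes of a block
   that is exactly RBUFSIZE bytes long. */
void
check_block_layout(void)
{
	assert(sizeof(Tag) == 8);
	assert(offsetof(Block, tag) == BUFSIZE);
	assert(sizeof(Block) == RBUFSIZE);
}
```

<P>
Under these assumptions the data area of every block is 16376 bytes.
</P>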
  126. <P>
  127. The
  128. <TT>srv</TT>
  129. process's main data structure is the directory entry.
  130. This is the equivalent of a UNIX i-node and
  131. defines the set of block addresses that comprise a file or directory.
  132. Unlike the i-node,
  133. the directory entry also has the name of the
  134. file or directory in it:
  135. <DL><DT><DD><TT><PRE>
  136. enum
  137. {
  138. NAMELEN = 28,
  139. NDBLOCK = 6
  140. };
  141. </PRE></TT></DL>
  142. <DL><DT><DD><TT><PRE>
  143. typedef
  144. struct
  145. {
  146. char name[NAMELEN];
  147. short uid;
  148. short gid;
  149. ushort mode;
  150. short wuid;
  151. Qid qid;
  152. long size;
  153. long dblock[NDBLOCK];
  154. long iblock;
  155. long diblock;
  156. long atime;
  157. long mtime;
  158. } Dentry;
  159. </PRE></TT></DL>
  160. Each directory entry holds the file or directory
  161. name, protection mode, access times, user-id, group-id, and addressing
  162. information.
  163. The entry
  164. <TT>wuid</TT>
  165. is the user-id of the last writer of the file
  166. and
  167. <TT>size</TT>
  168. is the size of the file in bytes.
  169. The first 6
  170. blocks of the file are held in the
  171. <TT>dblock</TT>
  172. array.
  173. If the file is larger than that,
  174. an indirect block is allocated that holds
  175. the next
  176. <TT>BUFSIZE/sizeof(long)</TT>
  177. blocks of the file.
  178. The indirect block address is held in the structure member
  179. <TT>iblock</TT>.
  180. If the file is larger yet,
  181. then there is a double indirect block that points
  182. at indirect blocks.
  183. The double indirect address is held in
  184. <TT>diblock</TT>
  185. and can point at another
  186. <TT>(BUFSIZE/sizeof(long))<sup>2</sup></TT>
  187. blocks of data.
  188. The maximum addressable size of a file is
  189. therefore 275 Gbytes.
  190. There is a tighter restriction of
  191. 2<sup>32</sup>
  192. bytes because the length of a file is maintained in
  193. a long.
  194. Even so,
  195. sloppy use of long arithmetic restricts the length to
  196. 2<sup>31</sup>
  197. bytes.
  198. These numbers are based on Emelie
  199. which has a block size of 16K and
  200. <TT>sizeof(long)</TT>
  201. is 4.
  202. They would be different if the size of a block
  203. changed.
  204. </P>
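<P>
The arithmetic behind the addressable size can be reproduced with a short sketch.
It assumes an 8-byte tag and a 4-byte long, as on Emelie;
the names
<TT>TAGSIZE</TT>,
<TT>LONGSIZE</TT>,
and
<TT>max_file_bytes</TT>
are hypothetical and not taken from the server source.
</P>

```c
#include <assert.h>
#include <stdint.h>

enum {
	TAGSIZE  = 8,   /* sizeof(Tag) with 2-byte short and 4-byte long */
	LONGSIZE = 4,   /* sizeof(long) on Emelie */
	NDBLOCK  = 6
};

/* Bytes addressable through the direct, indirect and double indirect
   block addresses of one directory entry, for a given raw block size. */
uint64_t
max_file_bytes(uint32_t rbufsize)
{
	uint64_t bufsize, ind;

	assert(rbufsize > TAGSIZE);
	bufsize = rbufsize - TAGSIZE;   /* data bytes per block */
	ind = bufsize / LONGSIZE;       /* addresses per indirect block */
	return (NDBLOCK + ind + ind*ind) * bufsize;
}
```

<P>
For a 16K block this gives 274542591936 bytes,
which rounds to the 275 Gbytes quoted above
and is far beyond what a 32-bit length can express.
</P>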
  205. <P>
  206. The declarations of the indirect and double indirect blocks
  207. are as follows.
  208. <DL><DT><DD><TT><PRE>
  209. enum
  210. {
  211. INDPERBUF = BUFSIZE/sizeof(long),
  212. };
  213. </PRE></TT></DL>
  214. <DL><DT><DD><TT><PRE>
  215. typedef struct
  216. {
  217. long dblock[INDPERBUF];
  218. Tag ibtag;
  219. } Iblock;
  220. </PRE></TT></DL>
  221. <DL><DT><DD><TT><PRE>
  222. typedef struct
  223. {
  224. long iblock[INDPERBUF];
  225. Tag dibtag;
  226. } Diblock;
  227. </PRE></TT></DL>
  228. </P>
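<P>
A sketch of how a relative block number within a file could be resolved
to one of the three addressing levels follows.
The type
<TT>Blkpath</TT>
and the function
<TT>blkpath</TT>
are hypothetical; the constants assume Emelie's block size and a 4-byte long.
</P>

```c
#include <assert.h>
#include <stdint.h>

enum {
	NDBLOCK   = 6,
	INDPERBUF = (16*1024 - 8)/4   /* BUFSIZE/sizeof(long) on Emelie */
};

enum { Direct, Indirect, Dindirect };

typedef struct {
	int level;     /* Direct, Indirect or Dindirect */
	uint32_t i1;   /* index into dblock[], Iblock.dblock[] or Diblock.iblock[] */
	uint32_t i2;   /* index into the second-level Iblock; 0 unless Dindirect */
} Blkpath;

/* Hypothetical helper: locate relative block rb of a file. */
Blkpath
blkpath(uint64_t rb)
{
	Blkpath p = { Direct, 0, 0 };
	uint64_t n = INDPERBUF;

	assert(rb < NDBLOCK + n + n*n);
	if(rb < NDBLOCK){
		p.i1 = rb;
		return p;
	}
	rb -= NDBLOCK;
	if(rb < n){
		p.level = Indirect;
		p.i1 = rb;
		return p;
	}
	rb -= n;
	p.level = Dindirect;
	p.i1 = rb / n;   /* which indirect block */
	p.i2 = rb % n;   /* which address inside it */
	return p;
}
```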
  229. <P>
  230. The root of a file system is a single directory entry
  231. at a known block address.
  232. A directory is a file that consists of a list of
  233. directory entries.
  234. To make access easier,
  235. a directory entry cannot cross blocks.
  236. In Emelie there are 233 directory entries per block.
  237. </P>
  238. <P>
  239. The device on which the blocks reside is implicit
  240. and ultimately comes from the 9P
  241. <TT>attach</TT>
  242. message that specifies the name of the
  243. device containing the root.
  244. </P>
  245. <H4>Buffer Cache
  246. </H4>
  247. <P>
  248. When the file server is
  249. booted,
  250. all of the unused memory is allocated to
  251. a block buffer pool.
  252. There are two major operations on the buffer
  253. pool.
  254. <TT>Getbuf</TT>
  255. will find the buffer associated with a
  256. particular block on a particular device.
  257. The returned buffer is locked so that the
  258. caller has exclusive use.
  259. If the requested buffer is not in the pool,
  260. some other buffer will be relabeled and
  261. the data will be read from the requested device.
  262. <TT>Putbuf</TT>
  263. will unlock a buffer and
  264. if the contents are marked as modified,
  265. the buffer will be written to the device before
  266. the buffer is relabeled.
  267. If there is some special mapping
  268. or CPU cache flushing
  269. that must occur in order for the physical I/O
  270. device to access the buffers,
  271. this is done between
  272. <TT>getbuf</TT>
  273. and
  274. <TT>putbuf</TT>.
  275. The contents of a buffer are never touched
  276. except while it is locked between
  277. <TT>getbuf</TT>
  278. and
  279. <TT>putbuf</TT>
  280. calls.
  281. </P>
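<P>
The relabeling behaviour can be illustrated with a toy buffer pool.
This is a sketch under loud assumptions: a single in-memory array stands in
for the device, the pool has only two buffers, the victim is the least
recently used unlocked buffer, and
<TT>putbuf</TT>
takes a hypothetical
<TT>modified</TT>
argument.
A real server would sleep where this sketch asserts.
</P>

```c
#include <assert.h>
#include <string.h>

enum { NBUF = 2, NBLK = 8, BLKSZ = 16 };   /* toy sizes (assumption) */

typedef struct {
	int valid, locked, modified;
	long addr;            /* block address this buffer is labeled with */
	unsigned long used;   /* last-use stamp for choosing a victim */
	char data[BLKSZ];
} Buf;

static char disk[NBLK][BLKSZ];   /* stand-in for one block device */
static Buf pool[NBUF];
static unsigned long stamp;

/* Find or load the buffer for block addr and return it locked. */
Buf*
getbuf(long addr)
{
	Buf *b, *v = NULL;
	int i;

	assert(addr >= 0 && addr < NBLK);
	for(i = 0; i < NBUF; i++){
		b = &pool[i];
		if(b->valid && b->addr == addr){
			v = b;   /* already in the pool */
			break;
		}
		if(!b->locked && (v == NULL || b->used < v->used))
			v = b;   /* best relabel candidate so far */
	}
	assert(v != NULL && !v->locked);   /* a real server would wait here */
	if(!v->valid || v->addr != addr){
		if(v->valid && v->modified)    /* write back before relabeling */
			memcpy(disk[v->addr], v->data, BLKSZ);
		memcpy(v->data, disk[addr], BLKSZ);
		v->valid = 1;
		v->modified = 0;
		v->addr = addr;
	}
	v->locked = 1;
	v->used = ++stamp;
	return v;
}

/* Unlock a buffer, remembering whether the caller changed it. */
void
putbuf(Buf *b, int modified)
{
	assert(b->locked);
	if(modified)
		b->modified = 1;
	b->locked = 0;
}
```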
  282. <P>
  283. The
  284. file system server processes
  285. prevent deadlock in the buffers by
  286. always locking parent and child
  287. directory entries in that order.
  288. Since the entire directory structure
  289. is a hierarchy,
  290. this makes the locking well-ordered,
  291. preventing deadlock.
  292. The major problem in the locking strategy is
  293. that locks are at a block level and there are many
  294. directory entries in a single block.
  295. There are unnecessary lock conflicts
  296. in the directory blocks.
  297. When one of these directory blocks is tied up
  298. accessing the very slow WORM,
  299. then all I/O to dozens of unrelated directories
  300. is blocked.
  301. </P>
  302. <H4>Block Devices
  303. </H4>
  304. <P>
  305. The block device I/O system is like a
  306. protocol stack of filters.
  307. There is a set of pseudo-devices that call
  308. recursively to other pseudo-devices and real devices.
  309. The protocol stack is compiled from a configuration
  310. string that specifies the order of pseudo-devices and devices.
  311. Each pseudo-device and device has a set of entry points
  312. that corresponds to the operations that the file system
  313. requires of a device.
  314. The most notable operations are
  315. <TT>read</TT>,
  316. <TT>write</TT>,
  317. and
  318. <TT>size</TT>.
  319. </P>
  320. <P>
  321. The device stack can best be described by
  322. describing the syntax of the configuration string
  323. that specifies the stack.
  324. Configuration strings are used
  325. during the setup of the file system.
  326. For a description see
  327. <A href="/magic/man2html/8/fsconfig"><I>fsconfig</I>(8).
  328. </A>In the following recursive definition,
  329. <I>D</I>
  330. represents a
  331. string that specifies a block device.
  332. </P>
  333. <DL COMPACT>
  334. <DT><I>D</I> = (<I>DD</I>...)<DD>
  335. <br>
  336. This is a set of devices that
  337. are concatenated to form a single device.
  338. The size of the concatenated device is the
  339. sum of the sizes of each sub-device.
  340. <DT><I>D</I> = [<I>DD</I>...]<DD>
  341. <br>
  342. This is the interleaving of the
  343. individual devices.
  344. If there are N devices in the list,
  345. then the pseudo-device is the N-way block
  346. interleaving of the sub-devices.
  347. The size of the interleaved device is
  348. N times the size of the smallest sub-device.
  349. <DT><I>D</I> = <TT>p</TT><I>DN1.N2</I><DD>
  350. <br>
  351. This is a partition of a sub-device.
  352. The sub-device is partitioned into 100 equal pieces.
  353. If the size of the sub-device is not divisible by 100,
  354. then there will be some slop thrown away at the top.
  355. The pseudo-device starts at the N1-th piece and
  356. continues for N2 pieces. Thus
  357. <TT>p<I>D</I>67.33</TT>
  358. will be the
  359. last third of the device
  360. <I>D</I>.
  361. <DT><I>D</I> = <TT>f</TT><I>D</I><DD>
  362. <br>
  363. This is a fake write-once-read-many device simulated by a
  364. second read-write device.
  365. This second device is partitioned
  366. into a set of block flags and a set of blocks.
  367. The flags are used to generate errors if a
  368. block is ever written twice or read without being written first.
  369. <DT><I>D</I> = <TT>c</TT><I>DD</I><DD>
  370. <br>
  371. This is the cache/WORM device made up of a cache (read-write)
  372. device and a WORM (write-once-read-many) device.
  373. More on this later.
  374. <DT><I>D</I> = <TT>o</TT><DD>
  375. <br>
  376. This is the dump file system that is the
  377. two-level hierarchy of all dumps ever taken on a cache/WORM.
  378. The read-only root of the cache/WORM file system
  379. (on the dump taken Feb 18, 1995) can
  380. be referenced as
  381. <TT>/1995/0218</TT>
  382. in this pseudo device.
  383. The second dump taken that day will be
  384. <TT>/1995/02181</TT>.
  385. <DT><I>D</I> = <TT>w</TT><I>N1.N2</I><DD>
  386. <br>
  387. This is a SCSI disk on controller N1 and target N2.
  388. <DT><I>D</I> = <TT>l</TT><I>N1.N2</I><DD>
  389. <br>
  390. This is the same as
  391. <TT>w</TT>,
  392. but one block from the SCSI disk is removed for labeling.
  393. <DT><I>D</I> = <TT>j(</TT><I>D<sub>1</sub></I><I>D<sub>2</sub></I><TT>*)</TT><I>D<sub>3</sub></I><DD>
  394. <br>
  395. <I>D<sub>1</sub></I>
  396. is the juke box SCSI interface.
  397. The
  398. <I>D<sub>2</sub></I>'s
  399. are the SCSI drives in the juke box
  400. and the
  401. <I>D<sub>3</sub></I>'s
  402. are the demountable platters in the juke box.
  403. <I>D<sub>1</sub></I>
  404. and
  405. <I>D<sub>2</sub></I>
  406. must be
  407. <TT>w</TT>.
  408. <I>D<sub>3</sub></I>
  409. must be pseudo devices of
  410. <TT>w</TT>
  411. or
  412. <TT>l</TT>
  413. devices.
  414. </dl>
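<P>
The size rules for the concatenation, interleave, and partition
pseudo-devices can be sketched as a recursive function over a device tree.
The
<TT>Dev</TT>
structure, its field names, and
<TT>devsize</TT>
are hypothetical and are not the server's own declarations.
</P>

```c
#include <assert.h>

enum { Leaf, Cat, Interleave, Part };   /* real device, (DD...), [DD...], pDN1.N2 */

typedef struct Dev Dev;
struct Dev {
	int type;
	long nblock;   /* Leaf: size in blocks */
	int n1, n2;    /* Part: starting piece and number of pieces, out of 100 */
	int nsub;      /* number of sub-devices */
	Dev **sub;
};

/* Size of a device in blocks, following the rules in the list above. */
long
devsize(Dev *d)
{
	long total, s, min;
	int i;

	switch(d->type){
	case Leaf:
		return d->nblock;
	case Cat:          /* sum of the sub-device sizes */
		total = 0;
		for(i = 0; i < d->nsub; i++)
			total += devsize(d->sub[i]);
		return total;
	case Interleave:   /* N times the smallest sub-device */
		assert(d->nsub > 0);
		min = devsize(d->sub[0]);
		for(i = 1; i < d->nsub; i++){
			s = devsize(d->sub[i]);
			if(s < min)
				min = s;
		}
		return min * d->nsub;
	case Part:         /* N2 pieces of size/100; the remainder is slop */
		assert(d->nsub == 1 && d->n1 >= 0 && d->n2 >= 0 && d->n1 + d->n2 <= 100);
		return (devsize(d->sub[0]) / 100) * d->n2;
	}
	assert(0);
	return 0;
}
```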
  415. <P>
  416. For both
  417. <TT>w</TT>
  418. and
  419. <TT>l</TT>
  420. devices any of the configuration numbers
  421. can be replaced by an iterator of the form
  422. <TT>&#60;<I>N1-N2</I>&#62;</TT>.
  423. Thus
  424. <DL><DT><DD><TT><PRE>
  425. [w0.&#60;2-6&#62;]
  426. </PRE></TT></DL>
  427. is the interleaving of the SCSI disks on SCSI targets
  428. 2 through 6 of SCSI controller 0.
  429. The main file system on
  430. Emelie
  431. is defined by the configuration string
  432. <DL><DT><DD><TT><PRE>
  433. c[w1.&#60;0-5&#62;.0]j(w6w5w4w3w2)(l&#60;0-236&#62;l&#60;238-474&#62;)
  434. </PRE></TT></DL>
  435. This is a cache/WORM driver.
  436. The cache is six interleaved disks on SCSI controller 1
  437. targets 0, 1, 2, 3, 4, and 5.
  438. The WORM half of the cache/WORM
  439. is 474 jukebox platters, both sides of 237 disks.
  440. </P>
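<P>
The iterator notation can be illustrated by expanding it textually.
The sketch below is an assumption about how such an expansion might look,
not the server's configuration parser;
<TT>expand_iter</TT>
is a hypothetical helper that handles only the first iterator in a string.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand the first <N1-N2> in cfg into a run of
   device strings written to out. Returns the number of devices produced,
   or -1 if cfg has no well-formed iterator or out is too small. */
int
expand_iter(const char *cfg, char *out, size_t outlen)
{
	const char *lt, *gt;
	int lo, hi, i, n;
	size_t used = 0;

	assert(outlen > 0);
	out[0] = '\0';
	lt = strchr(cfg, '<');
	if(lt == NULL || (gt = strchr(lt, '>')) == NULL)
		return -1;
	if(sscanf(lt, "<%d-%d>", &lo, &hi) != 2 || lo > hi)
		return -1;
	for(i = lo; i <= hi; i++){
		/* text before '<', then the number, then text after '>' */
		n = snprintf(out + used, outlen - used, "%.*s%d%s",
			(int)(lt - cfg), cfg, i, gt + 1);
		if(n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += n;
	}
	return hi - lo + 1;
}
```

<P>
With this helper,
<TT>w0.&#60;2-6&#62;</TT>
expands to the five devices
<TT>w0.2w0.3w0.4w0.5w0.6</TT>.
</P>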
  441. <H4>The read-ahead processes
  442. </H4>
  443. <P>
  444. There is a set of file system processes,
  445. <TT>rah</TT>,
  446. that wait for messages consisting of a device and block
  447. address.
  448. When a message comes in,
  449. the process reads the specified block from the device.
  450. This is done by calling
  451. <TT>getbuf</TT>
  452. and
  453. <TT>putbuf</TT>.
  454. The hope is that these blocks
  455. will be used later and that they will stay in the
  456. buffer cache long enough not to be discarded before
  457. they are used.
  458. </P>
  459. <P>
  460. The messages to the read-ahead processes are
  461. generated by the server processes.
  462. The server processes maintain a relative block mark in every
  463. open file.
  464. Whenever an open file reads that relative block,
  465. the next 110 block addresses of the file are sent
  466. to the read-ahead processes and
  467. the relative block mark is advanced by 100.
  468. The initial relative block is set to 1.
  469. If the file is opened and
  470. only a few bytes are read,
  471. then no anticipating reads are performed
  472. since the relative block mark is set to 1
  473. and only block offset 0 is read.
  474. This is to prevent some
  475. fairly common action such as
  476. <DL><DT><DD><TT><PRE>
  477. file *
  478. </PRE></TT></DL>
  479. from swamping the file system with read-ahead
  480. requests that will never be used.
  481. </P>
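<P>
The mark logic can be sketched as follows.
The constants come from the description above;
the structure
<TT>Ofile</TT>,
the function
<TT>rahead</TT>,
and the choice to start the read-ahead at the block after the one being read
are assumptions made for illustration.
</P>

```c
#include <assert.h>

enum {
	RAHEAD  = 110,   /* block addresses queued per trigger */
	RASTEP  = 100,   /* how far the mark advances */
	RASTART = 1      /* initial mark of a newly opened file */
};

typedef struct {
	long mark;   /* relative block that triggers the next read-ahead */
} Ofile;

/* Called when an open file reads relative block rb.
   Returns the number of block addresses to queue for read-ahead
   and stores the first of them in *first. */
int
rahead(Ofile *f, long rb, long *first)
{
	assert(rb >= 0);
	if(rb != f->mark)
		return 0;        /* not at the mark: no read-ahead */
	*first = rb + 1;     /* assumption: start just past the block read */
	f->mark += RASTEP;
	return RAHEAD;
}
```

<P>
A program that reads only block 0 of each file never reaches the initial mark,
so it generates no read-ahead messages.
</P>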
  482. <H4>Cache/WORM Driver
  483. </H4>
  484. <P>
  485. The cache/WORM (cw) driver is by far the
  486. largest and most complicated device driver in the file server.
  487. There are four devices involved in the cw driver.
  488. It implements a read/write pseudo-device (the cw-device)
  489. and a read-only pseudo-device (the dump device)
  490. by performing operations on its two constituent devices,
  491. the read-write c-device and the write-once-read-many
  492. w-device.
  493. The block numbers on the four devices are distinct,
  494. although the cw addresses,
  495. dump addresses,
  496. and the w addresses are
  497. highly correlated.
  498. </P>
  499. <P>
  500. The cw-driver uses the w-device as the
  501. stable storage of the file system at the time of the
  502. last dump.
  503. All newly written blocks, and a large number of recently used
  504. exact copies of w-device blocks, are kept on the c-device.
  505. The c-device is much smaller than the w-device,
  506. so the subset of w-blocks kept on the c-device is
  507. mapped through a hash table kept on a partition of the c-device.
  508. </P>
  509. <P>
  510. The map portion of the c-device consists of blocks of buckets of entries.
  511. The declarations follow.
  512. <DL><DT><DD><TT><PRE>
  513. enum
  514. {
  515. BKPERBLK = 10,
  516. CEPERBK = (BUFSIZE - BKPERBLK*sizeof(long)) /
  517. (sizeof(Centry)*BKPERBLK),
  518. };
  519. </PRE></TT></DL>
  520. <DL><DT><DD><TT><PRE>
  521. typedef
  522. struct
  523. {
  524. ushort age;
  525. short state;
  526. long waddr;
  527. } Centry;
  528. </PRE></TT></DL>
  529. <DL><DT><DD><TT><PRE>
  530. typedef
  531. struct
  532. {
  533. long agegen;
  534. Centry entry[CEPERBK];
  535. } Bucket;
  536. </PRE></TT></DL>
  537. <DL><DT><DD><TT><PRE>
  538. Bucket bucket[BKPERBLK];
  539. </PRE></TT></DL>
  540. There is exactly one entry structure for each block in the
  541. data partition of the c-device.
  542. A bucket contains all of the w-addresses that have
  543. the same hash code.
  544. There are as many buckets as will fit
  545. in a block and enough blocks to have the required
  546. number of entries.
  547. The entries in the bucket are maintained
  548. in FIFO order with an age variable and an incrementing age generator.
  549. When the age generator is about to overflow,
  550. all of the ages in the bucket are rescaled
  551. from zero.
  552. </P>
  553. <P>
  554. The following steps go into converting a w-address into a c-address.
  555. The bucket is found by
  556. <DL><DT><DD><TT><PRE>
  557. bucket_number = w_address % total_buckets;
  558. getbuf(c_device, bucket_offset + bucket_number/BKPERBLK);
  559. </PRE></TT></DL>
  560. After the desired bucket is found,
  561. the desired entry is found by a linear search within the bucket for the
  562. entry with the desired
  563. <TT>waddr</TT>.
  564. </P>
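<P>
The lookup can be sketched in two small functions.
The declarations repeat those above with fixed-width types standing in for
Emelie's short and long.
The names
<TT>bucketof</TT>
and
<TT>findentry</TT>
are hypothetical, and treating state 0 as an empty slot is an assumption.
</P>

```c
#include <assert.h>
#include <stdint.h>

enum { BUFSIZE = 16*1024 - 8, BKPERBLK = 10 };

typedef struct {
	uint16_t age;
	int16_t  state;
	int32_t  waddr;
} Centry;

enum {
	CEPERBK = (BUFSIZE - BKPERBLK*sizeof(int32_t)) / (sizeof(Centry)*BKPERBLK)
};

typedef struct {
	int32_t agegen;
	Centry  entry[CEPERBK];
} Bucket;

/* Which map block holds the bucket for waddr, and which bucket within it.
   Add the map's starting block to *blk to get a c-device address. */
void
bucketof(int32_t waddr, long nbucket, long *blk, int *idx)
{
	long bn;

	assert(waddr >= 0 && nbucket > 0);
	bn = waddr % nbucket;
	*blk = bn / BKPERBLK;
	*idx = bn % BKPERBLK;
}

/* Linear search of one bucket; returns the entry index or -1. */
int
findentry(Bucket *b, int32_t waddr)
{
	int i;

	for(i = 0; i < CEPERBK; i++)
		if(b->entry[i].state != 0 && b->entry[i].waddr == waddr)
			return i;
	return -1;
}
```

<P>
With these sizes each bucket holds 204 entries,
and ten buckets fit in the data area of one block.
</P>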
  565. <P>
  566. The state variable in the entry is
  567. one of the following.
  568. <DL><DT><DD><TT><PRE>
  569. enum
  570. {
  571. Cnone = 0,
  572. Cdirty,
  573. Cdump,
  574. Cread,
  575. Cwrite,
  576. Cdump1,
  577. };
  578. </PRE></TT></DL>
  579. Every w-address has a state.
  580. Blocks that are not in the
  581. c-device have the implied
  582. state
  583. <TT>Cnone</TT>.
  584. The
  585. <TT>Cread</TT>
  586. state is for blocks that have the
  587. same data as the corresponding block in
  588. the w-device.
  589. Since the c-device is much faster than the
  590. w-device,
  591. <TT>Cread</TT>
  592. blocks are kept as long as possible and
  593. used in preference to reading the w-device.
  594. <TT>Cread</TT>
  595. blocks may be discarded from the c-device
  596. when the space is needed for newer data.
  597. The
  598. <TT>Cwrite</TT>
  599. state is when the c-device contains newer data
  600. than the corresponding block on the w-device.
  601. This happens when a
  602. <TT>Cnone</TT>,
  603. <TT>Cread</TT>,
  604. or
  605. <TT>Cwrite</TT>
  606. block is written.
  607. The
  608. <TT>Cdirty</TT>
  609. state
  610. is when the c-device contains
  611. new data and the corresponding block
  612. on the w-device has never been written.
  613. This happens when a new block has been
  614. allocated from the free space on the w-device.
  615. </P>
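<P>
The effect of a write on the state of a block can be summarized in a small
function.
This is a sketch of the transitions described above, not the driver's code;
<TT>statewrite</TT>
and its
<TT>fresh</TT>
argument are hypothetical, and leaving a
<TT>Cdirty</TT>
block in that state when it is written again is an assumption.
</P>

```c
#include <assert.h>

enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

/* State of a cached block after it is written.
   fresh is nonzero when the block was just allocated from the free
   space of the w-device and so has never been written there. */
int
statewrite(int old, int fresh)
{
	if(fresh){
		assert(old == Cnone);
		return Cdirty;
	}
	switch(old){
	case Cnone:
	case Cread:
	case Cwrite:
		return Cwrite;   /* newer than the copy on the w-device */
	case Cdirty:
		return Cdirty;   /* assumption: still unwritten on the w-device */
	}
	return -1;           /* Cdump and Cdump1 are outside this sketch */
}
```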
  616. <P>
  617. The
  618. <TT>Cwrite</TT>
  619. and
  620. <TT>Cdirty</TT>
  621. blocks are created and never removed.
  622. Unless something is done to
  623. convert these blocks,
  624. the c-device will gradually
  625. fill up and stop functioning.
  626. Once a day,
  627. or by command,
  628. a
  629. <I>dump</I>
  630. of the cw-device
  631. is taken.
  632. The purpose of
  633. a dump is to queue the writes that
  634. have been shunted to the c-device
  635. to be written to the w-device.
  636. Since the w-device is a WORM,
  637. blocks cannot be rewritten.
  638. Blocks that have already been written to the WORM must be
  639. relocated to the unused portion of the w-device.
  640. These are precisely the
  641. blocks with
  642. <TT>Cwrite</TT>
  643. state.
  644. </P>
  645. <P>
  646. The dump algorithm is as follows:
  647. <br>a) The tree on the cw-device is walked
  648. as long as the blocks visited have been
  649. modified since the last dump.
  650. These are the blocks with state
  651. <TT>Cwrite</TT>
  652. and
  653. <TT>Cdirty</TT>.
  654. It is possible to restrict the search
  655. to within these blocks
  656. since the directory containing a modified
  657. file must have been accessed to modify the
  658. file and accessing a directory will set its
  659. modified time thus causing the block containing it
  660. to be written.
  661. The directory containing that directory must be
  662. modified for the same reason.
  663. The tree walk is thus drastically restrained and the
  664. tree walk does not take much time.
  665. <br>b) All
  666. <TT>Cwrite</TT>
  667. blocks found in the tree search
  668. are relocated to new blank blocks on the w-device
  669. and converted to
  670. <TT>Cdump</TT>
  671. state.
  672. All
  673. <TT>Cdirty</TT>
  674. blocks are converted to
  675. <TT>Cdump</TT>
  676. state without relocation.
  677. At this point,
  678. all modified blocks in the cw-device
  679. have w-addresses that point to unwritten
  680. WORM blocks.
  681. These blocks are marked for later
  682. writing to the w-device
  683. with the state
  684. <TT>Cdump</TT>.
  685. <br>c) All open files that were pointing to modified
  686. blocks are reopened to point at the corresponding
  687. reallocated blocks.
  688. This causes the directories leading to the
  689. open files to be modified.
  690. Thus the invariant discussed in a) is maintained.
  691. <br>d) The background dumping process will slowly
  692. go through the map of the c-device and write out
  693. all blocks with
  694. <TT>Cdump</TT>
  695. state.
  696. </P>
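<P>
Steps a) and b) can be sketched as a restricted tree walk.
The
<TT>Node</TT>
structure is a hypothetical in-memory stand-in for a block and the blocks it
points at;
<TT>dumpmark</TT>
and the simple counter used to hand out blank w-addresses are assumptions,
not the driver's allocation scheme.
</P>

```c
#include <assert.h>
#include <stddef.h>

enum { Cnone = 0, Cdirty, Cdump, Cread, Cwrite, Cdump1 };

typedef struct Node Node;
struct Node {
	int state;
	long waddr;
	int nkid;
	Node **kid;   /* blocks reachable from this one */
};

/* Mark the modified part of the tree rooted at n for dumping.
   *nextfree is the first blank w-address. Returns the number of
   blocks put into Cdump state. */
int
dumpmark(Node *n, long *nextfree)
{
	int i, marked;

	assert(n != NULL && nextfree != NULL);
	if(n->state != Cwrite && n->state != Cdirty)
		return 0;                   /* unmodified: nothing below changed */
	if(n->state == Cwrite)
		n->waddr = (*nextfree)++;   /* old WORM copy cannot be rewritten */
	n->state = Cdump;               /* Cdirty keeps its address */
	marked = 1;
	for(i = 0; i < n->nkid; i++)
		marked += dumpmark(n->kid[i], nextfree);
	return marked;
}
```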
  697. <P>
  698. The dump takes a few minutes to walk the tree
  699. and mark the blocks.
  700. It can take hours to write the marked blocks
  701. to the WORM.
  702. If a marked block is rewritten before the old
  703. copy has been written to the WORM,
  704. it must be forced to the WORM before it is rewritten.
  705. There is no problem if another dump is taken before the first one
  706. is finished.
  707. The newly marked blocks are just added to the marked blocks
  708. left from the first dump.
  709. </P>
  710. <P>
  711. If there is an error writing a marked block
  712. to the WORM
  713. then the
  714. <TT>Cdump</TT>
  715. state is converted to
  716. <TT>Cdump1</TT>
  717. and manual intervention is needed.
  718. (See the
  719. <TT>cwcmd</TT>
  720. <TT>mvstate</TT>
  721. command in
  722. <A href="/magic/man2html/8/fs"><I>fs</I>(8)).
  723. </A>These blocks can be disposed of by converting
  724. their state back to
  725. <TT>Cdump</TT>
  726. so that they will be written again.
  727. They can also be converted to
  728. <TT>Cwrite</TT>
  729. state so that they will be allocated new
  730. addresses at the next dump.
  731. In most other respects,
  732. a
  733. <TT>Cdump1</TT>
  734. block behaves like a
  735. <TT>Cwrite</TT>
  736. block.
  737. </P>
  738. <H4>Sync Copy and WORM Copy Processes
  739. </H4>
  740. <P>
  741. The
  742. <TT>scp</TT>
  743. process
  744. wakes up every ten seconds and
  745. issues writes to blocks in the buffer cache
  746. that have been modified.
  747. The same flush is also done automatically on important
  748. console commands such as
  749. <TT>halt</TT>
  750. and
  751. <TT>dump</TT>.
  752. </P>
  753. <P>
  754. The
  755. <TT>wcp</TT>
  756. process also wakes up every ten seconds
  757. and tries to copy a
  758. <TT>Cdump</TT>
  759. block from the cache to the WORM.
  760. As long as there are
  761. <TT>Cdump</TT>
  762. blocks to copy and there is no competition for
  763. the WORM device,
  764. the copy will continue at full speed.
  765. Whenever there is competition for the WORM
  766. or there are no more blocks to
  767. copy,
  768. then the process will sleep ten seconds
  769. before looking again.
  770. </P>
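<P>
The pacing of the copy can be sketched as a single step of the loop.
The function
<TT>wcpstep</TT>
and its arguments are hypothetical;
decrementing a counter stands in for copying one block from the cache to the WORM.
</P>

```c
#include <assert.h>

enum { WCPSLEEP = 10 };   /* seconds */

/* One iteration of a wcp-like loop.
   *ndump counts blocks still waiting to be copied; wormbusy is nonzero
   when something else is using the WORM. Returns how long to sleep
   before the next iteration: 0 means a block was copied, so keep going. */
int
wcpstep(long *ndump, int wormbusy)
{
	assert(*ndump >= 0);
	if(*ndump == 0 || wormbusy)
		return WCPSLEEP;
	(*ndump)--;   /* stands in for copying one block to the WORM */
	return 0;
}
```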
  771. <P>
  772. The HP WORM jukebox consists of
  773. 238 disks divided into 476 sides
  774. or platters.
  775. Platter 0 is the
  776. <I>A</I>
  777. side of disk 0.
  778. Platter 1 is the
  779. <I>A</I>
  780. side of disk 1.
  781. Platter 238 is the
  782. <I>B</I>
  783. side of disk 0.
  784. On Emelie,
  785. the main file system is configured
  786. on both sides of the first 237 disks,
  787. platters 0-236 and 238-474.
  788. </P>
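<P>
The platter numbering can be written as two small functions.
They assume the 238-disk jukebox described above;
<TT>platterdisk</TT>
and
<TT>platterside</TT>
are hypothetical names.
</P>

```c
#include <assert.h>

enum { NDISK = 238 };   /* disks in the jukebox */

/* Disk that holds platter p. */
int
platterdisk(int p)
{
	assert(p >= 0 && p < 2*NDISK);
	return p % NDISK;
}

/* Side of that disk: 'A' for platters 0-237, 'B' for platters 238-475. */
int
platterside(int p)
{
	assert(p >= 0 && p < 2*NDISK);
	return p < NDISK ? 'A' : 'B';
}
```

<P>
Platter 474, the last one in Emelie's main file system,
is therefore the B side of disk 236.
</P>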
  789. <H4>9P Protocol Drivers
  790. </H4>
  791. <P>
  792. The file server described so far
  793. waits for 9P protocol messages to
  794. appear in its input queue.
  795. It processes each message and
  796. sends the reply back to the originator.
  797. There are groups of processes that
  798. perform protocol I/O on some network or
  799. device and the resulting messages
  800. are sent to the file system queue.
  801. </P>
  802. <P>
  803. There are two sets of processes
  804. <TT>ethi</TT>
  805. and
  806. <TT>etho</TT>
  807. that perform Ethernet input and output on two different networks.
  808. These processes send Ethernet messages
  809. to/from two more processes
  810. <TT>ilo</TT>
  811. and
  812. <TT>ilt</TT>
  813. that do the IL reliable datagram protocol
  814. on top of IP packets.
  815. </P>
  816. <P>
  817. The last process in Emelie,
  818. <TT>con</TT>,
  819. reads the console
  820. and calls internal subroutines to
  821. execute the commands typed.
  822. Since there is only one process,
  823. only one command can be executing at a
  824. time.
  825. See
  826. <A href="/magic/man2html/8/fs"><I>fs</I>(8)
  827. </A>for a description of the
  828. commands available at the console.
  829. </P>
  830. <br>&#32;<br>
  831. <A href=http://www.lucent.com/copyright.html>
  832. Copyright</A> &#169; 2000 Lucent Technologies Inc. All rights reserved.
  833. </body></html>