.SH
Block Devices
.PP
The block device I/O system is like a
protocol stack of filters.
There is a set of pseudo-devices that call
recursively into other pseudo-devices and real devices.
The protocol stack is compiled from a configuration
string that specifies the order of pseudo-devices and devices.
Each pseudo-device and device has a set of entry points
that correspond to the operations that the file system
requires of a device.
The most notable operations are
.CW read ,
.CW write ,
and
.CW size .
.PP
The device stack is best described through
the syntax of the configuration string
that specifies it.
Configuration strings are used
during the setup of the file system.
For a description see
.I fsconfig (8).
In the following recursive definition,
.I D
represents a
string that specifies a block device.
.IP "\fID\fP = (\fIDD\fP...)"
.br
This is a set of devices that
are concatenated to form a single device.
The size of the concatenated device is the
sum of the sizes of the sub-devices.
.IP "\fID\fP = [\fIDD\fP...]"
.br
This is the interleaving of the
individual devices.
If there are N devices in the list,
then the pseudo-device is the N-way block
interleaving of the sub-devices.
The size of the interleaved device is
N times the size of the smallest sub-device.
.IP "\fID\fP = {\fIDD\fP...}"
.br
This is a set of devices that
constitute a `mirror' of the first sub-device, and form a single device.
A write to the device is performed,
at the same block address,
on each sub-device, in right-to-left order.
A read from the device is performed on each sub-device,
in left-to-right order, until a read succeeds without error,
or the set is exhausted.
One can think of this as a poor man's RAID 1.
The size of the device is the size of the smallest sub-device.
.IP "\fID\fP = \f(CWp\fP\fIDN1.N2\fP"
.br
This is a partition of a sub-device.
The sub-device is partitioned into 100 equal pieces.
If the size of the sub-device is not divisible by 100,
the remainder is thrown away at the top.
The pseudo-device starts at the N1-th piece and
continues for N2 pieces.
Thus
.CW p\fID\fP67.33
will be the
last third of the device
.I D .
.IP "\fID\fP = \f(CWf\fP\fID\fP"
.br
This is a fake write-once-read-many device simulated by a
second read-write device.
This second device is partitioned
into a set of block flags and a set of blocks.
The flags are used to generate errors if a
block is ever written twice or read without being written first.
.IP "\fID\fP = \f(CWx\fP\fID\fP"
.br
This is a byte-swapped version of the file system on
.I D .
Since the file server currently writes integers in metadata to disk
in native byte order, moving a file system to a machine of the other
major byte order (e.g., MIPS to Pentium)
requires the use of
.CW x .
The
.CW x
device knows the sizes of the various integer fields in the file system metadata.
Ideally, the file server would follow the Plan 9 religion and write a consistent
byte order on disk, regardless of processor.
In the meantime, it should be possible to automatically determine the need
for byte-swapping by examining data in the super-block of each file system,
though this has not been implemented yet.
.IP "\fID\fP = \f(CWc\fP\fIDD\fP"
.br
This is the cache/WORM device made up of a cache (read-write)
device and a WORM (write-once-read-many) device.
More on this later.
.IP "\fID\fP = \f(CWo\fP"
.br
This is the dump file system, the
two-level hierarchy of all dumps ever taken on a cache/WORM.
The read-only root of the cache/WORM file system
(as of the dump taken Feb 18, 1995) can
be referenced as
.CW /1995/0218
in this pseudo-device.
The second dump taken that day will be
.CW /1995/02181 .
.IP "\fID\fP = \f(CWw\fP\fIN1.N2.N3\fP"
.br
This is a SCSI disk on controller N1, target N2 and logical unit number N3.
.IP "\fID\fP = \f(CWh\fP\fIN1.N2.0\fP"
.br
This is an (E)IDE or *ATA disk on controller N1, target N2
(target 0 is the IDE master, 1 the slave device).
These disks are currently run via programmed I/O, not DMA,
so they tend to be slower to access than SCSI disks.
.IP "\fID\fP = \f(CWr\fP\fIN1\fP"
.br
This is the same as
.CW w ,
but refers to a side of a WORM disc.
See the
.I j
device.
.IP "\fID\fP = \f(CWl\fP\fIN1\fP"
.br
This is the same as
.CW r ,
but one block from the SCSI disk is removed for labeling.
.IP "\fID\fP = \f(CWj(\fP\fID\d\s-2\&1\s+2\u\fID\d\s-2\&2\s+2\u\f(CW*)\fID\d\s-2\&3\s+2\u\f1"
.br
.I D\d\s-2\&1\s+2\u
is the jukebox SCSI interface.
The
.I D\d\s-2\&2\s+2\u 's
are the SCSI drives in the jukebox
and the
.I D\d\s-2\&3\s+2\u 's
are the demountable platters in the jukebox.
.I D\d\s-2\&1\s+2\u
and
.I D\d\s-2\&2\s+2\u
must be
.CW w .
.I D\d\s-2\&3\s+2\u
must be pseudo-devices of
.CW w ,
.CW r ,
or
.CW l
devices.
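.PP
The size rules given above for the concatenating, interleaving, mirroring and partitioning pseudo-devices can be sketched in Python.
This is only an illustration, not the file server's code: the function names are hypothetical, sizes are counted in blocks, and the round-robin layout in
.CW interleave_map
is an assumption, since the text says only that the interleaving is block-wise.

```python
# Hypothetical helpers illustrating the size rules; all sizes are in blocks.

def cat_size(sizes):
    """(DD...): a concatenation is as large as the sum of its sub-devices."""
    return sum(sizes)


def interleave_size(sizes):
    """[DD...]: an N-way interleave is N times the smallest sub-device."""
    return len(sizes) * min(sizes)


def mirror_size(sizes):
    """{DD...}: a mirror is limited by its smallest sub-device."""
    return min(sizes)


def partition_range(size, n1, n2):
    """pDN1.N2: split D into 100 equal pieces, discarding any remainder,
    and return (first block, number of blocks) of the partition."""
    piece = size // 100
    return n1 * piece, n2 * piece


def interleave_map(block, n):
    """ASSUMED round-robin layout: block b of the interleave lives on
    sub-device b % n, at block b // n of that sub-device."""
    return block % n, block // n
```

With a sub-device of 1050 blocks, each of the 100 pieces holds 10 blocks and the top 50 blocks are unused, so
.CW partition_range(1050, 67, 33)
yields a partition that starts at block 670 and runs for 330 blocks.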
.PP
For
.CW w ,
.CW h ,
.CW l ,
and
.CW r
devices, any of the configuration numbers
can be replaced by an iterator of the form
.CW <\fIN1-N2\fP> .
N1 can be greater than N2, indicating a descending sequence.
Thus
.Ex
[w0.<2-6>]
.Ee
is the interleaving of the SCSI disks on targets
2 through 6 of SCSI controller 0.
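.PP
One way to picture an iterator is as shorthand for the list of devices it names.
The Python sketch below expands the iterators in a single device specification; the function
.CW expand
is hypothetical and is not the file server's parser.
It handles one device term at a time, not a whole configuration string.

```python
import re


def expand(spec):
    """Expand every <N1-N2> iterator in one device spec, leftmost first.

    Hypothetical helper: a descending iterator (N1 > N2) yields a
    descending sequence, as the text describes.
    """
    m = re.search(r"<([0-9]+)-([0-9]+)>", spec)
    if m is None:
        return [spec]
    n1, n2 = int(m.group(1)), int(m.group(2))
    step = 1 if n1 <= n2 else -1
    out = []
    for n in range(n1, n2 + step, step):
        # Substitute one value, then expand any remaining iterators.
        out.extend(expand(spec[:m.start()] + str(n) + spec[m.end():]))
    return out


print(expand("w0.<2-6>"))
# prints ['w0.2', 'w0.3', 'w0.4', 'w0.5', 'w0.6']
```

A descending iterator such as
.CW w1.<6-0>.0
expands the same way, starting at
.CW w1.6.0
and ending at
.CW w1.0.0 .
.PP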
The main file system on
Emelie
is defined by the configuration string
.Ex
c[w1.<0-5>.0]j(w6w5w4w3w2)(l<0-236>l<238-474>)
.Ee
This is a cache/WORM device.
The cache is six interleaved disks on SCSI controller 1
targets 0, 1, 2, 3, 4, and 5.
The WORM half of the cache/WORM
is 474 jukebox disks.
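.PP
The device counts in this example follow directly from the iterators.
As a quick check in Python (the helper
.CW iterator_count
is hypothetical, and assumes that an iterator names every number from N1 to N2 inclusive):

```python
def iterator_count(n1, n2):
    """Number of devices named by <N1-N2>, ascending or descending."""
    return abs(n2 - n1) + 1


# Cache: w1.<0-5>.0 names targets 0 through 5.
cache_disks = iterator_count(0, 5)

# WORM: l<0-236> and l<238-474>; number 237 is skipped.
platters = iterator_count(0, 236) + iterator_count(238, 474)

print(cache_disks, platters)  # prints "6 474"
```

Each of the two
.CW l
iterators names 237 devices, which gives the total of 474.
.PP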
Another file server,
.I choline ,
has a main file system defined by
.Ex
c[w<1-3>]j(w1.<6-0>.0)(l<0-124>l<128-252>)
.Ee
The order of
.CW w1.<6-0>.0
matters here, since the SCSI target IDs of the optical jukebox's WORM drives,
as delivered,
run in descending order relative to the numbers of the drives
in SCSI commands
(e.g., the jukebox controller is SCSI target 6,
drive #1 is SCSI target 5,
and drive #6 is SCSI target 0).
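.PP
The effect of the descending iterator can be made concrete with a short Python sketch.
It assumes, following the
.CW j
syntax, that the first device in the expansion is the jukebox interface and that the remaining devices are the drives, numbered from 1 in the order listed; the variable names are illustrative only.

```python
# w1.<6-0>.0 expands to SCSI targets 6, 5, 4, 3, 2, 1, 0 on controller 1.
targets = list(range(6, -1, -1))

# ASSUMPTION: first device is the jukebox interface, the rest are drives.
interface_target = targets[0]
drive_targets = {drive: target
                 for drive, target in enumerate(targets[1:], start=1)}

print(interface_target)   # prints 6
print(drive_targets[1])   # prints 5
print(drive_targets[6])   # prints 0
```

An ascending iterator would instead place target 0 first, pairing the drives with the wrong targets.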