.HTML "The Use of Name Spaces in Plan 9"
.TL
The Use of Name Spaces in Plan 9
.AU
Rob Pike
Dave Presotto
Ken Thompson
Howard Trickey
Phil Winterbottom
.AI
.MH
USA
.AB
.FS
Appeared in
.I
Operating Systems Review,
.R
Vol. 27, #2, April 1993, pp. 72-76
(reprinted from
.I
Proceedings of the 5th ACM SIGOPS European Workshop,
.R
Mont Saint-Michel, 1992, Paper nº 34).
.FE
Plan 9 is a distributed system built at the Computing Sciences Research
Center of AT&T Bell Laboratories (now Lucent Technologies, Bell Labs) over the last few years.
Its goal is to provide a production-quality system for software
development and general computation using heterogeneous hardware
and minimal software. A Plan 9 system comprises CPU and file
servers in a central location connected together by fast networks.
Slower networks fan out to workstation-class machines that serve as
user terminals. Plan 9 argues that given a few carefully
implemented abstractions
it is possible to
produce a small operating system that provides support for the largest systems
on a variety of architectures and networks. The foundations of the system are
built on two ideas: a per-process name space and a simple message-oriented
file system protocol.
.AE
.PP
The operating system for the CPU servers and terminals is
structured as a traditional kernel: a single compiled image
containing code for resource management, process control,
user processes,
virtual memory, and I/O. Because the file server is a separate
machine, the file system is not compiled in, although the management
of the name space, a per-process attribute, is.
The entire kernel for the multiprocessor SGI Power Series machine
is 25000 lines of C,
the largest part of which is code for four networks including the
Ethernet with the Internet protocol suite.
Fewer than 1500 lines are machine-specific, and a
functional kernel with minimal I/O can be put together from
source files totaling 6000 lines. [Pike90]
.PP
The system is relatively small for several reasons.
First, it is all new: it has not had time to accrete as many fixes
and features as other systems.
Also, other than the network protocol, it adheres to no
external interface; in particular, it is not Unix-compatible.
Economy stems from careful selection of services and interfaces.
Finally, wherever possible the system is built around
two simple ideas:
every resource in the system, either local or remote,
is represented by a hierarchical file system; and
a user or process
assembles a private view of the system by constructing a file
.I
name space
.R
that connects these resources. [Needham]
.SH
File Protocol
.PP
All resources in Plan 9 look like file systems.
That does not mean that they are repositories for
permanent files on disk, but that the interface to them
is file-oriented: finding files (resources) in a hierarchical
name tree, attaching to them by name, and accessing their contents
by read and write calls.
There are dozens of file system types in Plan 9, but only a few
represent traditional files.
At this level of abstraction, files in Plan 9 are similar
to objects, except that files are already provided with naming,
access, and protection methods that must be created afresh for
objects. Object-oriented readers may approach the rest of this
paper as a study in how to make objects look like files.
.PP
The interface to file systems is defined by a protocol, called 9P,
analogous but not very similar to the NFS protocol.
The protocol talks about files, not blocks; given a connection to the root
directory of a file server,
the 9P messages navigate the file hierarchy, open files for I/O,
and read or write arbitrary bytes in the files.
9P contains 17 message types: three for
initializing and
authenticating a connection and fourteen for manipulating objects.
The messages are generated by the kernel in response to user- or
kernel-level I/O requests.
Here is a quick tour of the major message types.
The
.CW auth
and
.CW attach
messages authenticate a connection, established by means outside 9P,
and validate its user.
The result is an authenticated
.I channel
that points to the root of the
server.
The
.CW clone
message makes a new channel identical to an existing channel,
which may be moved to a file on the server using a
.CW walk
message to descend each level in the hierarchy.
The
.CW stat
and
.CW wstat
messages read and write the attributes of the file pointed to by a channel.
The
.CW open
message prepares a channel for subsequent
.CW read
and
.CW write
messages to access the contents of the file, while
.CW create
and
.CW remove
perform, on the files, the actions implied by their names.
The
.CW clunk
message discards a channel without affecting the file.
None of the 9P messages consider caching; file caches are provided,
when needed, either within the server (centralized caching)
or by implementing the cache as a transparent file system between the
client and the 9P connection to the server (client caching).
.PP
For efficiency, the connection to local
kernel-resident file systems, misleadingly called
.I devices,
is by regular rather than remote procedure calls.
The procedures map one-to-one with 9P message types.
Locally each channel has an associated data structure
that holds a type field used to index
a table of procedure calls, one set per file system type,
analogous to selecting the method set for an object.
One kernel-resident file system, the
.I
mount device,
.R
translates the local 9P procedure calls into RPC messages to
remote services over a separately provided transport protocol
such as TCP or IL, a new reliable datagram protocol, or over a pipe to
a user process.
Write and read calls transmit the messages over the transport layer.
The mount device is the sole bridge between the procedural
interface seen by user programs and remote and user-level services.
It does all associated marshaling, buffer
management, and multiplexing and is
the only integral RPC mechanism in Plan 9.
The mount device is in effect a proxy object.
There is no RPC stub compiler; instead the mount driver and
all servers just share a library that packs and unpacks 9P messages.
.SH
Examples
.PP
One file system type serves
permanent files from the main file server,
a stand-alone multiprocessor system with a
350-gigabyte
optical WORM jukebox that holds the data, fronted by a two-level
block cache comprising 7 gigabytes of
magnetic disk and 128 megabytes of RAM.
Clients connect to the file server using any of a variety of
networks and protocols and access files using 9P.
The file server runs a distinct operating system and has no
support for user processes; other than a restricted set of commands
available on the console, all it does is answer 9P messages from clients.
.PP
Once a day, at 5:00 AM,
the file server sweeps through the cache blocks and marks dirty blocks
copy-on-write.
It creates a copy of the root directory
and labels it with the current date, for example
.CW 1995/0314 .
It then starts a background process to copy the dirty blocks to the WORM.
The result is that the server retains an image of the file system as it was
early each morning.
The set of old root directories is accessible using 9P, so a client
may examine backup files using ordinary commands.
Several advantages stem from having the backup service implemented
as a plain file system.
Most obviously, ordinary commands can access them.
For example, to see when a bug was fixed
.P1
grep 'mouse bug fix' 1995/*/sys/src/cmd/8½/file.c
.P2
The owner, access times, permissions, and other properties of the
files are also backed up.
Because it is a file system, the backup
still has protections;
it is not possible to subvert security by looking at the backup.
.PP
The file server is only one type of file system.
A number of unusual services are provided within the kernel as
local file systems.
These services are not limited to I/O devices such
as disks. They include network devices and their associated protocols,
the bitmap display and mouse,
a representation of processes similar to
.CW /proc
[Killian], the name/value pairs that form the `environment'
passed to a new process, profiling services,
and other resources.
Each of these is represented as a file system \(em
directories containing sets of files \(em
but the constituent files do not represent permanent storage on disk.
Instead, they are closer in properties to UNIX device files.
.PP
For example, the
.I console
device contains the file
.CW /dev/cons ,
similar to the UNIX file
.CW /dev/console :
when written,
.CW /dev/cons
appends to the console typescript; when read,
it returns characters typed on the keyboard.
Other files in the console device include
.CW /dev/time ,
the number of seconds since the epoch,
.CW /dev/cputime ,
the computation time used by the process reading the device,
.CW /dev/pid ,
the process id of the process reading the device, and
.CW /dev/user ,
the login name of the user accessing the device.
All these files contain text, not binary numbers,
so their use is free of byte-order problems.
Their contents are synthesized on demand when read; when written,
they cause modifications to kernel data structures.
.PP
The
.I process
device contains one directory per live local process, named by its numeric
process id:
.CW /proc/1 ,
.CW /proc/2 ,
etc.
Each directory contains a set of files that access the process.
For example, in each directory the file
.CW mem
is an image of the virtual memory of the process that may be read or
written for debugging.
The
.CW text
file is a sort of link to the file from which the process was executed;
it may be opened to read the symbol tables for the process.
The
.CW ctl
file may be written textual messages such as
.CW stop
or
.CW kill
to control the execution of the process.
The
.CW status
file contains a fixed-format line of text containing information about
the process: its name, owner, state, and so on.
Text strings written to the
.CW note
file are delivered to the process as
.I notes,
analogous to UNIX signals.
By providing these services as textual I/O on files rather
than as system calls (such as
.CW kill )
or special-purpose operations (such as
.CW ptrace ),
the Plan 9 process device simplifies the implementation of
debuggers and related programs.
For example, the command
.P1
cat /proc/*/status
.P2
is a crude form of the
.CW ps
command; the actual
.CW ps
merely reformats the data so obtained.
.PP
The
.I bitmap
device contains three files,
.CW /dev/mouse ,
.CW /dev/screen ,
and
.CW /dev/bitblt ,
that provide an interface to the local bitmap display (if any) and pointing device.
The
.CW mouse
file returns a fixed-format record containing
1 byte of button state and 4 bytes each of
.I x
and
.I y
position of the mouse.
If the mouse has not moved since the file was last read, a subsequent read will
block.
The
.CW screen
file contains a memory image of the contents of the display;
the
.CW bitblt
file provides a procedural interface.
Calls to the graphics library are translated into messages that are written
to the
.CW bitblt
file to perform bitmap graphics operations. (This is essentially a nested
RPC protocol.)
.PP
The various services being used by a process are gathered together into the
process's
.I
name space,
.R
a single rooted hierarchy of file names.
When a process forks, the child process shares the name space with the parent.
Several system calls manipulate name spaces.
Given a file descriptor
.CW fd
that holds an open communications channel to a service,
the call
.P1
mount(int fd, char *old, int flags)
.P2
authenticates the user and attaches the file tree of the service to
the directory named by
.CW old .
The
.CW flags
specify how the tree is to be attached to
.CW old :
replacing the current contents or appearing before or after the
current contents of the directory.
A directory with several services mounted is called a
.I union
directory and is searched in the specified order.
The call
.P1
bind(char *new, char *old, int flags)
.P2
takes the portion of the existing name space visible at
.CW new ,
either a file or a directory, and makes it also visible at
.CW old .
For example,
.P1
bind("1995/0301/sys/include", "/sys/include", REPLACE)
.P2
causes the directory of include files to be overlaid with its
contents from the dump on March first.
.PP
A process is created by the
.CW rfork
system call, which takes as argument a bit vector defining which
attributes of the process are to be shared between parent
and child instead of copied.
One of the attributes is the name space: when shared, changes
made by either process are visible in the other; when copied,
changes are independent.
.PP
Although there is no global name space,
for a process to function sensibly the local name spaces must adhere
to global conventions.
Nonetheless, the use of local name spaces is critical to the system.
Both these ideas are illustrated by the use of the name space to
handle heterogeneity.
The binaries for a given architecture are contained in a directory
named by the architecture, for example
.CW /mips/bin ;
in use, that directory is bound to the conventional location
.CW /bin .
Programs such as shell scripts need not know the CPU type they are
executing on to find binaries to run.
A directory of private binaries
is usually unioned with
.CW /bin .
(Compare this to the
.I
ad hoc
.R
and special-purpose idea of the
.CW PATH
variable, which is not used in the Plan 9 shell.)
Local bindings are also helpful for debugging, for example by binding
an old library to the standard place and linking a program to see
if recent changes to the library are responsible for a bug in the program.
.PP
The window system,
.CW 8½
[Pike91], is a server for files such as
.CW /dev/cons
and
.CW /dev/bitblt .
Each client sees a distinct copy of these files in its local
name space: there are many instances of
.CW /dev/cons ,
each served by
.CW 8½
to the local name space of a window.
Again,
.CW 8½
implements services using
local name spaces plus the use
of I/O to conventionally named files.
Each client just connects its standard input, output, and error files
to
.CW /dev/cons ,
with analogous operations to access bitmap graphics.
Compare this to the implementation of
.CW /dev/tty
on UNIX, which is done by special code in the kernel
that overloads the file, when opened,
with the standard input or output of the process.
Special arrangement must be made by a UNIX window system for
.CW /dev/tty
to behave as expected;
.CW 8½
instead makes providing the corresponding file its central idea,
an approach that depends critically on local name spaces.
.PP
The environment
.CW 8½
provides its clients is exactly the environment under which it is implemented:
a conventional set of files in
.CW /dev .
This permits the window system to be run recursively in one of its own
windows, which is handy for debugging.
It also means that if the files are exported to another machine,
as described below, the window system or client applications may be
run transparently on remote machines, even ones without graphics hardware.
This mechanism is used for Plan 9's implementation of the X window
system: X is run as a client of
.CW 8½ ,
often on a remote machine with lots of memory.
In this configuration, using Ethernet to connect
MIPS machines, we measure only a 10% degradation in graphics
performance relative to running X on
a bare Plan 9 machine.
.PP
An unusual application of these ideas is a statistics-gathering
file system implemented by a command called
.CW iostats .
The command encapsulates a process in a local name space, monitoring 9P
requests from the process to the outside world \(em the name space in which
.CW iostats
is itself running. When the command completes,
.CW iostats
reports usage and performance figures for file activity.
For example
.P1
iostats 8½
.P2
can be used to discover how much I/O the window system
does to the bitmap device, font files, and so on.
.PP
The
.CW import
command connects a piece of name space from a remote system
to the local name space.
Its implementation is to dial the remote machine and start
a process there that serves the remote name space using 9P.
It then calls
.CW mount
to attach the connection to the name space and finally dies;
the remote process continues to serve the files.
One use is to access devices not available
locally. For example, to write a floppy one may say
.P1
import lab.pc /a: /n/dos
cp foo /n/dos/bar
.P2
The call to
.CW import
connects the file tree from
.CW /a:
on the machine
.CW lab.pc
(which must support 9P) to the local directory
.CW /n/dos .
Then the file
.CW foo
can be written to the floppy just by copying it across.
.PP
Another application is remote debugging:
.P1
import helix /proc
.P2
makes the process file system on machine
.CW helix
available locally; commands such as
.CW ps
then see
.CW helix 's
processes instead of the local ones.
The debugger may then look at a remote process:
.P1
db /proc/27/text /proc/27/mem
.P2
allows breakpoint debugging of the remote process.
Since
.CW db
infers the CPU type of the process from the executable header on
the text file, it supports
cross-architecture debugging, too.
Care is taken within
.CW db
to handle issues of byte order and floating point; it is possible to
breakpoint debug a big-endian MIPS process from a little-endian i386.
.PP
Network interfaces are also implemented as file systems [Presotto].
For example,
.CW /net/tcp
is a directory somewhat like
.CW /proc :
it contains a set of numbered directories, one per connection,
each of which contains files to control and communicate on the connection.
A process allocates a new connection by accessing
.CW /net/tcp/clone ,
which evaluates to the directory of an unused connection.
To make a call, the process writes a textual
.CW connect
message, naming the remote address and port, to the
.CW ctl
file and then reads and writes the
.CW data
file.
An
.CW rlogin
service can be implemented in a few lines of shell code.
.PP
This structure makes network gatewaying easy to provide.
We have machines with Datakit interfaces but no Internet interface.
On such a machine one may type
.P1
import helix /net
telnet tcp!
.P2
The
.CW import
uses Datakit to pull in the TCP interface from
.CW helix ,
which can then be used directly; the
.CW tcp!
notation is necessary because we routinely use multiple networks
and protocols on Plan 9\(emit identifies the network in which
the host is a valid name.
.PP
In practice we do not use
.CW rlogin
or
.CW telnet
between Plan 9 machines. Instead a command called
.CW cpu
in effect replaces the CPU in a window with that
on another machine, typically a fast multiprocessor CPU server.
The implementation is to recreate the
name space on the remote machine, using the equivalent of
.CW import
to connect pieces of the terminal's name space to that of
the process (shell) on the CPU server, making the terminal
a file server for the CPU.
CPU-local devices such as fast file system connections
are still local; only terminal-resident devices are
imported.
The result is unlike UNIX
.CW rlogin ,
which moves into a distinct name space on the remote machine,
or file sharing with
.CW NFS ,
which keeps the name space the same but forces processes to execute
locally.
Bindings in
.CW /bin
may change because of a change in CPU architecture, and
the networks involved may be different because of differing hardware,
but the effect feels like simply speeding up the processor in the
current name space.
.SH
Position
.PP
These examples illustrate how the ideas of representing resources
as file systems and per-process name spaces can be used to solve
problems often left to more exotic mechanisms.
Nonetheless there are some operations in Plan 9 that are not
mapped into file I/O.
An example is process creation.
We could imagine a message to a control file in
.CW /proc
that creates a process, but the details of
constructing the environment of the new process \(em its open files,
name space, memory image, etc. \(em are too intricate to
be described easily in a simple I/O operation.
Therefore new processes on Plan 9 are created by fairly conventional
.CW rfork
and
.CW exec
system calls;
.CW /proc
is used only to represent and control existing processes.
.PP
Plan 9 does not attempt to map network name spaces into the file
system name space, for several reasons.
The different addressing rules for various networks and protocols
cannot be mapped uniformly into a hierarchical file name space.
Even if they could be,
the various mechanisms to authenticate,
select a service,
and control the connection would not map consistently into
operations on a file.
.PP
Shared memory is another resource not adequately represented by a
file name space.
Plan 9 takes care to provide mechanisms
to allow groups of local processes to share and map memory.
Memory is controlled
by system calls rather than special files, however,
since a representation in the file system would imply that memory could
be imported from remote machines.
.PP
Despite these limitations, file systems and name spaces offer an effective
model around which to build a distributed system.
Used well, they can provide a uniform, familiar, transparent
interface to a diverse set of distributed resources.
They carry well-understood properties of access, protection,
and naming.
The integration of devices into the hierarchical file system
was the best idea in UNIX.
Plan 9 pushes the concepts much further and shows that
file systems, when used inventively, have plenty of scope
for productive research.
.SH
References
.LP
[Killian] T. Killian, ``Processes as Files'', USENIX Summer Conf. Proc., Salt Lake City, 1984
.br
[Needham] R. Needham, ``Names'', in
.I
Distributed Systems,
.R
S. Mullender, ed.,
Addison-Wesley, 1989
.br
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
``Plan 9 from Bell Labs'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England,
1990
.br
[Pike91] R. Pike, ``8½, The Plan 9 Window System'', USENIX Summer
Conf. Proc., Nashville, 1991
.br
[Presotto] D. Presotto, ``Multiprocessor Streams for Plan 9'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England,
1990