- .TL
- The Use of Name Spaces in Plan 9
- .AU
- Rob Pike
- Dave Presotto
- Ken Thompson
- Howard Trickey
- Phil Winterbottom
- .AI
- .MH
- USA
- .AB
- .FS
- Appeared in
- .I
- Operating Systems Review,
- .R
- Vol. 27, #2, April 1993, pp. 72-76
- (reprinted from
- .I
- Proceedings of the 5th ACM SIGOPS European Workshop,
- .R
- Mont Saint-Michel, 1992, Paper nº 34).
- .FE
- Plan 9 is a distributed system built at the Computing Sciences Research
- Center of AT&T Bell Laboratories (now Lucent Technologies, Bell Labs) over the last few years.
- Its goal is to provide a production-quality system for software
- development and general computation using heterogeneous hardware
- and minimal software. A Plan 9 system comprises CPU and file
- servers in a central location connected together by fast networks.
- Slower networks fan out to workstation-class machines that serve as
- user terminals. Plan 9 argues that given a few carefully
- implemented abstractions
- it is possible to
- produce a small operating system that provides support for the largest systems
- on a variety of architectures and networks. The foundations of the system are
- built on two ideas: a per-process name space and a simple message-oriented
- file system protocol.
- .AE
- .PP
- The operating system for the CPU servers and terminals is
- structured as a traditional kernel: a single compiled image
- containing code for resource management, process control,
- user processes,
- virtual memory, and I/O. Because the file server is a separate
- machine, the file system is not compiled in, although the management
- of the name space, a per-process attribute, is.
- The entire kernel for the multiprocessor SGI Power Series machine
- is 25000 lines of C,
- the largest part of which is code for four networks including the
- Ethernet with the Internet protocol suite.
- Fewer than 1500 lines are machine-specific, and a
- functional kernel with minimal I/O can be put together from
- source files totaling 6000 lines. [Pike90]
- .PP
- The system is relatively small for several reasons.
- First, it is all new: it has not had time to accrete as many fixes
- and features as other systems.
- Also, other than the network protocol, it adheres to no
- external interface; in particular, it is not Unix-compatible.
- Economy stems from careful selection of services and interfaces.
- Finally, wherever possible the system is built around
- two simple ideas:
- every resource in the system, either local or remote,
- is represented by a hierarchical file system; and
- a user or process
- assembles a private view of the system by constructing a file
- .I
- name space
- .R
- that connects these resources. [Needham]
- .SH
- File Protocol
- .PP
- All resources in Plan 9 look like file systems.
- That does not mean that they are repositories for
- permanent files on disk, but that the interface to them
- is file-oriented: finding files (resources) in a hierarchical
- name tree, attaching to them by name, and accessing their contents
- by read and write calls.
- There are dozens of file system types in Plan 9, but only a few
- represent traditional files.
- At this level of abstraction, files in Plan 9 are similar
- to objects, except that files are already provided with naming,
- access, and protection methods that must be created afresh for
- objects. Object-oriented readers may approach the rest of this
- paper as a study in how to make objects look like files.
- .PP
- The interface to file systems is defined by a protocol, called 9P,
- analogous but not very similar to the NFS protocol.
- The protocol talks about files, not blocks; given a connection to the root
- directory of a file server,
- the 9P messages navigate the file hierarchy, open files for I/O,
- and read or write arbitrary bytes in the files.
- 9P contains 17 message types: three for
- initializing and
- authenticating a connection and fourteen for manipulating objects.
- The messages are generated by the kernel in response to user- or
- kernel-level I/O requests.
- Here is a quick tour of the major message types.
- The
- .CW auth
- and
- .CW attach
- messages authenticate a connection, established by means outside 9P,
- and validate its user.
- The result is an authenticated
- .I channel
- that points to the root of the
- server.
- The
- .CW clone
- message makes a new channel identical to an existing channel,
- which may be moved to a file on the server using a
- .CW walk
- message to descend each level in the hierarchy.
- The
- .CW stat
- and
- .CW wstat
- messages read and write the attributes of the file pointed to by a channel.
- The
- .CW open
- message prepares a channel for subsequent
- .CW read
- and
- .CW write
- messages to access the contents of the file, while
- .CW create
- and
- .CW remove
- perform, on the files, the actions implied by their names.
- The
- .CW clunk
- message discards a channel without affecting the file.
- None of the 9P messages consider caching; file caches are provided,
- when needed, either within the server (centralized caching)
- or by implementing the cache as a transparent file system between the
- client and the 9P connection to the server (client caching).
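- .PP
- As an informal illustration of this tour, the sketch below models channels in Python.
- It is not the 9P encoding and does not describe any real server:
- the class ToyServer, the integer channel numbers, and the nested-dictionary
- file tree are all assumptions made for the example.

```python
class ToyServer:
    """In-memory stand-in for a file server; not the real 9P encoding."""

    def __init__(self, tree):
        self.tree = tree          # nested dicts are directories, bytes are files
        self.channels = {}        # channel number -> path as a tuple of names
        self.opened = set()       # channels prepared for read/write

    def _node(self, path):
        node = self.tree
        for name in path:
            node = node[name]
        return node

    def attach(self, chan):
        # After attach, the channel points to the root of the server.
        self.channels[chan] = ()

    def clone(self, chan, newchan):
        # The new channel is identical to the existing one.
        self.channels[newchan] = self.channels[chan]

    def walk(self, chan, name):
        # Descend one level in the hierarchy.
        parent = self._node(self.channels[chan])
        if not isinstance(parent, dict) or name not in parent:
            raise FileNotFoundError(name)
        self.channels[chan] = self.channels[chan] + (name,)

    def open(self, chan):
        self.opened.add(chan)

    def read(self, chan, offset, count):
        if chan not in self.opened:
            raise PermissionError("channel not open")
        data = self._node(self.channels[chan])
        return data[offset:offset + count]

    def write(self, chan, offset, data):
        if chan not in self.opened:
            raise PermissionError("channel not open")
        path = self.channels[chan]
        parent = self._node(path[:-1])
        old = parent[path[-1]]
        parent[path[-1]] = old[:offset] + data + old[offset + len(data):]
        return len(data)

    def clunk(self, chan):
        # Discard the channel; the file itself is untouched.
        self.opened.discard(chan)
        del self.channels[chan]


def read_file(server, root_chan, new_chan, names, count=8192):
    """Clone the root channel, walk to a file, read it, and clunk."""
    server.clone(root_chan, new_chan)
    for name in names:
        server.walk(new_chan, name)
    server.open(new_chan)
    data = server.read(new_chan, 0, count)
    server.clunk(new_chan)
    return data
```

- .PP
- In this model a client reads a file by cloning the root channel, walking one
- name at a time, opening, reading, and clunking; the helper read_file performs
- exactly that sequence and leaves the root channel where it was.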
- .PP
- For efficiency, the connection to local
- kernel-resident file systems, misleadingly called
- .I devices,
- is by regular rather than remote procedure calls.
- The procedures map one-to-one with 9P message types.
- Locally each channel has an associated data structure
- that holds a type field used to index
- a table of procedure calls, one set per file system type,
- analogous to selecting the method set for an object.
- One kernel-resident file system, the
- .I
- mount device,
- .R
- translates the local 9P procedure calls into RPC messages to
- remote services over a separately provided transport protocol
- such as TCP or IL, a new reliable datagram protocol, or over a pipe to
- a user process.
- Write and read calls transmit the messages over the transport layer.
- The mount device is the sole bridge between the procedural
- interface seen by user programs and remote and user-level services.
- It does all associated marshaling, buffer
- management, and multiplexing and is
- the only integral RPC mechanism in Plan 9.
- The mount device is in effect a proxy object.
- There is no RPC stub compiler; instead the mount driver and
- all servers just share a library that packs and unpacks 9P messages.
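- .PP
- The table-driven dispatch can be pictured with a small Python sketch.
- The kernel itself is written in C; the names DEVTAB, chan_read, and chan_write,
- the dictionary used as a channel, and the callable standing in for a transport
- are hypothetical and serve only to show the shape of the idea.

```python
# Hypothetical procedure sets, one per file system type.

def cons_read(chan, count):
    return b"typed characters"[:count]

def cons_write(chan, data):
    chan["typescript"] += data
    return len(data)

def null_read(chan, count):
    return b""

def null_write(chan, data):
    return len(data)

def mnt_read(chan, count):
    # A mount-style entry turns the procedure call into a message
    # and hands it to a transport supplied by the caller.
    return chan["transport"](("read", chan["remote"], count))

def mnt_write(chan, data):
    return chan["transport"](("write", chan["remote"], data))

# The channel's type field selects the procedure set, much as an
# object's class selects its methods.
DEVTAB = {
    "cons": {"read": cons_read, "write": cons_write},
    "null": {"read": null_read, "write": null_write},
    "mnt": {"read": mnt_read, "write": mnt_write},
}

def chan_read(chan, count):
    return DEVTAB[chan["type"]]["read"](chan, count)

def chan_write(chan, data):
    return DEVTAB[chan["type"]]["write"](chan, data)
```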
- .SH
- Examples
- .PP
- One file system type serves
- permanent files from the main file server,
- a stand-alone multiprocessor system with a
- 350-gigabyte
- optical WORM jukebox that holds the data, fronted by a two-level
- block cache comprising 7 gigabytes of
- magnetic disk and 128 megabytes of RAM.
- Clients connect to the file server using any of a variety of
- networks and protocols and access files using 9P.
- The file server runs a distinct operating system and has no
- support for user processes; other than a restricted set of commands
- available on the console, all it does is answer 9P messages from clients.
- .PP
- Once a day, at 5:00 AM,
- the file server sweeps through the cache blocks and marks dirty blocks
- copy-on-write.
- It creates a copy of the root directory
- and labels it with the current date, for example
- .CW 1995/0314 .
- It then starts a background process to copy the dirty blocks to the WORM.
- The result is that the server retains an image of the file system as it was
- early each morning.
- The set of old root directories is accessible using 9P, so a client
- may examine backup files using ordinary commands.
- Several advantages stem from having the backup service implemented
- as a plain file system.
- Most obviously, ordinary commands can access them.
- For example, to see when a bug was fixed
- .P1
- grep 'mouse bug fix' 1995/*/sys/src/cmd/8½/file.c
- .P2
- The owner, access times, permissions, and other properties of the
- files are also backed up.
- Because it is a file system, the backup
- still has protections;
- it is not possible to subvert security by looking at the backup.
- .PP
- The file server is only one type of file system.
- A number of unusual services are provided within the kernel as
- local file systems.
- These services are not limited to I/O devices such
- as disks. They include network devices and their associated protocols,
- the bitmap display and mouse,
- a representation of processes similar to
- .CW /proc
- [Killian], the name/value pairs that form the `environment'
- passed to a new process, profiling services,
- and other resources.
- Each of these is represented as a file system \(em
- directories containing sets of files \(em
- but the constituent files do not represent permanent storage on disk.
- Instead, they are closer in properties to UNIX device files.
- .PP
- For example, the
- .I console
- device contains the file
- .CW /dev/cons ,
- similar to the UNIX file
- .CW /dev/console :
- when written,
- .CW /dev/cons
- appends to the console typescript; when read,
- it returns characters typed on the keyboard.
- Other files in the console device include
- .CW /dev/time ,
- the number of seconds since the epoch,
- .CW /dev/cputime ,
- the computation time used by the process reading the device,
- .CW /dev/pid ,
- the process id of the process reading the device, and
- .CW /dev/user ,
- the login name of the user accessing the device.
- All these files contain text, not binary numbers,
- so their use is free of byte-order problems.
- Their contents are synthesized on demand when read; when written,
- they cause modifications to kernel data structures.
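- .PP
- A minimal Python sketch of contents synthesized on demand follows.
- It runs on an ordinary host rather than Plan 9, and the table SYNTH and the
- function read_dev are invented for the illustration; only the idea of
- returning text rather than binary numbers comes from the description above.

```python
import os
import time

def synth_time():
    # Seconds since the epoch, as decimal text.
    return str(int(time.time()))

def synth_pid():
    # Process id of the reader, as decimal text.
    return str(os.getpid())

# File name -> function that produces the contents when the file is read.
SYNTH = {"time": synth_time, "pid": synth_pid}

def read_dev(name):
    """Return the bytes a read of the named file would produce in this model."""
    return SYNTH[name]().encode("ascii")
```

- .PP
- Because the reply is decimal text, a reader on any architecture recovers the
- value with an ordinary string-to-integer conversion.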
- .PP
- The
- .I process
- device contains one directory per live local process, named by its numeric
- process id:
- .CW /proc/1 ,
- .CW /proc/2 ,
- etc.
- Each directory contains a set of files that access the process.
- For example, in each directory the file
- .CW mem
- is an image of the virtual memory of the process that may be read or
- written for debugging.
- The
- .CW text
- file is a sort of link to the file from which the process was executed;
- it may be opened to read the symbol tables for the process.
- The
- .CW ctl
- file accepts textual messages such as
- .CW stop
- or
- .CW kill
- to control the execution of the process.
- The
- .CW status
- file contains a fixed-format line of text containing information about
- the process: its name, owner, state, and so on.
- Text strings written to the
- .CW note
- file are delivered to the process as
- .I notes,
- analogous to UNIX signals.
- By providing these services as textual I/O on files rather
- than as system calls (such as
- .CW kill )
- or special-purpose operations (such as
- .CW ptrace ),
- the Plan 9 process device simplifies the implementation of
- debuggers and related programs.
- For example, the command
- .P1
- cat /proc/*/status
- .P2
- is a crude form of the
- .CW ps
- command; the actual
- .CW ps
- merely reformats the data so obtained.
- .PP
- The
- .I bitmap
- device contains three files,
- .CW /dev/mouse ,
- .CW /dev/screen ,
- and
- .CW /dev/bitblt ,
- that provide an interface to the local bitmap display (if any) and pointing device.
- The
- .CW mouse
- file returns a fixed-format record containing
- 1 byte of button state and 4 bytes each of
- .I x
- and
- .I y
- position of the mouse.
- If the mouse has not moved since the file was last read, a subsequent read will
- block.
- The
- .CW screen
- file contains a memory image of the contents of the display;
- the
- .CW bitblt
- file provides a procedural interface.
- Calls to the graphics library are translated into messages that are written
- to the
- .CW bitblt
- file to perform bitmap graphics operations. (This is essentially a nested
- RPC protocol.)
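- .PP
- The fixed-format mouse record can be decoded in a few lines.
- The Python sketch below assumes little-endian byte order and signed
- coordinates, neither of which is stated above; the names MOUSE_RECORD and
- parse_mouse are likewise hypothetical.

```python
import struct

# 1 byte of button state, then 4 bytes each of x and y.
# Assumed: little-endian, signed coordinates, no padding (9 bytes total).
MOUSE_RECORD = struct.Struct("<Bii")

def parse_mouse(record):
    """Decode one 9-byte record into button state and position."""
    buttons, x, y = MOUSE_RECORD.unpack(record)
    return {"buttons": buttons, "x": x, "y": y}
```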
- .PP
- The various services being used by a process are gathered together into the
- process's
- .I
- name space,
- .R
- a single rooted hierarchy of file names.
- When a process forks, the child process shares the name space with the parent.
- Several system calls manipulate name spaces.
- Given a file descriptor
- .CW fd
- that holds an open communications channel to a service,
- the call
- .P1
- mount(int fd, char *old, int flags)
- .P2
- authenticates the user and attaches the file tree of the service to
- the directory named by
- .CW old .
- The
- .CW flags
- specify how the tree is to be attached to
- .CW old :
- replacing the current contents or appearing before or after the
- current contents of the directory.
- A directory with several services mounted is called a
- .I union
- directory and is searched in the specified order.
- The call
- .P1
- bind(char *new, char *old, int flags)
- .P2
- takes the portion of the existing name space visible at
- .CW new ,
- either a file or a directory, and makes it also visible at
- .CW old .
- For example,
- .P1
- bind("1995/0301/sys/include", "/sys/include", REPLACE)
- .P2
- causes the directory of include files to be overlaid with its
- contents from the dump on March first.
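- .PP
- The effect of the flags on a union directory can be modelled in Python.
- This is a sketch, not the kernel's data structure: the class NameSpace is
- invented, directories are plain dictionaries, and the flag names BEFORE and
- AFTER are assumptions (only REPLACE appears in the example above).
- In this model a directory has no contents until something is bound to it.

```python
REPLACE, BEFORE, AFTER = "replace", "before", "after"

class NameSpace:
    """Toy name space: each path maps to an ordered list of directories."""

    def __init__(self):
        self.unions = {}   # path -> list of dicts, searched in order

    def bind(self, new, old, flag):
        """Make directory `new` (a dict of name -> contents) visible at `old`."""
        current = self.unions.get(old, [])
        if flag == REPLACE:
            self.unions[old] = [new]
        elif flag == BEFORE:
            self.unions[old] = [new] + current
        elif flag == AFTER:
            self.unions[old] = current + [new]
        else:
            raise ValueError(flag)

    def lookup(self, old, name):
        """Search the union at `old` in order; the first match wins."""
        for directory in self.unions.get(old, []):
            if name in directory:
                return directory[name]
        raise FileNotFoundError(old + "/" + name)
```

- .PP
- Binding a directory of private binaries BEFORE the standard ones makes a
- private program shadow the standard one of the same name, while every other
- name still resolves to the standard directory.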
- .PP
- A process is created by the
- .CW rfork
- system call, which takes as argument a bit vector defining which
- attributes of the process are to be shared between parent
- and child instead of copied.
- One of the attributes is the name space: when shared, changes
- made by either process are visible in the other; when copied,
- changes are independent.
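- .PP
- The difference between sharing and copying can be shown with a short Python
- model. The class Proc and the function fork_proc are hypothetical, and a
- single boolean stands in for the relevant bit of the argument to
- .CW rfork ;
- the real bit names are not given here and none is assumed.

```python
import copy

class Proc:
    def __init__(self, namespace):
        # Mapping from a path to the list of things bound there.
        self.namespace = namespace

def fork_proc(parent, share_namespace):
    """Create a child that either shares or copies the parent's name space."""
    if share_namespace:
        return Proc(parent.namespace)              # same object: changes are mutual
    return Proc(copy.deepcopy(parent.namespace))   # private copy: changes diverge
```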
- .PP
- Although there is no global name space,
- for a process to function sensibly the local name spaces must adhere
- to global conventions.
- Nonetheless, the use of local name spaces is critical to the system.
- Both these ideas are illustrated by the use of the name space to
- handle heterogeneity.
- The binaries for a given architecture are contained in a directory
- named by the architecture, for example
- .CW /mips/bin ;
- in use, that directory is bound to the conventional location
- .CW /bin .
- Programs such as shell scripts need not know the CPU type they are
- executing on to find binaries to run.
- A directory of private binaries
- is usually unioned with
- .CW /bin .
- (Compare this to the
- .I
- ad hoc
- .R
- and special-purpose idea of the
- .CW PATH
- variable, which is not used in the Plan 9 shell.)
- Local bindings are also helpful for debugging, for example by binding
- an old library to the standard place and linking a program to see
- if recent changes to the library are responsible for a bug in the program.
- .PP
- The window system,
- .CW 8½
- [Pike91], is a server for files such as
- .CW /dev/cons
- and
- .CW /dev/bitblt .
- Each client sees a distinct copy of these files in its local
- name space: there are many instances of
- .CW /dev/cons ,
- each served by
- .CW 8½
- to the local name space of a window.
- Again,
- .CW 8½
- implements services using
- local name spaces plus the use
- of I/O to conventionally named files.
- Each client just connects its standard input, output, and error files
- to
- .CW /dev/cons ,
- with analogous operations to access bitmap graphics.
- Compare this to the implementation of
- .CW /dev/tty
- on UNIX, which is done by special code in the kernel
- that overloads the file, when opened,
- with the standard input or output of the process.
- Special arrangement must be made by a UNIX window system for
- .CW /dev/tty
- to behave as expected;
- .CW 8½
- instead uses the provision of the corresponding file as its
- central idea, which to succeed depends critically on local name spaces.
- .PP
- The environment
- .CW 8½
- provides its clients is exactly the environment under which it is implemented:
- a conventional set of files in
- .CW /dev .
- This permits the window system to be run recursively in one of its own
- windows, which is handy for debugging.
- It also means that if the files are exported to another machine,
- as described below, the window system or client applications may be
- run transparently on remote machines, even ones without graphics hardware.
- This mechanism is used for Plan 9's implementation of the X window
- system: X is run as a client of
- .CW 8½ ,
- often on a remote machine with lots of memory.
- In this configuration, using Ethernet to connect
- MIPS machines, we measure only a 10% degradation in graphics
- performance relative to running X on
- a bare Plan 9 machine.
- .PP
- An unusual application of these ideas is a statistics-gathering
- file system implemented by a command called
- .CW iostats .
- The command encapsulates a process in a local name space, monitoring 9P
- requests from the process to the outside world \(em the name space in which
- .CW iostats
- is itself running. When the command completes,
- .CW iostats
- reports usage and performance figures for file activity.
- For example
- .P1
- iostats 8½
- .P2
- can be used to discover how much I/O the window system
- does to the bitmap device, font files, and so on.
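- .PP
- The interposition idea can be sketched in Python as a proxy that counts
- requests before passing them on. This is only a model of the approach:
- the class CountingProxy, its request method, and the callable standing in
- for the outside name space are assumptions, and the figures actually
- reported by
- .CW iostats
- are not reproduced.

```python
class CountingProxy:
    """Sit between a client and a server, tallying requests as they pass."""

    def __init__(self, server):
        self.server = server      # callable: server(kind, *args) -> reply
        self.counts = {}          # request kind -> number of requests seen
        self.bytes_read = 0

    def request(self, kind, *args):
        self.counts[kind] = self.counts.get(kind, 0) + 1
        reply = self.server(kind, *args)
        if kind == "read" and isinstance(reply, bytes):
            self.bytes_read += len(reply)
        return reply
```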
- .PP
- The
- .CW import
- command connects a piece of name space from a remote system
- to the local name space.
- Its implementation is to dial the remote machine and start
- a process there that serves the remote name space using 9P.
- It then calls
- .CW mount
- to attach the connection to the name space and finally dies;
- the remote process continues to serve the files.
- One use is to access devices not available
- locally. For example, to write a floppy one may say
- .P1
- import lab.pc /a: /n/dos
- cp foo /n/dos/bar
- .P2
- The call to
- .CW import
- connects the file tree from
- .CW /a:
- on the machine
- .CW lab.pc
- (which must support 9P) to the local directory
- .CW /n/dos .
- Then the file
- .CW foo
- can be written to the floppy just by copying it across.
- .PP
- Another application is remote debugging:
- .P1
- import helix /proc
- .P2
- makes the process file system on machine
- .CW helix
- available locally; commands such as
- .CW ps
- then see
- .CW helix 's
- processes instead of the local ones.
- The debugger may then look at a remote process:
- .P1
- db /proc/27/text /proc/27/mem
- .P2
- allows breakpoint debugging of the remote process.
- Since
- .CW db
- infers the CPU type of the process from the executable header on
- the text file, it supports
- cross-architecture debugging, too.
- Care is taken within
- .CW db
- to handle issues of byte order and floating point; it is possible to
- breakpoint debug a big-endian MIPS process from a little-endian i386.
- .PP
- Network interfaces are also implemented as file systems [Presotto].
- For example,
- .CW /net/tcp
- is a directory somewhat like
- .CW /proc :
- it contains a set of numbered directories, one per connection,
- each of which contains files to control and communicate on the connection.
- A process allocates a new connection by accessing
- .CW /net/tcp/clone ,
- which evaluates to the directory of an unused connection.
- To make a call, the process writes a textual message such as
- .CW 'connect
- .CW 135.104.53.2!512'
- to the
- .CW ctl
- file and then reads and writes the
- .CW data
- file.
- An
- .CW rlogin
- service can be implemented in a few lines of shell code.
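- .PP
- The steps for making a call can be written down as a short Python function.
- It is a sketch under two assumptions that the description above does not
- state: that reading the file obtained from
- .CW clone
- yields the number of the reserved connection as text, and that the same file
- then serves as the connection's
- .CW ctl
- file. The function name dial and its opener parameter are hypothetical;
- the parameter exists so the logic can be exercised without a Plan 9 system.

```python
def dial(net_dir, address, port, opener=open):
    """Model of the steps in the text; returns the ctl and data files.

    Assumption: reading the file obtained from `clone` returns the
    connection number, and that file acts as the connection's ctl file.
    """
    ctl = opener(net_dir + "/clone", "r+")
    conn = ctl.read().strip()
    # The textual control message names the remote address and port.
    ctl.write("connect {}!{}".format(address, port))
    data = opener("{}/{}/data".format(net_dir, conn), "r+")
    return ctl, data
```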
- .PP
- This structure makes network gatewaying easy to provide.
- We have machines with Datakit interfaces but no Internet interface.
- On such a machine one may type
- .P1
- import helix /net
- telnet tcp!ai.mit.edu
- .P2
- The
- .CW import
- uses Datakit to pull in the TCP interface from
- .CW helix ,
- which can then be used directly; the
- .CW tcp!
- notation is necessary because we routinely use multiple networks
- and protocols on Plan 9\(emit identifies the network in which
- .CW ai.mit.edu
- is a valid name.
- .PP
- In practice we do not use
- .CW rlogin
- or
- .CW telnet
- between Plan 9 machines. Instead a command called
- .CW cpu
- in effect replaces the CPU in a window with that
- on another machine, typically a fast multiprocessor CPU server.
- The implementation is to recreate the
- name space on the remote machine, using the equivalent of
- .CW import
- to connect pieces of the terminal's name space to that of
- the process (shell) on the CPU server, making the terminal
- a file server for the CPU.
- CPU-local devices such as fast file system connections
- are still local; only terminal-resident devices are
- imported.
- The result is unlike UNIX
- .CW rlogin ,
- which moves into a distinct name space on the remote machine,
- or file sharing with
- .CW NFS ,
- which keeps the name space the same but forces processes to execute
- locally.
- Bindings in
- .CW /bin
- may change because of a change in CPU architecture, and
- the networks involved may be different because of differing hardware,
- but the effect feels like simply speeding up the processor in the
- current name space.
- .SH
- Position
- .PP
- These examples illustrate how the ideas of representing resources
- as file systems and per-process name spaces can be used to solve
- problems often left to more exotic mechanisms.
- Nonetheless there are some operations in Plan 9 that are not
- mapped into file I/O.
- An example is process creation.
- We could imagine a message to a control file in
- .CW /proc
- that creates a process, but the details of
- constructing the environment of the new process \(em its open files,
- name space, memory image, etc. \(em are too intricate to
- be described easily in a simple I/O operation.
- Therefore new processes on Plan 9 are created by fairly conventional
- .CW rfork
- and
- .CW exec
- system calls;
- .CW /proc
- is used only to represent and control existing processes.
- .PP
- Plan 9 does not attempt to map network name spaces into the file
- system name space, for several reasons.
- The different addressing rules for various networks and protocols
- cannot be mapped uniformly into a hierarchical file name space.
- Even if they could be,
- the various mechanisms to authenticate,
- select a service,
- and control the connection would not map consistently into
- operations on a file.
- .PP
- Shared memory is another resource not adequately represented by a
- file name space.
- Plan 9 takes care to provide mechanisms
- to allow groups of local processes to share and map memory.
- Memory is controlled
- by system calls rather than special files, however,
- since a representation in the file system would imply that memory could
- be imported from remote machines.
- .PP
- Despite these limitations, file systems and name spaces offer an effective
- model around which to build a distributed system.
- Used well, they can provide a uniform, familiar, transparent
- interface to a diverse set of distributed resources.
- They carry well-understood properties of access, protection,
- and naming.
- The integration of devices into the hierarchical file system
- was the best idea in UNIX.
- Plan 9 pushes the concepts much further and shows that
- file systems, when used inventively, have plenty of scope
- for productive research.
- .SH
- References
- .LP
- [Killian] T. Killian, ``Processes as Files'', USENIX Summer Conf. Proc., Salt Lake City, 1984
- .br
- [Needham] R. Needham, ``Names'', in
- .I
- Distributed Systems,
- .R
- S. Mullender, ed.,
- Addison-Wesley, 1989
- .br
- [Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
- ``Plan 9 from Bell Labs'',
- UKUUG Proc. of the Summer 1990 Conf.,
- London, England,
- 1990
- .br
- [Presotto] D. Presotto, ``Multiprocessor Streams for Plan 9'',
- UKUUG Proc. of the Summer 1990 Conf.,
- London, England,
- 1990
- .br
- [Pike91] R. Pike, ``8.5, The Plan 9 Window System'', USENIX Summer
- Conf. Proc., Nashville, 1991