- .HTML "How to Use the Plan 9 C Compiler
- .TL
- How to Use the Plan 9 C Compiler
- .AU
- Rob Pike
- rob@plan9.bell-labs.com
- .SH
- Introduction
- .PP
- The C compiler on Plan 9 is a wholly new program; in fact
- it was the first piece of software written for what would
- eventually become Plan 9 from Bell Labs.
- Programmers familiar with existing C compilers will find
- a number of differences both in the language the Plan 9 compiler
- accepts and in how the compiler is used.
- .PP
- The compiler is really a set of compilers, one for each
- architecture \(em MIPS, SPARC, Motorola 68020, Intel 386, etc. \(em
- that accept a dialect of ANSI C and efficiently produce
- fairly good code for the target machine.
- There is a packaging of the compiler that accepts strict ANSI C for
- a POSIX environment, but this document focuses on the
- native Plan 9 environment, that in which all the system source and
- almost all the utilities are written.
- .SH
- Source
- .PP
- The language accepted by the compilers is the core ANSI C language
- with some modest extensions,
- a greatly simplified preprocessor,
- a smaller library that includes system calls and related facilities,
- and a completely different structure for include files.
- .PP
- Official ANSI C accepts the old (K&R) style of declarations for
- functions; the Plan 9 compilers
- are more demanding.
- Without an explicit compiler flag
- .CW -B ) (
- whose use is discouraged, the compilers insist
- on new-style function declarations, that is, prototypes for
- function arguments.
- The function declarations in the libraries' include files are
- all in the new style so the interfaces are checked at compile time.
- For C programmers who have not yet switched to function prototypes
- the clumsy syntax may seem repellent but the payoff in stronger typing
- is substantial.
- Those who wish to import existing software to Plan 9 are urged
- to use the opportunity to update their code.
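- .PP
- A minimal sketch of the difference, in portable ANSI C rather than code from the Plan 9 source; the function name
- .CW scale
- is hypothetical.
- The old-style form appears only in the comment; the new-style prototype lets the compiler check the type and number of arguments at each call:

```c
/*
 * Old (K&R) style, which the Plan 9 compilers reject unless -B is given:
 *
 *	long scale(v, factor)
 *	long v;
 *	int factor;
 *	{ ... }
 */

/* New style: the prototype carries the argument types. */
long scale(long v, int factor);

long
scale(long v, int factor)
{
	return v * factor;
}
```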
- .PP
- The compilers include an integrated preprocessor that accepts the familiar
- .CW #include ,
- .CW #define
- for macros both with and without arguments,
- .CW #undef ,
- .CW #line ,
- .CW #ifdef ,
- .CW #ifndef ,
- and
- .CW #endif .
- It
- supports neither
- .CW #if
- nor
- .CW ## ,
- although it does
- honor a few
- .CW #pragmas .
- The
- .CW #if
- directive was omitted because it greatly complicates the
- preprocessor, is never necessary, and is usually abused.
- Conditional compilation in general makes code hard to understand;
- the Plan 9 source uses it sparingly.
- Also, because the compilers remove dead code, regular
- .CW if
- statements with constant conditions are more readable equivalents to many
- .CW #ifs .
- To compile imported code ineluctably fouled by
- .CW #if
- there is a separate command,
- .CW /bin/cpp ,
- that implements the complete ANSI C preprocessor specification.
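- .PP
- As an illustrative sketch (the names
- .CW Debug
- and
- .CW clamp
- are hypothetical, and the code is plain ANSI C), a constant condition can stand in for a conditional-compilation block.
- Both branches are always parsed and type-checked, and a compiler that removes dead code discards the branch that cannot run:

```c
enum { Debug = 0 };	/* hypothetical compile-time switch */

/*
 * Instead of wrapping the extra check in #if/#endif, test a constant.
 * With Debug set to 0 the first branch is dead code.
 */
int
clamp(int v, int max)
{
	if(Debug && v < 0)
		return -1;	/* a debugging build would flag negative input */
	if(v > max)
		return max;
	return v;
}
```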
- .PP
- Include files fall into two groups: machine-dependent and machine-independent.
- The machine-independent files occupy the directory
- .CW /sys/include ;
- the others are placed in a directory appropriate to the machine, such as
- .CW /mips/include .
- The compiler searches for include files
- first in the machine-dependent directory and then
- in the machine-independent directory.
- At the time of writing there are thirty-one machine-independent include
- files and two (per machine) machine-dependent ones:
- .CW <ureg.h>
- and
- .CW <u.h> .
- The first describes the layout of registers on the system stack,
- for use by the debugger.
- The second defines some
- architecture-dependent types such as
- .CW jmp_buf
- for
- .CW setjmp
- and the
- .CW va_arg
- and
- .CW va_list
- macros for handling arguments to variadic functions,
- as well as a set of
- .CW typedef
- abbreviations for
- .CW unsigned
- .CW short
- and so on.
- .PP
- Here is an excerpt from
- .CW /68020/include/u.h :
- .P1
- #define nil ((void*)0)
- typedef unsigned short ushort;
- typedef unsigned char uchar;
- typedef unsigned long ulong;
- typedef unsigned int uint;
- typedef signed char schar;
- typedef long long vlong;
- typedef long jmp_buf[2];
- #define JMPBUFSP 0
- #define JMPBUFPC 1
- #define JMPBUFDPC 0
- .P2
- Plan 9 programs use
- .CW nil
- for the name of the zero-valued pointer.
- The type
- .CW vlong
- is the largest integer type available; on most architectures it
- is a 64-bit value.
- A couple of other types in
- .CW <u.h>
- are
- .CW u32int ,
- which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
- .CW mpdigit ,
- which is used by the multiprecision math package
- .CW <mp.h> .
- The
- .CW #define
- constants permit an architecture-independent (but compiler-dependent)
- implementation of stack-switching using
- .CW setjmp
- and
- .CW longjmp .
- .PP
- Every Plan 9 C program begins
- .P1
- #include <u.h>
- .P2
- because all the other installed header files use the
- .CW typedefs
- declared in
- .CW <u.h> .
- .PP
- In strict ANSI C, include files are grouped to collect related functions
- in a single file: one for string functions, one for memory functions,
- one for I/O, and none for system calls.
- Each include file is protected by an
- .CW #ifdef
- to guarantee its contents are seen by the compiler only once.
- Plan 9 takes a different approach. Other than a few include
- files that define external formats such as archives, the files in
- .CW /sys/include
- correspond to
- .I libraries.
- If a program is using a library, it includes the corresponding header.
- The default C library comprises string functions, memory functions, and
- so on, largely as in ANSI C, some formatted I/O routines,
- plus all the system calls and related functions.
- To use these functions, one must
- .CW #include
- the file
- .CW <libc.h> ,
- which in turn must follow
- .CW <u.h> ,
- to define their prototypes for the compiler.
- Here is the complete source to the traditional first C program:
- .P1
- #include <u.h>
- #include <libc.h>
- void
- main(void)
- {
- 	print("hello world\en");
- 	exits(0);
- }
- .P2
- The
- .CW print
- routine and its relatives
- .CW fprint
- and
- .CW sprint
- resemble the similarly-named functions in Standard I/O but are not
- attached to a specific I/O library.
- In Plan 9
- .CW main
- is not integer-valued; it should call
- .CW exits ,
- which takes a string argument (or null; here ANSI C converts the 0 to a
- .CW char* ).
- All these functions are, of course, documented in the Programmer's Manual.
- .PP
- To use
- .CW printf ,
- .CW <stdio.h>
- must be included to define the function prototype for
- .CW printf :
- .P1
- #include <u.h>
- #include <libc.h>
- #include <stdio.h>
- void
- main(int argc, char *argv[])
- {
- 	printf("%s: hello world; argc = %d\en", argv[0], argc);
- 	exits(0);
- }
- .P2
- In practice, Standard I/O is not used much in Plan 9. I/O libraries are
- discussed in a later section of this document.
- .PP
- There are libraries for handling regular expressions, raster graphics,
- windows, and so on, and each has an associated include file.
- The manual for each library states which include files are needed.
- The files are not protected against multiple inclusion and themselves
- contain no nested
- .CW #includes .
- Instead the
- programmer is expected to sort out the requirements
- and to
- .CW #include
- the necessary files once at the top of each source file. In practice this is
- trivial: this way of handling include files is so straightforward
- that it is rare for a source file to contain more than half a dozen
- .CW #includes .
- .PP
- The compilers do their own register allocation so the
- .CW register
- keyword is ignored.
- For different reasons,
- .CW volatile
- and
- .CW const
- are also ignored.
- .PP
- To make it easier to share code with other systems, Plan 9 has a version
- of the compiler,
- .CW pcc ,
- that provides the standard ANSI C preprocessor, headers, and libraries
- with POSIX extensions.
- .CW Pcc
- is recommended only
- when broad external portability is mandated. It compiles more slowly,
- produces slower code (it takes extra work to simulate POSIX on Plan 9),
- eliminates those parts of the Plan 9 interface
- not related to POSIX, and illustrates the clumsiness of an environment
- designed by committee.
- .CW Pcc
- is described in more detail in
- .I
- APE\(emThe ANSI/POSIX Environment,
- .R
- by Howard Trickey.
- .SH
- Process
- .PP
- Each CPU architecture supported by Plan 9 is identified by a single,
- arbitrary, alphanumeric character:
- .CW k
- for SPARC,
- .CW q
- for Motorola Power PC 630 and 640,
- .CW v
- for MIPS,
- .CW 0
- for little-endian MIPS,
- .CW 1
- for Motorola 68000,
- .CW 2
- for Motorola 68020 and 68040,
- .CW 5
- for Acorn ARM 7500,
- .CW 6
- for AMD 64,
- .CW 7
- for DEC Alpha,
- .CW 8
- for Intel 386, and
- .CW 9
- for AMD 29000.
- The character labels the support tools and files for that architecture.
- For instance, for the 68020 the compiler is
- .CW 2c ,
- the assembler is
- .CW 2a ,
- the link editor/loader is
- .CW 2l ,
- the object files are suffixed
- .CW \&.2 ,
- and the default name for an executable file is
- .CW 2.out .
- Before we can use the compiler we therefore need to know which
- machine we are compiling for.
- The next section explains how this decision is made; for the moment
- assume we are building 68020 binaries and make the mental substitution for
- .CW 2
- appropriate to the machine you are actually using.
- .PP
- To convert source to an executable binary is a two-step process.
- First run the compiler,
- .CW 2c ,
- on the source, say
- .CW file.c ,
- to generate an object file
- .CW file.2 .
- Then run the loader,
- .CW 2l ,
- to generate an executable
- .CW 2.out
- that may be run (on a 680X0 machine):
- .P1
- 2c file.c
- 2l file.2
- 2.out
- .P2
- The loader automatically links with whatever libraries the program
- needs, usually including the standard C library as defined by
- .CW <libc.h> .
- Of course the compiler and loader have lots of options, both familiar and new;
- see the manual for details.
- The compiler does not generate an executable automatically;
- the output of the compiler must be given to the loader.
- Since most compilation is done under the control of
- .CW mk
- (see below), this is rarely an inconvenience.
- .PP
- The distribution of work between the compiler and loader is unusual.
- The compiler integrates preprocessing, parsing, register allocation,
- code generation and some assembly.
- Combining these tasks in a single program is part of the reason for
- the compiler's efficiency.
- The loader does instruction selection, branch folding,
- instruction scheduling,
- and writes the final executable.
- There is no separate C preprocessor and no assembler in the usual pipeline.
- Instead the intermediate object file
- (here a
- .CW \&.2
- file) is a type of binary assembly language.
- The instructions in the intermediate format are not exactly those in
- the machine. For example, on the 68020 the object file may specify
- a MOVE instruction but the loader will decide just which variant of
- the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
- etc. \(em is most efficient.
- .PP
- The assembler,
- .CW 2a ,
- is just a translator between the textual and binary
- representations of the object file format.
- It is not an assembler in the traditional sense. It has limited
- macro capabilities (the same as the integral C preprocessor in the compiler),
- clumsy syntax, and minimal error checking. For instance, the assembler
- will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
- machine does not actually support; only when the output of the assembler
- is passed to the loader will the error be discovered.
- The assembler is intended only for writing things that need access to instructions
- invisible from C,
- such as the machine-dependent
- part of an operating system;
- very little code in Plan 9 is in assembly language.
- .PP
- The compilers take an option
- .CW -S
- that causes them to print on their standard output the generated code
- in a format acceptable as input to the assemblers.
- This is of course merely a formatting of the
- data in the object file; therefore the assembler is just
- an
- ASCII-to-binary converter for this format.
- Other than the specific instructions, the input to the assemblers
- is largely architecture-independent; see
- ``A Manual for the Plan 9 Assembler'',
- by Rob Pike,
- for more information.
- .PP
- The loader is an integral part of the compilation process.
- Each library header file contains a
- .CW #pragma
- that tells the loader the name of the associated archive; it is
- not necessary to tell the loader which libraries a program uses.
- The C run-time startup is found, by default, in the C library.
- The loader starts with an undefined
- symbol,
- .CW _main ,
- that is resolved by pulling in the run-time startup code from the library.
- (The loader undefines
- .CW _mainp
- when profiling is enabled, to force loading of the profiling start-up
- instead.)
- .PP
- Unlike its counterpart on other systems, the Plan 9 loader rearranges
- data to optimize access. This means the order of variables in the
- loaded program is unrelated to their order in the source.
- Most programs don't care, but some assume that, for example, the
- variables declared by
- .P1
- int a;
- int b;
- .P2
- will appear at adjacent addresses in memory. On Plan 9, they won't.
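- .PP
- A sketch of the portable alternative, assuming nothing beyond ANSI C; the names
- .CW pair
- and
- .CW sum2
- are hypothetical.
- Code that needs two values side by side can put them in a single object, because the language itself guarantees that array elements are contiguous, whatever the loader does with separate variables:

```c
/* One object instead of two separate globals a and b. */
int pair[2] = { 3, 4 };

/* Relies only on array layout, never on where the loader places things. */
int
sum2(int *p)
{
	return p[0] + p[1];
}
```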
- .SH
- Heterogeneity
- .PP
- When the system starts or a user logs in, the environment is configured
- so the appropriate binaries are available in
- .CW /bin .
- The configuration process is controlled by an environment variable,
- .CW $cputype ,
- with a value such as
- .CW mips ,
- .CW 68020 ,
- .CW 386 ,
- or
- .CW sparc .
- For each architecture there is a directory in the root,
- with the appropriate name,
- that holds the binary and library files for that architecture.
- Thus
- .CW /mips/lib
- contains the object code libraries for MIPS programs,
- .CW /mips/include
- holds MIPS-specific include files, and
- .CW /mips/bin
- has the MIPS binaries.
- These binaries are attached to
- .CW /bin
- at boot time by binding
- .CW /$cputype/bin
- to
- .CW /bin ,
- so
- .CW /bin
- always contains the correct files.
- .PP
- The MIPS compiler,
- .CW vc ,
- by definition
- produces object files for the MIPS architecture,
- regardless of the architecture of the machine on which the compiler is running.
- There is a version of
- .CW vc
- compiled for each architecture:
- .CW /mips/bin/vc ,
- .CW /68020/bin/vc ,
- .CW /sparc/bin/vc ,
- and so on,
- each capable of producing MIPS object files regardless of the native
- instruction set.
- If one is running on a SPARC,
- .CW /sparc/bin/vc
- will compile programs for the MIPS;
- if one is running on machine
- .CW $cputype ,
- .CW /$cputype/bin/vc
- will compile programs for the MIPS.
- .PP
- Because of the bindings that assemble
- .CW /bin ,
- the shell always looks for a command, say
- .CW date ,
- in
- .CW /bin
- and automatically finds the file
- .CW /$cputype/bin/date .
- Therefore the MIPS compiler is known as just
- .CW vc ;
- the shell will invoke
- .CW /bin/vc
- and that is guaranteed to be the version of the MIPS compiler
- appropriate for the machine running the command.
- Regardless of the architecture of the compiling machine,
- .CW /bin/vc
- is
- .I always
- the MIPS compiler.
- .PP
- Also, the output of
- .CW vc
- and
- .CW vl
- is completely independent of the machine type on which they are executed:
- .CW \&.v
- files compiled (with
- .CW vc )
- on a SPARC may be linked (with
- .CW vl )
- on a 386.
- (The resulting
- .CW v.out
- will run, of course, only on a MIPS.)
- Similarly, the MIPS libraries in
- .CW /mips/lib
- are suitable for loading with
- .CW vl
- on any machine; there is only one set of MIPS libraries, not one
- set for each architecture that supports the MIPS compiler.
- .SH
- Heterogeneity and \f(CWmk\fP
- .PP
- Most software on Plan 9 is compiled under the control of
- .CW mk ,
- a descendant of
- .CW make
- that is documented in the Programmer's Manual.
- A convention used throughout the
- .CW mkfiles
- makes it easy to compile the source into binary suitable for any architecture.
- .PP
- The variable
- .CW $cputype
- is advisory: it reports the architecture of the current environment, and should
- not be modified. A second variable,
- .CW $objtype ,
- is used to set which architecture is being
- .I compiled
- for.
- The value of
- .CW $objtype
- can be used by a
- .CW mkfile
- to configure the compilation environment.
- .PP
- In each machine's root directory there is a short
- .CW mkfile
- that defines a set of macros for the compiler, loader, etc.
- Here is
- .CW /mips/mkfile :
- .P1
- </sys/src/mkfile.proto
- CC=vc
- LD=vl
- O=v
- AS=va
- .P2
- The line
- .P1
- </sys/src/mkfile.proto
- .P2
- causes
- .CW mk
- to include the file
- .CW /sys/src/mkfile.proto ,
- which contains general definitions:
- .P1
- #
- # common mkfile parameters shared by all architectures
- #
- OS=v486xq7
- CPUS=mips 386 power alpha
- CFLAGS=-FVw
- LEX=lex
- YACC=yacc
- MK=/bin/mk
- .P2
- .CW CC
- is obviously the compiler,
- .CW AS
- the assembler, and
- .CW LD
- the loader.
- .CW O
- is the suffix for the object files and
- .CW CPUS
- and
- .CW OS
- are used in special rules described below.
- .PP
- Here is a
- .CW mkfile
- to build the installed source for
- .CW sam :
- .P1
- </$objtype/mkfile
- OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
- 	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
- 	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O
- $O.out: $OBJ
- 	$LD $OBJ
- install: $O.out
- 	cp $O.out /$objtype/bin/sam
- installall:
- 	for(objtype in $CPUS) mk install
- %.$O: %.c
- 	$CC $CFLAGS $stem.c
- $OBJ: sam.h errors.h mesg.h
- address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h
- clean:V:
- 	rm -f [$OS].out *.[$OS] y.tab.?
- .P2
- (The actual
- .CW mkfile
- imports most of its rules from other secondary files, but
- this example works and is not misleading.)
- The first line causes
- .CW mk
- to include the contents of
- .CW /$objtype/mkfile
- in the current
- .CW mkfile .
- If
- .CW $objtype
- is
- .CW mips ,
- this inserts the MIPS macro definitions into the
- .CW mkfile .
- In this case the rule for
- .CW $O.out
- uses the MIPS tools to build
- .CW v.out .
- The
- .CW %.$O
- rule in the file uses
- .CW mk 's
- pattern matching facilities to convert the source files to the object
- files through the compiler.
- (The text of the rules is passed directly to the shell,
- .CW rc ,
- without further translation.
- See the
- .CW mk
- manual if any of this is unfamiliar.)
- Because the default rule builds
- .CW $O.out
- rather than
- .CW sam ,
- it is possible to maintain binaries for multiple machines in the
- same source directory without conflict.
- This is also, of course, why the output files from the various
- compilers and loaders
- have distinct names.
- .PP
- The rest of the
- .CW mkfile
- should be easy to follow; notice how the rules for
- .CW clean
- and
- .CW installall
- (that is, install versions for all architectures) use other macros
- defined in
- .CW /$objtype/mkfile .
- In Plan 9,
- .CW mkfiles
- for commands conventionally contain rules to
- .CW install
- (compile and install the version for
- .CW $objtype ),
- .CW installall
- (compile and install for all
- .CW $objtypes ),
- and
- .CW clean
- (remove all object files, binaries, etc.).
- .PP
- The
- .CW mkfile
- is easy to use. To build a MIPS binary,
- .CW v.out :
- .P1
- % objtype=mips
- % mk
- .P2
- To build and install a MIPS binary:
- .P1
- % objtype=mips
- % mk install
- .P2
- To build and install all versions:
- .P1
- % mk installall
- .P2
- These conventions make cross-compilation as easy to manage
- as traditional native compilation.
- Plan 9 programs compile and run without change on machines from
- large multiprocessors to laptops. For more information about this process, see
- ``Plan 9 Mkfiles'',
- by Bob Flandrena.
- .SH
- Portability
- .PP
- Within Plan 9, it is painless to write portable programs, programs whose
- source is independent of the machine on which they execute.
- The operating system is fixed and the compiler, headers and libraries
- are constant so most of the stumbling blocks to portability are removed.
- Attention to a few details can avoid those that remain.
- .PP
- Plan 9 is a heterogeneous environment, so programs must
- .I expect
- that external files will be written by programs on machines of different
- architectures.
- The compilers, for instance, must handle without confusion
- object files written by other machines.
- The traditional approach to this problem is to pepper the source with
- .CW #ifdefs
- to turn byte-swapping on and off.
- Plan 9 takes a different approach: of the handful of machine-dependent
- .CW #ifdefs
- in all the source, almost all are deep in the libraries.
- Instead programs read and write files in a defined format,
- either (for low volume applications) as formatted text, or
- (for high volume applications) as binary in a known byte order.
- If the external data were written with the most significant
- byte first, the following code reads a 4-byte integer correctly
- regardless of the architecture of the executing machine (assuming
- an unsigned long holds 4 bytes):
- .P1
- ulong
- getlong(void)
- {
- 	ulong l;
- 	l = (ulong)(getchar()&0xFF)<<24;
- 	l |= (ulong)(getchar()&0xFF)<<16;
- 	l |= (ulong)(getchar()&0xFF)<<8;
- 	l |= (ulong)(getchar()&0xFF)<<0;
- 	return l;
- }
- .P2
- Note that this code does not `swap' the bytes; instead it just reads
- them in the correct order.
- Variations of this code will handle any binary format
- and also avoid problems
- involving how structures are padded, how words are aligned,
- and other impediments to portability.
- Be aware, though, that extra care is needed to handle floating point data.
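- .PP
- The same technique works on data held in memory and in the output direction.
- The following is a sketch in portable ANSI C, not code from the Plan 9 libraries; the names
- .CW putlong
- and
- .CW getlongbuf
- are hypothetical, and only the low 32 bits of the value are stored.
- Each byte is placed or fetched by shifting, so the result is the same on machines of either byte order:

```c
/* Store the low 32 bits of l in p[0..3], most significant byte first. */
void
putlong(unsigned char *p, unsigned long l)
{
	p[0] = (l >> 24) & 0xFF;
	p[1] = (l >> 16) & 0xFF;
	p[2] = (l >> 8) & 0xFF;
	p[3] = l & 0xFF;
}

/* Inverse of putlong: assemble 4 bytes, most significant first. */
unsigned long
getlongbuf(const unsigned char *p)
{
	return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
		((unsigned long)p[2] << 8) | (unsigned long)p[3];
}
```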
- .PP
- Efficiency hounds will argue that this method is unnecessarily slow and clumsy
- when the executing machine has the same byte order (and padding and alignment)
- as the data.
- The CPU cost of I/O processing
- is rarely the bottleneck for an application, however,
- and the gain in simplicity of porting and maintaining the code greatly outweighs
- the minor speed loss from handling data in this general way.
- This method is how the Plan 9 compilers, the window system, and even the file
- servers transmit data between programs.
- .PP
- To port programs beyond Plan 9, where the system interface is more variable,
- it is probably necessary to use
- .CW pcc
- and hope that the target machine supports ANSI C and POSIX.
- .SH
- I/O
- .PP
- The default C library, defined by the include file
- .CW <libc.h> ,
- contains no buffered I/O package.
- It does have several entry points for printing formatted text:
- .CW print
- outputs text to the standard output,
- .CW fprint
- outputs text to a specified integer file descriptor, and
- .CW sprint
- places text in a character array.
- To access library routines for buffered I/O, a program must
- explicitly include the header file associated with an appropriate library.
- .PP
- The recommended I/O library, used by most Plan 9 utilities, is
- .CW bio
- (buffered I/O), defined by
- .CW <bio.h> .
- There also exists an implementation of ANSI Standard I/O,
- .CW stdio .
- .PP
- .CW Bio
- is small and efficient, particularly for buffer-at-a-time or
- line-at-a-time I/O.
- Even for character-at-a-time I/O, however, it is significantly faster than
- the Standard I/O library,
- .CW stdio .
- Its interface is compact and regular, although it lacks a few conveniences.
- The most noticeable is that one must explicitly define buffers for standard
- input and output;
- .CW bio
- does not predefine them. Here is a program to copy input to output a byte
- at a time using
- .CW bio :
- .P1
- #include <u.h>
- #include <libc.h>
- #include <bio.h>
- Biobuf bin;
- Biobuf bout;
- void
- main(void)
- {
- 	int c;
- 	Binit(&bin, 0, OREAD);
- 	Binit(&bout, 1, OWRITE);
- 	while((c=Bgetc(&bin)) != Beof)
- 		Bputc(&bout, c);
- 	exits(0);
- }
- .P2
- For peak performance, we could replace
- .CW Bgetc
- and
- .CW Bputc
- by their equivalent in-line macros
- .CW BGETC
- and
- .CW BPUTC
- but
- the performance gain would be modest.
- For more information on
- .CW bio ,
- see the Programmer's Manual.
- .PP
- Perhaps the most dramatic difference between the I/O interface of Plan 9 and
- that of other systems is that text is not ASCII.
- The format for
- text in Plan 9 is a byte-stream encoding of 16-bit characters.
- The character set is based on the Unicode Standard and is backward compatible with
- ASCII:
- characters with value 0 through 127 are the same in both sets.
- The 16-bit characters, called
- .I runes
- in Plan 9, are encoded using a representation called
- UTF,
- an encoding that is becoming accepted as a standard.
- (ISO calls it UTF-8;
- throughout Plan 9 it's just called
- UTF.)
- UTF
- defines multibyte sequences to
- represent character values from 0 to 65535.
- In
- UTF,
- character values up to 127 decimal, 7F hexadecimal, represent themselves,
- so straight
- ASCII
- files are also valid
- UTF.
- Also,
- UTF
- guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
- will appear only when they represent themselves, so programs that read bytes
- looking for plain ASCII characters will continue to work.
- Any program that expects a one-to-one correspondence between bytes and
- characters will, however, need to be modified.
- An example is parsing file names.
- File names, like all text, are in
- UTF,
- so it is incorrect to search for a character in a string by
- .CW strchr(filename,
- .CW c)
- because the character might have a multi-byte encoding.
- The correct method is to call
- .CW utfrune(filename,
- .CW c) ,
- defined in
- .I rune (2),
- which interprets the file name as a sequence of encoded characters
- rather than bytes.
- In fact, even when you know the character is a single byte
- that can represent only itself,
- it is safer to use
- .CW utfrune
- because that assumes nothing about the character set
- and its representation.
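- .PP
- To see why a byte-at-a-time search fails, it helps to look at the encoding itself.
- The following is a sketch in portable ANSI C, not the library's implementation; the name
- .CW toutf
- is hypothetical, and the bit layout shown is the standard UTF-8 layout for values up to 65535.
- Every byte of a multibyte sequence has its high bit set, so a character above 127 never appears in the string as a single byte with its own value:

```c
/*
 * Encode one 16-bit character value c as UTF into s, which must have
 * room for 3 bytes, and return the number of bytes written.
 */
int
toutf(unsigned char *s, unsigned int c)
{
	c &= 0xFFFF;
	if(c < 0x80){			/* 0xxxxxxx: ASCII represents itself */
		s[0] = c;
		return 1;
	}
	if(c < 0x800){			/* 110xxxxx 10xxxxxx */
		s[0] = 0xC0 | (c >> 6);
		s[1] = 0x80 | (c & 0x3F);
		return 2;
	}
	s[0] = 0xE0 | (c >> 12);	/* 1110xxxx 10xxxxxx 10xxxxxx */
	s[1] = 0x80 | ((c >> 6) & 0x3F);
	s[2] = 0x80 | (c & 0x3F);
	return 3;
}
```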
- .PP
- The library defines several symbols relevant to the representation of characters.
- Any byte with unsigned value less than
- .CW Runesync
- will not appear in any multi-byte encoding of a character.
- .CW Utfrune
- compares the character being searched against
- .CW Runesync
- to see if it is sufficient to call
- .CW strchr
- or if the byte stream must be interpreted.
- Any character with value less than
- .CW Runeself
- is represented by a single byte with the same value.
- Finally, when errors are encountered converting
- to runes from a byte stream, the library returns the rune value
- .CW Runeerror
- and advances a single byte. This permits programs to find runes
- embedded in binary data.
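- .PP
- The error-recovery rule can be sketched as follows, again in portable ANSI C and not as the library's own code.
- The name
- .CW fromutf
- is hypothetical, and the constants
- .CW Self
- and
- .CW Bad
- are assumed stand-ins for
- .CW Runeself
- and
- .CW Runeerror ;
- the real values are whatever the library defines.
- On malformed input the decoder reports the error value and consumes exactly one byte, so the caller resynchronizes on the next byte:

```c
enum {
	Self = 0x80,	/* assumed: values below this occupy one byte */
	Bad = 0xFFFD	/* assumed stand-in for the error rune */
};

/*
 * Decode one character from the n bytes at s (n must be at least 1),
 * store it in *r, and return the number of bytes consumed.
 */
int
fromutf(unsigned int *r, const unsigned char *s, int n)
{
	unsigned int c, v;

	c = s[0];
	if(c < Self){
		*r = c;
		return 1;
	}
	if((c & 0xE0) == 0xC0 && n >= 2 && (s[1] & 0xC0) == 0x80){
		v = ((c & 0x1F) << 6) | (s[1] & 0x3F);
		if(v >= 0x80){		/* reject overlong forms */
			*r = v;
			return 2;
		}
	}else if((c & 0xF0) == 0xE0 && n >= 3
	&& (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80){
		v = ((c & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		if(v >= 0x800){
			*r = v;
			return 3;
		}
	}
	*r = Bad;	/* malformed: report the error and skip one byte */
	return 1;
}
```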
- .PP
- .CW Bio
- includes routines
- .CW Bgetrune
- and
- .CW Bputrune
- to transform the external byte stream
- UTF
- format to and from
- internal 16-bit runes.
- Also, the
- .CW %s
- format to
- .CW print
- accepts
- UTF;
- .CW %c
- prints a character after narrowing it to 8 bits.
- The
- .CW %S
- format prints a null-terminated sequence of runes;
- .CW %C
- prints a character after narrowing it to 16 bits.
- For more information, see the Programmer's Manual, in particular
- .I utf (6)
- and
- .I rune (2),
- and the paper,
- ``Hello world, or
- Καλημέρα κόσμε, or\
- \f(Jpこんにちは 世界\f1'',
- by Rob Pike and
- Ken Thompson;
- there is not room for the full story here.
- .PP
- These issues affect the compiler in several ways.
- First, the C source is in
- UTF.
- ANSI says C variables are formed from
- ASCII
- alphanumerics, but comments and literal strings may contain any characters
- encoded in the native encoding, here
- UTF.
- The declaration
- .P1
- char *cp = "abcÿ";
- .P2
- initializes the variable
- .CW cp
- to point to an array of bytes holding the
- UTF
- representation of the characters
- .CW abcÿ .
- The type
- .CW Rune
- is defined in
- .CW <u.h>
- to be
- .CW ushort ,
- which is also the `wide character' type in the compiler.
- Therefore the declaration
- .P1
- Rune *rp = L"abcÿ";
- .P2
- initializes the variable
- .CW rp
- to point to an array of unsigned short integers holding the 16-bit
- values of the characters
- .CW abcÿ .
- Note that in both these declarations the characters in the source
- that represent
- .CW "abcÿ"
- are the same; what changes is how those characters are represented
- in memory in the program.
- The following two lines:
- .P1
- print("%s\en", "abcÿ");
- print("%S\en", L"abcÿ");
- .P2
- produce the same
- UTF
- string on their output, the first by copying the bytes, the second
- by converting from runes to bytes.
- .PP
- In C, character constants are integers but narrowed through the
- .CW char
- type.
- The Unicode character
- .CW ÿ
- has value 255, so if the
- .CW char
- type is signed,
- the constant
- .CW 'ÿ'
- has value \-1 (which is equal to EOF).
- On the other hand,
- .CW L'ÿ'
- narrows through the wide character type,
- .CW ushort ,
- and therefore has value 255.
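- .PP
- The two narrowings can be imitated in portable C with explicit casts.
- This is only a sketch: the function names are hypothetical, and the result of converting 255 to
- .CW "signed char"
- is implementation-defined in standard C, although it is \-1 on the usual two's complement machines.

```c
/*
 * Narrow a character value through a signed 8-bit char,
 * as happens to 'x' constants when char is signed.
 * The out-of-range conversion is implementation-defined.
 */
static int
narrowchar(int v)
{
	return (signed char)v;
}

/*
 * Narrow a character value through an unsigned 16-bit type,
 * standing in for the wide character type described above.
 */
static int
narrowwide(int v)
{
	return (unsigned short)v;
}
```

- Under those assumptions,
- .CW narrowchar(255)
- yields \-1 while
- .CW narrowwide(255)
- yields 255.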
- .PP
- Finally, although it's not ANSI C, the Plan 9 C compilers
- assume any character with value above
- .CW Runeself
- is an alphanumeric,
- so α is a legal, if non-portable, variable name.
- .SH
- Arguments
- .PP
- Some macros are defined
- in
- .CW <libc.h>
- for parsing the arguments to
- .CW main() .
- They are described in
- .I ARG (2)
- but are fairly self-explanatory.
- There are four macros:
- .CW ARGBEGIN
- and
- .CW ARGEND
- are used to bracket a hidden
- .CW switch
- statement within which
- .CW ARGC
- returns the current option character (rune) being processed and
- .CW ARGF
- returns the argument to the option, as in the loader option
- .CW -o
- .CW file .
- Here, for example, is the code at the beginning of
- .CW main()
- in
- .CW ramfs.c
- (see
- .I ramfs (4))
- that cracks its arguments:
- .P1
- void
- main(int argc, char *argv[])
- {
- 	char *defmnt;
- 	int p[2];
- 	int mfd[2];
- 	int stdio = 0;
- 	defmnt = "/tmp";
- 	ARGBEGIN{
- 	case 'i':
- 		defmnt = 0;
- 		stdio = 1;
- 		mfd[0] = 0;
- 		mfd[1] = 1;
- 		break;
- 	case 's':
- 		defmnt = 0;
- 		break;
- 	case 'm':
- 		defmnt = ARGF();
- 		break;
- 	default:
- 		usage();
- 	}ARGEND
- .P2
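- .PP
- For readers without the macros at hand, the loop they hide can be approximated in portable C.
- The sketch below is not the expansion of
- .CW ARGBEGIN ;
- it is a hand-written equivalent for the same three options, with a hypothetical
- .CW Opts
- structure and
- .CW parseargs
- function, and it treats option characters as single bytes rather than runes.

```c
#include <stddef.h>

typedef struct Opts Opts;
struct Opts
{
	const char *defmnt;	/* mount point, or NULL if none */
	int stdio;		/* set by -i */
	int bad;		/* set on an unknown option or missing argument */
};

/*
 * Parse leading options of argv; return the index of the
 * first argument that is not an option.
 */
static int
parseargs(int argc, char *argv[], Opts *o)
{
	int i;
	char *p;

	o->defmnt = "/tmp";
	o->stdio = 0;
	o->bad = 0;
	for(i = 1; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++){
		if(argv[i][1] == '-' && argv[i][2] == '\0'){
			i++;		/* "--" ends the options */
			break;
		}
		for(p = argv[i]+1; *p != '\0'; p++){
			switch(*p){
			case 'i':
				o->defmnt = NULL;
				o->stdio = 1;
				break;
			case 's':
				o->defmnt = NULL;
				break;
			case 'm':
				/* argument is the rest of this word or the next word */
				if(p[1] != '\0')
					o->defmnt = p+1;
				else if(i+1 < argc)
					o->defmnt = argv[++i];
				else
					o->bad = 1;
				goto nextarg;
			default:
				o->bad = 1;
				break;
			}
		}
	nextarg:;
	}
	return i;
}
```

- Called with the arguments
- .CW "-is -m /n/tmp file" ,
- this version sets
- .CW stdio ,
- records
- .CW /n/tmp
- as the mount point, and returns the index of
- .CW file .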
- .SH
- Extensions
- .PP
- The compiler has several extensions to ANSI C, all of which are used
- extensively in the system source.
- First,
- .I structure
- .I displays
- permit
- .CW struct
- expressions to be formed dynamically.
- Given these declarations:
- .P1
- typedef struct Point Point;
- typedef struct Rectangle Rectangle;
- struct Point
- {
- 	int x, y;
- };
- struct Rectangle
- {
- 	Point min, max;
- };
- Point p, q, add(Point, Point);
- Rectangle r;
- int x, y;
- .P2
- this assignment may appear anywhere an assignment is legal:
- .P1
- r = (Rectangle){add(p, q), (Point){x, y+3}};
- .P2
- The syntax is the same as for initializing a structure but with
- a leading cast.
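- .PP
- The same syntax was later standardized in C99 under the name compound literals,
- so the example can be tried with any C99 compiler.
- In the sketch below the bodies of
- .CW add
- and the helper
- .CW mkrect
- are assumptions made for illustration; only the declarations come from the text.

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;
struct Point
{
	int x, y;
};
struct Rectangle
{
	Point min, max;
};

/* Component-wise sum, itself returned as a structure display. */
static Point
add(Point a, Point b)
{
	return (Point){a.x+b.x, a.y+b.y};
}

/* Build a rectangle with the assignment shown in the text. */
static Rectangle
mkrect(Point p, Point q, int x, int y)
{
	Rectangle r;

	r = (Rectangle){add(p, q), (Point){x, y+3}};
	return r;
}
```

- A display may also be passed directly as an argument, as in
- .CW "mkrect((Point){1, 2}, (Point){3, 4}, 5, 6)" .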
- .PP
- If an
- .I anonymous
- .I structure
- or
- .I union
- is declared within another structure or union, the members of the internal
- structure or union are addressable without prefix in the outer structure.
- This feature eliminates the clumsy naming of nested structures and,
- particularly, unions.
- For example, after these declarations,
- .P1
- struct Lock
- {
- 	int locked;
- };
- struct Node
- {
- 	int type;
- 	union{
- 		double dval;
- 		float fval;
- 		long lval;
- 	}; /* anonymous union */
- 	struct Lock; /* anonymous structure */
- } *node;
- void lock(struct Lock*);
- .P2
- one may refer to
- .CW node->type ,
- .CW node->dval ,
- .CW node->fval ,
- .CW node->lval ,
- and
- .CW node->locked .
- Moreover, the address of a
- .CW struct
- .CW Node
- may be used without a cast anywhere that the address of a
- .CW struct
- .CW Lock
- is used, such as in argument lists.
- The compiler automatically promotes the type and adjusts the address.
- Thus one may invoke
- .CW lock(node) .
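- .PP
- For comparison, here is a sketch of how the same data might be written in standard C11,
- which adopted anonymous members only for structures and unions declared without a tag
- and performs no automatic promotion.
- The member name
- .CW lk
- and the body of
- .CW lock
- are assumptions of the sketch.

```c
typedef struct Lock Lock;
typedef struct Node Node;
struct Lock
{
	int locked;
};
struct Node
{
	int type;
	union{
		double dval;
		float fval;
		long lval;
	};		/* anonymous union: accepted by C11 */
	Lock lk;	/* must be named: C11 has no tagged anonymous member */
};

/* Assumed body: simply mark the lock as held. */
static void
lock(Lock *l)
{
	l->locked = 1;
}

/* Without the Plan 9 extension the address must be taken explicitly. */
static void
locknode(Node *n)
{
	lock(&n->lk);
}
```

- The union members are still reached as
- .CW n->dval
- and so on, but the lock must be named and the call written as
- .CW lock(&n->lk)
- rather than
- .CW lock(n) .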
- .PP
- Anonymous structures and unions may be accessed by type name
- if (and only if) they are declared using a
- .CW typedef
- name.
- For example, using the above declaration for
- .CW Point ,
- one may declare
- .P1
- struct
- {
- 	int type;
- 	Point;
- } p;
- .P2
- and refer to
- .CW p.Point .
- .PP
- In the initialization of arrays, a number in square brackets before an
- element sets the index for the initialization. For example, to initialize
- some elements in
- a table of function pointers indexed by
- ASCII
- character,
- .P1
- void percent(void), slash(void);
- void (*func[128])(void) =
- {
- 	['%'] percent,
- 	['/'] slash,
- };
- .P2
- .LP
- A similar syntax allows one to initialize structure elements:
- .P1
- Point p =
- {
- 	.y 100,
- 	.x 200
- };
- .P2
- These initialization syntaxes were later added to ANSI C, with the addition of an
- equals sign between the index or tag and the value.
- The Plan 9 compiler accepts either form.
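- .PP
- Written with the equals sign, the two initializations above compile under any C99 compiler.
- In this sketch the bodies of
- .CW percent
- and
- .CW slash ,
- the variable
- .CW last ,
- and the function
- .CW dispatch
- are hypothetical additions that make the table observable.

```c
#include <stddef.h>

typedef struct Point Point;
struct Point
{
	int x, y;
};

static int last;	/* records which handler ran most recently */

static void percent(void) { last = '%'; }
static void slash(void) { last = '/'; }

/* Only two of the 128 slots are set; the rest are null pointers. */
static void (*func[128])(void) =
{
	['%'] = percent,
	['/'] = slash,
};

/* Members may be named in any order. */
static Point pt =
{
	.y = 100,
	.x = 200
};

/* Run the handler for c, if any; return 1 if one was called. */
static int
dispatch(int c)
{
	if(c < 0 || c >= 128 || func[c] == NULL)
		return 0;
	func[c]();
	return 1;
}
```

- Elements and members not mentioned in the initializer are set to zero, which is why
- .CW dispatch
- can test for a null pointer.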
- .PP
- Finally, the declaration
- .P1
- extern register reg;
- .P2
- .I this "" (
- appearance of the register keyword is not ignored)
- allocates a global register to hold the variable
- .CW reg .
- External registers must be used carefully: they need to be declared in
- .I all
- source files and libraries in the program to guarantee the register
- is not allocated temporarily for other purposes.
- Especially on machines with few registers, such as the i386,
- it is easy to link accidentally with code that has already usurped
- the global registers and there is no diagnostic when this happens.
- Used wisely, though, external registers are powerful.
- The Plan 9 operating system uses them to access per-process and
- per-machine data structures on a multiprocessor. The storage class they provide
- is hard to create in other ways.
- .SH
- The compile-time environment
- .PP
- The code generated by the compilers is `optimized' by default:
- variables are placed in registers and peephole optimizations are
- performed.
- The compiler flag
- .CW -N
- disables these optimizations.
- Registerization is done locally rather than throughout a function:
- whether a variable occupies a register or
- the memory location identified in the symbol
- table depends on the activity of the variable and may change
- throughout the life of the variable.
- The
- .CW -N
- flag is rarely needed;
- its main use is to simplify debugging.
- There is no information in the symbol table to identify the
- registerization of a variable, so
- .CW -N
- guarantees the variable is always where the symbol table says it is.
- .PP
- Another flag,
- .CW -w ,
- turns
- .I on
- warnings about portability and problems detected in flow analysis.
- Most code in Plan 9 is compiled with warnings enabled;
- these warnings plus the type checking offered by function prototypes
- provide most of the support of the Unix tool
- .CW lint
- more accurately and with less chatter.
- Two of the warnings,
- `used and not set' and `set and not used', are almost always accurate but
- may be triggered spuriously by code with invisible control flow,
- such as in routines that call
- .CW longjmp .
- The compiler statements
- .P1
- SET(v1);
- USED(v2);
- .P2
- decorate the flow graph to silence the compiler.
- Either statement accepts a comma-separated list of variables.
- Use them carefully: they may silence real errors.
- For the common case of unused parameters to a function,
- leaving the name off the declaration silences the warnings.
- That is, listing the type of a parameter but giving it no
- associated variable name does the trick.
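- .PP
- Outside Plan 9 a similar effect is often obtained with a cast to
- .CW void .
- The macro below is a hypothetical portable stand-in, not the compiler's built-in
- .CW USED ;
- unlike the real statement it takes a single variable, and
- .CW handler
- is an invented example function.

```c
/* Hypothetical stand-in: evaluates x and discards the value. */
#define USED(x) ((void)(x))

/*
 * A callback whose second parameter is required by the
 * interface but not needed by this implementation.
 */
static int
handler(int fd, void *arg)
{
	USED(arg);
	return fd + 1;
}
```

- The name is kept on the parameter here because omitting it from a function definition
- is not accepted by C compilers that predate C23.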
- .SH
- Debugging
- .PP
- There are two debuggers available on Plan 9.
- The first, and older, is
- .CW db ,
- a revision of Unix
- .CW adb .
- The other,
- .CW acid ,
- is a source-level debugger whose commands are statements in
- a true programming language.
- .CW Acid
- is the preferred debugger, but since it
- borrows some elements of
- .CW db ,
- notably the formats for displaying values, it is worth knowing a little bit about
- .CW db .
- .PP
- Both debuggers support multiple architectures in a single program; that is,
- the programs are
- .CW db
- and
- .CW acid ,
- not for example
- .CW vdb
- and
- .CW vacid .
- They also support cross-architecture debugging comfortably:
- one may debug a 68020 binary on a MIPS.
- .PP
- Imagine a program has crashed mysteriously:
- .P1
- % X11/X
- Fatal server bug!
- failed to create default stipple
- X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
- %
- .P2
- When a process dies on Plan 9 it hangs in the `broken' state
- for debugging.
- Attach a debugger to the process by naming its process id:
- .P1
- % acid 106
- /proc/106/text:mips plan 9 executable
- /sys/lib/acid/port
- /sys/lib/acid/mips
- acid:
- .P2
- The
- .CW acid
- function
- .CW stk()
- reports the stack traceback:
- .P1
- acid: stk()
- At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
- abort() /sys/src/ape/lib/ap/stdio/abort.c:4
- called from FatalError+#4e
- /sys/src/X/mit/server/dix/misc.c:421
- FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
- s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
- /sys/src/X/mit/server/dix/misc.c:416
- called from gnotscreeninit+#4ce
- /sys/src/X/mit/server/ddx/gnot/gnot.c:792
- gnotscreeninit(snum=#0, sc=#80db0)
- /sys/src/X/mit/server/ddx/gnot/gnot.c:766
- called from AddScreen+#16e
- /n/bootes/sys/src/X/mit/server/dix/main.c:610
- AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
- /sys/src/X/mit/server/dix/main.c:530
- called from InitOutput+0x80
- /sys/src/X/mit/server/ddx/brazil/brddx.c:522
- InitOutput(argc=0x00000001,argv=0x7fffffe4)
- /sys/src/X/mit/server/ddx/brazil/brddx.c:511
- called from main+0x294
- /sys/src/X/mit/server/dix/main.c:225
- main(argc=0x00000001,argv=0x7fffffe4)
- /sys/src/X/mit/server/dix/main.c:136
- called from _main+0x24
- /sys/src/ape/lib/ap/mips/main9.s:8
- .P2
- The function
- .CW lstk()
- is similar but
- also reports the values of local variables.
- Note that the traceback includes full file names; this is a boon to debugging,
- although it makes the output much noisier.
- .PP
- To use
- .CW acid
- well you will need to learn its input language; see the
- ``Acid Manual'',
- by Phil Winterbottom,
- for details. For simple debugging, however, the information in the manual page is
- sufficient. In particular, it describes the most useful functions
- for examining a process.
- .PP
- The compiler does not place
- information describing the types of variables in the executable,
- but a compile-time flag provides crude support for symbolic debugging.
- The
- .CW -a
- flag to the compiler suppresses code generation
- and instead emits source text in the
- .CW acid
- language to format and display data structure types defined in the program.
- The easiest way to use this feature is to put a rule in the
- .CW mkfile :
- .P1
- syms: main.$O
- 	$CC -a main.c > syms
- .P2
- Then, from within
- .CW acid ,
- type
- .P1
- acid: include("sourcedirectory/syms")
- .P2
- to read in the relevant definitions.
- (For multi-file source, you need to be a little fancier;
- see
- .I 2c (1)).
- This text includes, for each defined compound
- type, a function with that name that may be called with the address of a structure
- of that type to display its contents.
- For example, if
- .CW rect
- is a global variable of type
- .CW Rectangle ,
- one may execute
- .P1
- Rectangle(*rect)
- .P2
- to display it.
- The
- .CW *
- (indirection) operator is necessary because
- of the way
- .CW acid
- works: each global symbol in the program is defined as a variable by
- .CW acid ,
- with value equal to the
- .I address
- of the symbol.
- .PP
- Another common technique is to write special
- .CW acid
- code by hand to define functions to aid debugging, initialize the debugger, and so on.
- Conventionally, this is placed in a file called
- .CW acid
- in the source directory; it has a line
- .P1
- include("sourcedirectory/syms");
- .P2
- to load the compiler-produced symbols. One may edit the compiler output directly but
- it is wiser to keep the hand-generated
- .CW acid
- separate from the machine-generated.
- .PP
- To make things simple, the default rules in the system
- .CW mkfiles
- include entries to make
- .CW foo.acid
- from
- .CW foo.c ,
- so one may use
- .CW mk
- to automate the production of
- .CW acid
- definitions for a given C source file.
- .PP
- There is much more to say here. See the
- .CW acid
- manual page, the reference manual, or the paper
- ``Acid: A Debugger Built From A Language'',
- also by Phil Winterbottom.
|