- .TL
- Plan 9 C Compilers
- .AU
- Ken Thompson
- ken@plan9.bell-labs.com
- .AB
- .FS
- Originally appeared, in a different form, in
- .I
- Proceedings of the Summer 1990 UKUUG Conference,
- .R
- pp. 41-51,
- London, 1990.
- .FE
- This paper describes the overall structure and function of the Plan 9 C compilers.
- A more detailed implementation document
- for any one of the compilers
- is yet to be written.
- .AE
- .NH
- Introduction
- .LP
- There are many compilers in the series.
- Six of the compilers (MIPS 3000, SPARC, Intel 386, Power PC, DEC Alpha, and Motorola 68020)
- are considered active and are used to compile
- current versions of Plan 9.
- Several others (Motorola 68000, Intel 960, ARM 7500, AMD 29000) have had only limited use, such as
- to program peripherals or experimental devices.
- .NH
- Structure
- .LP
- The compiler is a single program that produces an
- object file.
- Combined in the compiler are the traditional
- roles of preprocessor, lexical analyzer, parser, code generator,
- local optimizer,
- and first half of the assembler.
- The object files are binary forms of assembly
- language,
- similar to what might be passed between
- the first and second passes of an assembler.
- .LP
- Object files and libraries
- are combined by a loader
- program to produce the executable binary.
- The loader combines the roles of second half
- of the assembler, global optimizer, and loader.
- The names of the compilers, loaders, and assemblers
- are as follows:
- .DS
- .ta 1.5i
- .de Ta
- \\$1 \f(CW\\$2\fP \f(CW\\$3\fP \f(CW\\$4\fP
- ..
- .Ta SPARC kc kl ka
- .Ta Power\ PC qc ql qa
- .Ta MIPS vc vl va
- .Ta Motorola\ 68000 1c 1l 1a
- .Ta Motorola\ 68020 2c 2l 2a
- .Ta ARM\ 7500 5c 5l 5a
- .Ta Intel\ 960 6c 6l 6a
- .Ta DEC\ Alpha 7c 7l 7a
- .Ta Intel\ 386 8c 8l 8a
- .Ta AMD\ 29000 9c 9l 9a
- .DE
- There is a further breakdown
- in the source of the compilers into
- object-independent and
- object-dependent
- parts.
- All of the object-independent parts
- are combined into source files in the
- directory
- .CW /sys/src/cmd/cc .
- The object-dependent parts are collected
- in a separate directory for each compiler,
- for example
- .CW /sys/src/cmd/vc .
- All of the code,
- both object-independent and
- object-dependent,
- is machine-independent
- and may be cross-compiled and executed on any
- of the architectures.
- .NH
- The Language
- .LP
- The compiler implements ANSI C with some
- restrictions and extensions
- [ANSI90].
- Most of the restrictions are due to
- personal preference, while
- most of the extensions were to help in
- the implementation of Plan 9.
- There are other departures from the standard,
- particularly in the libraries,
- that are beyond the scope of this
- paper.
- .NH 2
- Register, volatile, const
- .LP
- The keyword
- .CW register
- is recognized syntactically
- but is semantically ignored.
- Thus taking the address of a
- .CW register
- variable is not diagnosed.
- The keyword
- .CW volatile
- disables all optimizations, in particular registerization, of the corresponding variable.
- The keyword
- .CW const
- generates warnings (if warnings are enabled by the compiler's
- .CW -w
- option) of non-constant use of the variable,
- but does not affect the generated code.
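- .LP
- For example (a small illustration), the assignment below draws a
- warning when the
- .CW -w
- option is given,
- but the store is generated all the same:
- .P1
- const int maxbuf = 512;
- void
- f(void)
- {
- 	maxbuf = 1024;	/* non-constant use of maxbuf */
- }
- .P2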
- .NH 2
- The preprocessor
- .LP
- The C preprocessor is probably the
- biggest departure from the ANSI standard.
- .LP
- The preprocessor built into the Plan 9 compilers does not support
- .CW #if ,
- although it does handle
- .CW #ifdef
- and
- .CW #include .
- If it is necessary to be more standard,
- the source text can first be run through the separate ANSI C
- preprocessor,
- .CW cpp .
- .NH 2
- Unnamed substructures
- .LP
- The most important and most heavily used of the
- extensions is the declaration of an
- unnamed substructure or subunion.
- For example:
- .DS
- .CW
- .ta .1i .6i 1.1i 1.6i
- typedef
- struct lock
- {
- int locked;
- } Lock;
- typedef
- struct node
- {
- int type;
- union
- {
- double dval;
- float fval;
- long lval;
- };
- Lock;
- } Node;
- Lock* lock;
- Node* node;
- .DE
- The declaration of
- .CW Node
- has an unnamed substructure of type
- .CW Lock
- and an unnamed subunion.
- One use of this feature is that elements of the unnamed
- substructure or subunion can be referenced as if they were
- members of the outer structure.
- Thus
- .CW node->dval
- and
- .CW node->locked
- are legitimate references.
- .LP
- When an outer structure is used
- in a context that is only legal for
- an unnamed substructure,
- the compiler promotes the reference to the
- unnamed substructure.
- This is true for references to structures and
- for references to pointers to structures.
- This happens in assignment statements and
- in argument passing where prototypes have been
- declared.
- Thus, continuing with the example,
- .DS
- .CW
- .ta .1i .6i 1.1i 1.6i
- lock = node;
- .DE
- would assign a pointer to the unnamed
- .CW Lock
- in
- the
- .CW Node
- to the variable
- .CW lock .
- Another example,
- .DS
- .CW
- .ta .1i .6i 1.1i 1.6i
- extern void lock(Lock*);
- func(...)
- {
- ...
- lock(node);
- ...
- }
- .DE
- will pass a pointer to the
- .CW Lock
- substructure.
- .LP
- Finally, in places where context is insufficient to identify the unnamed structure,
- the type name (it must be a
- .CW typedef )
- of the unnamed structure can be used as an identifier.
- In our example,
- .CW &node->Lock
- gives the address of the anonymous
- .CW Lock
- structure.
- .NH 2
- Structure displays
- .LP
- A structure cast followed by a list of expressions in braces is
- an expression with the type of the structure and elements assigned from
- the corresponding list.
- Structures are now almost first-class citizens of the language.
- It is common to see code like this:
- .DS
- .CW
- .ta .1i
- r = (Rectangle){point1, (Point){x,y+2}};
- .DE
- .NH 2
- Initialization indexes
- .LP
- In initializers of arrays,
- one may place a constant expression
- in square brackets before an initializer.
- This causes the next initializer to assign
- the indicated element.
- For example:
- .DS
- .CW
- .ta .1i .6i 1.6i
- enum errors
- {
- Etoobig,
- Ealarm,
- Egreg
- };
- char* errstrings[] =
- {
- [Ealarm] "Alarm call",
- [Egreg] "Panic: out of mbufs",
- [Etoobig] "Arg list too long",
- };
- .DE
- In the same way,
- individual structure members may
- be initialized in any order by preceding the initialization with
- .CW .tagname .
- Both forms allow an optional
- .CW = ,
- to be compatible with a proposed
- extension to ANSI C.
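- .LP
- For example, using a hypothetical
- .CW Point
- structure:
- .P1
- typedef struct { int x; int y; } Point;
- Point p =
- {
- 	.y 3,
- 	.x = 4,	/* the = is optional */
- };
- .P2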
- .NH 2
- External register
- .LP
- The declaration
- .CW extern
- .CW register
- will dedicate a register to
- a variable on a global basis.
- It can be used only under special circumstances.
- External register variables must be identically
- declared in all modules and
- libraries.
- The feature is not intended for efficiency,
- although it can produce efficient code;
- rather it represents a unique storage class that
- would be hard to get any other way.
- On a shared-memory multi-processor,
- an external register is
- one-per-processor and neither one-per-procedure (automatic)
- nor one-per-system (external).
- It is used for two variables in the Plan 9 kernel,
- .CW u
- and
- .CW m .
- The variable
- .CW u
- is a pointer to the structure representing the currently running process
- and
- .CW m
- is a pointer to the per-machine data structure.
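- .LP
- A sketch of such declarations follows;
- the structure names
- .CW User
- and
- .CW Mach
- are assumed here for illustration:
- .P1
- typedef struct User User;
- typedef struct Mach Mach;
- extern register User *u;	/* process running on this processor */
- extern register Mach *m;	/* this processor's data */
- .P2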
- .NH 2
- Long long
- .LP
- The compilers accept
- .CW long
- .CW long
- as a basic type meaning 64-bit integer.
- On all of the machines
- this type is synthesized from 32-bit instructions.
- .NH 2
- Pragma
- .LP
- The compilers accept
- .CW #pragma
- .CW lib
- .I libname
- and pass the
- library name string uninterpreted
- to the loader.
- The loader uses the library name to
- find libraries to load.
- If the name contains
- .CW %O ,
- it is replaced with
- the single character object type of the compiler
- (e.g.,
- .CW v
- for the MIPS).
- If the name contains
- .CW %M ,
- it is replaced with
- the architecture type for the compiler
- (e.g.,
- .CW mips
- for the MIPS).
- If the name starts with
- .CW /
- it is an absolute pathname;
- if it starts with
- .CW .
- then it is searched for in the loader's current directory.
- Otherwise, the name is searched for in
- .CW /%M/lib .
- Such
- .CW #pragma
- statements in header files guarantee that the correct
- libraries are always linked with a program without the
- need to specify them explicitly at link time.
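- .LP
- For example, a library's header might contain lines like these
- (the library names are illustrative):
- .P1
- #pragma lib "libbio.a"
- #pragma lib "/%M/lib/libdraw.a"
- .P2
- The first is found in
- .CW /%M/lib ;
- the second is an absolute path with the architecture name substituted for
- .CW %M .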
- .LP
- They also accept
- .CW #pragma
- .CW hjdicks
- .CW on
- (or
- .CW yes
- or
- .CW 1 )
- to cause subsequently declared data, until
- .CW #pragma
- .CW hjdicks
- .CW off
- (or
- .CW no
- or
- .CW 0 ),
- to be laid out in memory tightly packed in successive bytes, disregarding
- the usual alignment rules.
- Accessing such data can cause faults.
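- .LP
- For example (an illustrative structure):
- .P1
- #pragma hjdicks on
- typedef struct Hdr
- {
- 	char		type;
- 	unsigned long	length;	/* byte-packed; may be misaligned */
- } Hdr;
- #pragma hjdicks off
- .P2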
- .LP
- Similarly,
- .CW #pragma
- .CW profile
- .CW off
- (or
- .CW no
- or
- .CW 0 )
- causes subsequently declared functions, until
- .CW #pragma
- .CW profile
- .CW on
- (or
- .CW yes
- or
- .CW 1 ),
- to be marked as unprofiled.
- Such functions will not be profiled when
- profiling is enabled for the rest of the program.
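- .LP
- For example:
- .P1
- #pragma profile off
- static void
- spin(void)
- {
- 	/* busy-wait; excluded from the profile */
- }
- #pragma profile on
- .P2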
- .LP
- Two
- .CW #pragma
- statements allow type-checking of
- .CW print -like
- functions.
- The first, of the form
- .P1
- #pragma varargck argpos error 2
- .P2
- tells the compiler that the second argument to
- .CW error
- is a
- .CW print
- format string (see the manual page
- .I print (2))
- that specifies how to format
- .CW error 's
- subsequent arguments.
- The second, of the form
- .P1
- #pragma varargck type "s" char*
- .P2
- says that the
- .CW print
- format verb
- .CW s
- processes an argument of
- type
- .CW char* .
- If the compiler's
- .CW -F
- option is enabled, the compiler will use this information
- to report type violations in the arguments to
- .CW print ,
- .CW error ,
- and similar routines.
- .NH
- Object module conventions
- .LP
- The overall conventions of the runtime environment
- are important
- to runtime efficiency.
- In this section,
- several of these conventions are discussed.
- .NH 2
- Register saving
- .LP
- In the Plan 9 compilers,
- the caller of a procedure saves the registers.
- With caller-saves,
- the leaf procedures can use all the
- registers and never save them.
- If you spend a lot of time at the leaves,
- this seems preferable.
- With callee-saves,
- the saving of the registers is done
- in the single point of entry and return.
- If you are interested in space,
- this seems preferable.
- In both,
- there is a degree of uncertainty
- about what registers need to be saved.
- Callee-saved registers make it difficult to
- find variables in registers in debuggers.
- Callee-saved registers also complicate
- the implementation of
- .CW longjmp .
- The convincing argument is
- that with caller-saves,
- the decision to registerize a variable
- can include the cost of saving the register
- across calls.
- For a further discussion of caller- vs. callee-saves,
- see the paper by Davidson and Whalley [Dav91].
- .LP
- In the Plan 9 operating system,
- calls to the kernel look like normal procedure
- calls, which means
- the caller
- has saved the registers and the system
- entry does not have to.
- This makes system calls considerably faster.
- Since this is a potential security hole,
- and can lead to non-determinism,
- the system may eventually save the registers
- on entry,
- or more likely clear the registers on return.
- .NH 2
- Calling convention
- .LP
- Older C compilers maintain a frame pointer, which is at a known constant
- offset from the stack pointer within each function.
- For machines where the stack grows towards zero,
- the argument pointer is at a known constant offset
- from the frame pointer.
- Since the stack grows down in Plan 9,
- the Plan 9 compilers
- keep neither an
- explicit frame pointer nor
- an explicit argument pointer;
- instead they generate addresses relative to the stack pointer.
- .LP
- On some architectures, the first argument to a subroutine is passed in a register.
- .NH 2
- Functions returning structures
- .LP
- Structures longer than one word are awkward to implement
- since they do not fit in registers and must
- be passed around in memory.
- Functions that return structures
- are particularly clumsy.
- The Plan 9 compilers pass the address of
- the structure to be returned as the first argument of a
- function that has a structure return value.
- Thus
- .DS
- .CW
- .ta .1i .6i 1.1i 1.6i
- x = f(...)
- .DE
- is rewritten as
- .DS
- .CW
- .ta .1i .6i 1.1i 1.6i
- f(&x, ...)\f1.
- .DE
- This saves a copy and makes the compilation
- much less clumsy.
- A disadvantage is that if you call this
- function without an assignment,
- a dummy location must be invented.
- .LP
- There is also a danger of calling a function
- that returns a structure without declaring
- it as such.
- With ANSI C function prototypes,
- this error need never occur.
- .NH
- Implementation
- .LP
- The compiler is divided internally into
- four machine-independent passes,
- four machine-dependent passes,
- and an output pass.
- The next nine sections describe each pass in order.
- .NH 2
- Parsing
- .LP
- The first pass is a YACC-based parser
- [Joh79].
- Declarations are interpreted immediately,
- building a block structured symbol table.
- Executable statements are put into a parse tree
- and collected,
- without interpretation.
- At the end of each procedure,
- the parse tree for the function is
- examined by the other passes of the compiler.
- .LP
- The input stream of the parser is
- a pushdown list of input activations.
- The preprocessor
- expansions of
- macros
- and
- .CW #include
- are implemented as pushdowns.
- Thus there is no separate
- pass for preprocessing.
- .NH 2
- Typing
- .LP
- The next pass distributes typing information
- to every node of the tree.
- Implicit operations on the tree are added,
- such as type promotions and taking the
- address of arrays and functions.
- .NH 2
- Machine-independent optimization
- .LP
- The next pass performs optimizations
- and transformations of the tree, such as converting
- .CW &*x
- and
- .CW *&x
- into
- .CW x .
- Constant expressions are converted to constants in this pass.
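- .LP
- Such trees commonly come from macro expansion;
- for instance, with a hypothetical macro:
- .P1
- #define PTR(x)	(&(x))
- int y, z;
- void
- f(void)
- {
- 	y = *PTR(z);	/* *&z, simplified to z */
- 	y = 2*3 + 1;	/* folded to the constant 7 */
- }
- .P2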
- .NH 2
- Arithmetic rewrites
- .LP
- This is another machine-independent optimization.
- Subtrees of add, subtract, and multiply of integers are
- rewritten for easier compilation.
- The major transformation is factoring:
- .CW 4+8*a+16*b+5
- is transformed into
- .CW 9+8*(a+2*b) .
- Such expressions arise from address
- manipulation and array indexing.
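- .LP
- For example, with an illustrative declaration:
- .P1
- double m[10][2];
- /* &m[b][a] is base + 16*b + 8*a,
-    factored as base + 8*(a + 2*b) */
- .P2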
- .NH 2
- Addressability
- .LP
- This is the first of the machine-dependent passes.
- The addressability of a processor is defined as the set of
- expressions that is legal in the address field
- of a machine language instruction.
- The addressability of different processors varies widely.
- At one end of the spectrum are the 68020 and VAX,
- which allow a complex mix of incrementing,
- decrementing,
- indexing, and relative addressing.
- At the other end is the MIPS,
- which allows only registers and constant offsets from the
- contents of a register.
- The addressability can be different for different instructions
- within the same processor.
- .LP
- It is important to the code generator to know when a
- subtree represents an address of a particular type.
- This is done with a bottom-up walk of the tree.
- In this pass, the leaves are labeled with small integers.
- When an internal node is encountered,
- it is labeled by consulting a table indexed by the
- labels on the left and right subtrees.
- For example,
- on the 68020 processor,
- it is possible to address an
- offset from a named location.
- In C, this is represented by the expression
- .CW *(&name+constant) .
- This is marked addressable by the following table.
- In the table,
- a node represented by the left column is marked
- with a small integer from the right column.
- Marks of the form
- .CW A\s-2\di\u\s0
- are addressable while
- marks of the form
- .CW N\s-2\di\u\s0
- are not addressable.
- .DS
- .B
- .ta .1i 1.1i
- Node Marked
- .CW
- name A\s-2\d1\u\s0
- const A\s-2\d2\u\s0
- &A\s-2\d1\u\s0 A\s-2\d3\u\s0
- A\s-2\d3\u\s0+A\s-2\d1\u\s0 N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP
- *N\s-2\d1\u\s0 A\s-2\d4\u\s0
- .DE
- Here there is a distinction between
- a node marked
- .CW A\s-2\d1\u\s0
- and a node marked
- .CW A\s-2\d4\u\s0
- because the address operator of an
- .CW A\s-2\d4\u\s0
- node is not addressable.
- So to extend the table:
- .DS
- .B
- .ta .1i 1.1i
- Node Marked
- .CW
- &A\s-2\d4\u\s0 N\s-2\d2\u\s0
- N\s-2\d2\u\s0+N\s-2\d1\u\s0 N\s-2\d1\u\s0
- .DE
- The full addressability of the 68020 is expressed
- in 18 rules like this,
- while the addressability of the MIPS is expressed
- in 11 rules.
- When one ports the compiler,
- this table is usually initialized
- so that leaves are labeled as addressable and nothing else.
- The code produced is poor,
- but porting is easy.
- The table can be extended later.
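- .LP
- The following is a minimal sketch of the bottom-up labeling,
- not the compiler's own code;
- the node structure, operator names, and table are invented for illustration:
- .P1
- enum { ONAME, OCONST, OADDR, OIND, OADD, NOPS };	/* illustrative operators */
- enum { None, A1, A2, A3, A4, N1, N2, NMARKS };	/* marks as in the tables above */
- typedef struct Node Node;
- struct Node { int op; int mark; Node *left, *right; };
- int table[NOPS][NMARKS][NMARKS];	/* one entry per rule; 18 rules fill it for the 68020 */
- int
- label(Node *n)
- {
- 	int l, r;
- 	if(n->op == ONAME)
- 		return n->mark = A1;
- 	if(n->op == OCONST)
- 		return n->mark = A2;
- 	l = label(n->left);
- 	r = n->right? label(n->right): None;
- 	return n->mark = table[n->op][l][r];
- }
- .P2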
- .LP
- This pass also rewrites some complex operators
- into procedure calls.
- Examples include 64-bit multiply and divide.
- .LP
- In the same bottom-up pass of the tree,
- the nodes are labeled with a Sethi-Ullman complexity
- [Set70].
- This number is roughly the number of registers required
- to compile the tree on an ideal machine.
- An addressable node is marked 0.
- A function call is marked infinite.
- A unary operator is marked as the
- maximum of 1 and the mark of its subtree.
- A binary operator with equal marks on its subtrees is
- marked with a subtree mark plus 1.
- A binary operator with unequal marks on its subtrees is
- marked with the maximum mark of its subtrees.
- The actual values of the marks are not too important,
- but the relative values are.
- The goal is to compile the harder
- (larger mark)
- subtree first.
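- .LP
- A sketch of the numbering follows,
- reusing the invented node structure from the sketch above;
- in the compiler this happens during the same bottom-up walk:
- .P1
- enum { OFUNC = NOPS, SUINF = 1000 };	/* a call operator; SUINF stands in for `infinite' */
- int
- sethi(Node *n)
- {
- 	int l, r;
- 	if(n->mark >= A1 && n->mark <= A4)	/* addressable: 0 */
- 		return 0;
- 	if(n->op == OFUNC)
- 		return SUINF;
- 	l = sethi(n->left);
- 	if(n->right == 0)
- 		return l > 1? l: 1;	/* unary: max(1, subtree) */
- 	r = sethi(n->right);
- 	if(l == r)
- 		return l+1;	/* equal subtrees need one more register */
- 	return l > r? l: r;	/* unequal: the larger mark suffices */
- }
- .P2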
- .NH 2
- Code generation
- .LP
- Code is generated by recursive
- descent.
- The Sethi-Ullman complexity completely guides the
- order.
- The addressability defines the leaves.
- The only difficult part is compiling a tree
- that has two infinite (function call)
- subtrees.
- In this case,
- one subtree is compiled into the return register
- (usually the most convenient place for a function call)
- and then stored on the stack.
- The other subtree is compiled into the return register
- and then the operation is compiled with
- operands from the stack and the return register.
- .LP
- There is a separate boolean code generator that compiles
- conditional expressions.
- This is fundamentally different from compiling an arithmetic expression.
- The result of the boolean code generator is the
- position of the program counter and not an expression.
- The boolean code generator makes extensive use of De Morgan's rule.
- The boolean code generator is an expanded version of that described
- in chapter 8 of Aho, Sethi, and Ullman
- [Aho87].
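- .LP
- For example, a negated conjunction such as
- .P1
- if(!(a == 0 && b == 0))
- 	f();
- .P2
- is compiled much as if the source had been written
- .P1
- if(a != 0 || b != 0)
- 	f();
- .P2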
- .LP
- There is a considerable amount of talk in the literature
- about automating this part of a compiler with a machine
- description.
- Since this code generator is so small
- (less than 500 lines of C)
- and easy,
- it hardly seems worth the effort.
- .NH 2
- Registerization
- .LP
- Up to now,
- the compiler has operated on syntax trees
- that are roughly equivalent to the original source language.
- The previous pass has produced machine language in an internal
- format.
- The next two passes operate on the internal machine language
- structures.
- The purpose of the next pass is to reintroduce
- registers for heavily used variables.
- .LP
- All of the variables that can be
- potentially registerized within a procedure are
- placed in a table.
- (Suitable variables are any automatic or external
- scalars that do not have their addresses extracted.
- Some constants that are hard to reference are also
- considered for registerization.)
- Four separate data flow equations are evaluated
- over the procedure on all of these variables.
- Two of the equations are the normal set-behind
- and used-ahead
- bits that define the life of a variable.
- The two new bits tell if a variable life
- crosses a function call ahead or behind.
- By examining a variable over its lifetime,
- it is possible to get a cost
- for registerizing.
- Loops are detected and the costs are multiplied
- by three for every level of loop nesting.
- Costs are sorted and the variables
- are replaced by available registers on a greedy basis.
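- .LP
- A sketch of the selection follows;
- the candidate record and register count are invented,
- and the real pass also weighs the call-crossing bits described above:
- .P1
- enum { NREG = 6 };	/* assumed number of allocatable registers */
- typedef struct Cand Cand;
- struct Cand
- {
- 	int	cost;	/* uses, multiplied by 3 per enclosing loop */
- 	int	reg;	/* assigned register, or -1 */
- };
- void
- regalloc(Cand *c, int nc)
- {
- 	int i, r, best;
- 	for(i = 0; i < nc; i++)
- 		c[i].reg = -1;
- 	for(r = 0; r < NREG; r++){
- 		best = -1;
- 		for(i = 0; i < nc; i++)
- 			if(c[i].reg < 0 && (best < 0 || c[i].cost > c[best].cost))
- 				best = i;
- 		if(best < 0 || c[best].cost <= 0)
- 			break;
- 		c[best].reg = r;	/* greedy: highest remaining cost first */
- 	}
- }
- .P2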
- .LP
- The 68020 has two different
- types of registers.
- For the 68020,
- two different costs are calculated for
- each variable life and the register type that
- affords the better cost is used.
- Ties are broken by counting the number of available
- registers of each type.
- .LP
- Note that externals are registerized together with automatics.
- This is done by evaluating the semantics of a ``call'' instruction
- differently for externals and automatics.
- Since a call goes outside the local procedure,
- it is assumed that a call references all externals.
- Similarly,
- externals are assumed to be set before an ``entry'' instruction
- and assumed to be referenced after a ``return'' instruction.
- This makes sure that externals are in memory across calls.
- .LP
- The overall results are satisfactory.
- It would be nice to be able to do this processing in
- a machine-independent way,
- but it is impossible to get all of the costs and
- side effects of different choices by examining the parse tree.
- .LP
- Most of the code in the registerization pass is machine-independent.
- The major machine-dependency is in
- examining a machine instruction to ask if it sets or references
- a variable.
- .NH 2
- Machine code optimization
- .LP
- The next pass walks the machine code
- for opportunistic optimizations.
- For the most part,
- this is highly specific to a particular
- processor.
- One optimization that is performed
- on all of the processors is the
- removal of unnecessary ``move''
- instructions.
- Ironically,
- most of these instructions were inserted by
- the previous pass.
- There are two patterns that are repetitively
- matched and replaced until no more matches are
- found.
- The first tries to remove ``move'' instructions
- by relabeling variables.
- .LP
- When a ``move'' instruction is encountered,
- if the destination variable is set before the
- source variable is referenced,
- then all of the references to the destination
- variable can be renamed to the source and the ``move''
- can be deleted.
- This transformation uses the reverse data flow
- set up in the previous pass.
- .LP
- An example of this pattern is depicted in the following
- table.
- The pattern is in the left column and the
- replacement action is in the right column.
- .DS
- .CW
- .ta .1i .6i 1.6i 2.1i 2.6i
- MOVE a->b \fR(remove)\fP
- .R
- (sequence with no mention of \f(CWa\fP)
- .CW
- USE b USE a
- .R
- (sequence with no mention of \f(CWa\fP)
- .CW
- SET b SET b
- .DE
- .LP
- Experiments have shown that it is marginally
- worthwhile to rename uses of the destination variable
- with uses of the source variable up to
- the first use of the source variable.
- .LP
- The second transformation does relabeling
- without deleting instructions.
- When a ``move'' instruction is encountered,
- if the source variable has been set prior
- to the use of the destination variable
- then all of the references to the source
- variable are replaced by the destination and
- the ``move'' is inverted.
- Typically,
- this transformation will alter two ``move''
- instructions and allow the first transformation
- another chance to remove code.
- This transformation uses the forward data flow
- set up in the previous pass.
- .LP
- Again,
- the following is a depiction of the transformation where
- the pattern is in the left column and the
- rewrite is in the right column.
- .DS
- .CW
- .ta .1i .6i 1.6i 2.1i 2.6i
- SET a SET b
- .R
- (sequence with no use of \f(CWb\fP)
- .CW
- USE a USE b
- .R
- (sequence with no use of \f(CWb\fP)
- .CW
- MOVE a->b MOVE b->a
- .DE
- Iterating these transformations
- will usually get rid of all redundant ``move'' instructions.
- .LP
- A problem with this organization is that the costs
- of registerization calculated in the previous pass
- must depend on how well this pass can detect and remove
- redundant instructions.
- Often,
- a fine candidate for registerization is rejected
- because of the cost of instructions that are later
- removed.
- .NH 2
- Writing the object file
- .LP
- The last pass walks the internal assembly language
- and writes the object file.
- The object file is reduced in size by about a factor
- of three with simple compression
- techniques.
- The most important aspect of the object file
- format is that it is independent of the compiling machine.
- All integer and floating numbers in the object
- code are converted to known formats and byte
- orders.
- .NH
- The loader
- .LP
- The loader is a multiple pass program that
- reads object files and libraries and produces
- an executable binary.
- The loader also does some minimal
- optimizations and code rewriting.
- Many of the operations performed by the
- loader are machine-dependent.
- .LP
- The first pass of the loader reads the
- object modules into an internal data
- structure that looks like binary assembly language.
- As the instructions are read,
- code is reordered to remove
- unconditional branch instructions.
- Conditional branch instructions are inverted
- to prevent the insertion of unconditional branches.
- The loader will also make a copy of a few instructions
- to remove an unconditional branch.
- .LP
- The next pass allocates addresses for
- all external data.
- Typical of processors is the MIPS,
- which can reference ±32K bytes from a
- register.
- The loader allocates the register
- .CW R30
- as the static pointer.
- The value placed in
- .CW R30
- is the base of the data segment plus 32K.
- It is then cheap to reference all data in the
- first 64K of the data segment.
- External variables are allocated to
- the data segment
- with the smallest variables allocated first.
- If all of the data cannot fit into the first
- 64K of the data segment,
- then usually only a few large arrays
- need more expensive addressing modes.
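- .LP
- For example, with illustrative addresses:
- if the data segment begins at
- .CW D ,
- the loader sets
- .CW R30
- to
- .CW D+32768 ;
- a variable placed at
- .CW D+100
- is then reached as offset
- 100-32768 = -32668
- from
- .CW R30 ,
- while one beyond the first 64K needs extra instructions to form its address.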
- .LP
- For the MIPS processor,
- the loader makes a pass over the internal
- structures,
- exchanging instructions to try
- to fill ``delay slots'' with useful work.
- If a useful instruction cannot be found
- to fill a delay slot,
- the loader will insert
- ``noop''
- instructions.
- This pass is very expensive and does not
- do a good job.
- About 40% of all instructions are in
- delay slots.
- About 65% of these are useful instructions and
- 35% are ``noops.''
- The vendor-supplied assembler does this job
- more effectively,
- filling about 80%
- of the delay slots with useful instructions.
- .LP
- On the 68020 processor,
- branch instructions come in a variety of
- sizes depending on the relative distance
- of the branch.
- Thus the size of branch instructions
- can be mutually dependent.
- The loader uses a multiple pass algorithm
- to resolve the branch lengths
- [Szy78].
- Initially, all branches are assumed minimal length.
- On each subsequent pass,
- the branches are reassessed
- and expanded if necessary.
- When no more expansions occur,
- the locations of the instructions in
- the text segment are known.
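- .LP
- A sketch of the iteration follows;
- the instruction record and the reachability test
- .CW shortenough
- are hypothetical:
- .P1
- typedef struct Inst Inst;
- struct Inst
- {
- 	Inst	*next;
- 	Inst	*target;	/* branch destination, or 0 */
- 	long	pc;		/* tentative address */
- 	int	size;		/* current encoding size in bytes */
- };
- int shortenough(long from, long to, int size);	/* hypothetical: does the encoding reach? */
- void
- spansize(Inst *prog)
- {
- 	Inst *p;
- 	long pc;
- 	int changed;
- 	do{
- 		changed = 0;
- 		pc = 0;
- 		for(p = prog; p; p = p->next){	/* lay out with the current sizes */
- 			p->pc = pc;
- 			pc += p->size;
- 		}
- 		for(p = prog; p; p = p->next)
- 			if(p->target && !shortenough(p->pc, p->target->pc, p->size)){
- 				p->size += 2;	/* expand to a longer encoding */
- 				changed = 1;
- 			}
- 	}while(changed);
- }
- .P2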
- .LP
- On the MIPS processor,
- all instructions are one size.
- A single pass over the instructions will
- determine the locations of all addresses
- in the text segment.
- .LP
- The last pass of the loader produces the
- executable binary.
- A symbol table and other tables are
- produced to help the debugger to
- interpret the binary symbolically.
- .LP
- The loader places absolute source line numbers in the symbol table.
- The names and absolute line numbers of all
- .CW #include
- files are also placed in the
- symbol table so that the debuggers can
- associate object code to source files.
- .NH
- Performance
- .LP
- The following is a table of the source size of the MIPS
- compiler.
- .DS
- .ta .1i .6i
- lines module
- \0509 machine-independent headers
- 1070 machine-independent YACC source
- 6090 machine-independent C source
- \0545 machine-dependent headers
- 6532 machine-dependent C source
- \0298 loader headers
- 5215 loader C source
- .DE
- .LP
- The following table shows timing
- of a test program
- that plays checkers, running on a MIPS R4000.
- The test program is 26 files totaling 12600 lines of C.
- The execution time does not significantly
- depend on library implementation.
- Since no other compiler runs on Plan 9,
- the Plan 9 tests were done with the Plan 9 operating system;
- the other tests were done on the vendor's operating system.
- The hardware was identical in both cases.
- The optimizer in the vendor's compiler
- is reputed to be extremely good.
- .DS
- .ta .1i .9i
- \0\04.49s Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP)
- \0\01.72s Plan 9 \f(CWvc\fP \f(CW-N\fP load time
- 148.69s Plan 9 \f(CWvc\fP \f(CW-N\fP run time
- \015.07s Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit)
- \0\01.66s Plan 9 \f(CWvc\fP load time
- \089.96s Plan 9 \f(CWvc\fP run time
- \014.83s vendor \f(CWcc\fP compile time
- \0\00.38s vendor \f(CWcc\fP load time
- 104.75s vendor \f(CWcc\fP run time
- \043.59s vendor \f(CWcc\fP \f(CW-O\fP compile time
- \0\00.38s vendor \f(CWcc\fP \f(CW-O\fP load time
- \076.19s vendor \f(CWcc\fP \f(CW-O\fP run time
- \0\08.19s vendor \f(CWcc\fP \f(CW-O3\fP compile time
- \035.97s vendor \f(CWcc\fP \f(CW-O3\fP load time
- \071.16s vendor \f(CWcc\fP \f(CW-O3\fP run time
- .DE
- .LP
- To compare the Intel compiler,
- a program that is about 40% bit manipulation and
- about 60% single precision floating point was
- run on the same 33 MHz 486, once under Windows
- compiled with the Watcom compiler, version 10.0,
- in 16-bit mode and once under
- Plan 9 in 32-bit mode.
- The Plan 9 execution time was 27 sec while the Windows
- execution time was 31 sec.
- .NH
- Conclusions
- .LP
- The new compilers compile
- quickly,
- load slowly,
- and produce
- medium quality
- object code.
- The compilers are relatively
- portable,
- requiring but a couple of weeks' work to
- produce a compiler for a different computer.
- For Plan 9,
- where we needed several compilers
- with specialized features and
- our own object formats,
- this project was indispensable.
- It is also necessary for us to
- be able to freely distribute our compilers
- with the Plan 9 distribution.
- .LP
- Two problems have come up in retrospect.
- The first has to do with the
- division of labor between compiler and loader.
- Plan 9 runs on multi-processors and as such
- compilations are often done in parallel.
- Unfortunately,
- all compilations must be complete before loading
- can begin.
- The load is then single-threaded.
- With this model,
- any shift of work from compile to load
- results in a significant increase in real time.
- The same is true of libraries that are compiled
- infrequently and loaded often.
- In the future,
- we may try to put some of the loader work
- back into the compiler.
- .LP
- The second problem comes from
- the various optimizations performed over several
- passes.
- Often optimizations in different passes depend
- on each other.
- Iterating the passes could compromise efficiency,
- or even loop.
- We see no real solution to this problem.
- .NH
- References
- .LP
- [Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
- .I
- Compilers \- Principles, Techniques, and Tools,
- .R
- Addison Wesley,
- Reading, MA,
- 1987.
- .LP
- [ANSI90] \f2American National Standard for Information Systems \-
- Programming Language C\f1, American National Standards Institute, Inc.,
- New York, 1990.
- .LP
- [Dav91] J. W. Davidson and D. B. Whalley,
- ``Methods for Saving and Restoring Register Values across Function Calls'',
- .I
- Software\-Practice and Experience,
- .R
- Vol 21(2), pp. 149-165, February 1991.
- .LP
- [Joh79] S. C. Johnson,
- ``YACC \- Yet Another Compiler Compiler'',
- .I
- UNIX Programmer's Manual, Seventh Ed., Vol. 2A,
- .R
- AT&T Bell Laboratories,
- Murray Hill, NJ,
- 1979.
- .LP
- [Set70] R. Sethi and J. D. Ullman,
- ``The Generation of Optimal Code for Arithmetic Expressions'',
- .I
- Journal of the ACM,
- .R
- Vol 17(4), pp. 715-728, 1970.
- .LP
- [Szy78] T. G. Szymanski,
- ``Assembling Code for Machines with Span-dependent Instructions'',
- .I
- Communications of the ACM,
- .R
- Vol 21(4), pp. 300-308, 1978.
|