- <html>
- <title>
- Plan 9 C Compilers
- </title>
- <body BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#330088" ALINK="#FF0044">
- <H1>Plan 9 C Compilers
- </H1>
- <DL><DD><I>Ken Thompson<br>
- ken@plan9.bell-labs.com<br>
- </I></DL>
- <DL><DD><H4>ABSTRACT</H4>
- <DL>
- <DT><DT> <DD>
- NOTE:<I> Originally appeared, in a different form, in
- Proceedings of the Summer 1990 UKUUG Conference,
- pp. 41-51,
- London, 1990.
- </I><DT> <DD></dl>
- <br>
- This paper describes the overall structure and function of the Plan 9 C compilers.
- A more detailed implementation document
- for any one of the compilers
- is yet to be written.
- </DL>
- <H4>1 Introduction
- </H4>
- <br> <br>
- There are many compilers in the series.
- Six of the compilers (MIPS 3000, SPARC, Intel 386, Power PC, DEC Alpha, and Motorola 68020)
- are considered active and are used to compile
- current versions of Plan 9.
- Several others (Motorola 68000, Intel 960, ARM 7500, AMD 29000) have had only limited use, such as
- to program peripherals or experimental devices.
- <H4>2 Structure
- </H4>
- <br> <br>
- The compiler is a single program that produces an
- object file.
- Combined in the compiler are the traditional
- roles of preprocessor, lexical analyzer, parser, code generator,
- local optimizer,
- and first half of the assembler.
- The object files are binary forms of assembly
- language,
- similar to what might be passed between
- the first and second passes of an assembler.
- <br> <br>
- Object files and libraries
- are combined by a loader
- program to produce the executable binary.
- The loader combines the roles of second half
- of the assembler, global optimizer, and loader.
- The names of the compilers, loaders, and assemblers
- are as follows:
- <DL><DT><DD><TT><PRE>
- SPARC             <TT>kc</TT>  <TT>kl</TT>  <TT>ka</TT>
- Power PC          <TT>qc</TT>  <TT>ql</TT>  <TT>qa</TT>
- MIPS              <TT>vc</TT>  <TT>vl</TT>  <TT>va</TT>
- Motorola 68000    <TT>1c</TT>  <TT>1l</TT>  <TT>1a</TT>
- Motorola 68020    <TT>2c</TT>  <TT>2l</TT>  <TT>2a</TT>
- ARM 7500          <TT>5c</TT>  <TT>5l</TT>  <TT>5a</TT>
- Intel 960         <TT>6c</TT>  <TT>6l</TT>  <TT>6a</TT>
- DEC Alpha         <TT>7c</TT>  <TT>7l</TT>  <TT>7a</TT>
- Intel 386         <TT>8c</TT>  <TT>8l</TT>  <TT>8a</TT>
- AMD 29000         <TT>9c</TT>  <TT>9l</TT>  <TT>9a</TT>
- </PRE></TT></DL>
- There is a further breakdown
- in the source of the compilers into
- object-independent and
- object-dependent
- parts.
- All of the object-independent parts
- are combined into source files in the
- directory
- <TT>/sys/src/cmd/cc</TT>.
- The object-dependent parts are collected
- in a separate directory for each compiler,
- for example
- <TT>/sys/src/cmd/vc</TT>.
- All of the code,
- both object-independent and
- object-dependent,
- is machine-independent
- and may be cross-compiled and executed on any
- of the architectures.
- <H4>3 The Language
- </H4>
- <br> <br>
- The compiler implements ANSI C with some
- restrictions and extensions
- [ANSI90].
- Most of the restrictions are due to
- personal preference, while
- most of the extensions were to help in
- the implementation of Plan 9.
- There are other departures from the standard,
- particularly in the libraries,
- that are beyond the scope of this
- paper.
- <H4>3.1 Register, volatile, const
- </H4>
- <br> <br>
- The keyword
- <TT>register</TT>
- is recognized syntactically
- but is semantically ignored.
- Thus taking the address of a
- <TT>register</TT>
- variable is not diagnosed.
- The keyword
- <TT>volatile</TT>
- disables all optimizations, in particular registerization, of the corresponding variable.
- The keyword
- <TT>const</TT>
- generates warnings (if warnings are enabled by the compiler's
- <TT>-w</TT>
- option) of non-constant use of the variable,
- but does not affect the generated code.
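- <br> <br>
- For illustration (this fragment is not from the original paper and the names are arbitrary),
- all of the following is accepted by the compilers:
- <DL><DT><DD><TT><PRE>
- void
- example(volatile int *busy)
- {
- 	register int r;
- 	const int limit = 10;
- 	int *p;
- 
- 	p = &r;		/* not diagnosed: register is semantically ignored */
- 	*p = limit;
- 	limit = 11;	/* non-constant use of a const: a warning only under -w */
- 	while(*busy)	/* busy is volatile, so it is reloaded on every test */
- 		;
- }
- </PRE></TT></DL>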
- <H4>3.2 The preprocessor
- </H4>
- <br> <br>
- The C preprocessor is probably the
- biggest departure from the ANSI standard.
- <br> <br>
- The preprocessor built into the Plan 9 compilers does not support
- <TT>#if</TT>,
- although it does handle
- <TT>#ifdef</TT>
- and
- <TT>#include</TT>.
- If it is necessary to be more standard,
- the source text can first be run through the separate ANSI C
- preprocessor,
- <TT>cpp</TT>.
- <H4>3.3 Unnamed substructures
- </H4>
- <br> <br>
- The most important and most heavily used of the
- extensions is the declaration of an
- unnamed substructure or subunion.
- For example:
- <DL><DT><DD><TT><PRE>
- typedef
- struct lock
- {
- 	int locked;
- } Lock;
- 
- typedef
- struct node
- {
- 	int type;
- 	union
- 	{
- 		double dval;
- 		float fval;
- 		long lval;
- 	};
- 	Lock;
- } Node;
- 
- Lock* lock;
- Node* node;
- </PRE></TT></DL>
- The declaration of
- <TT>Node</TT>
- has an unnamed substructure of type
- <TT>Lock</TT>
- and an unnamed subunion.
- One use of this feature allows elements of the
- subunit to be accessed as if they were in
- the outer structure.
- Thus
- <TT>node->dval</TT>
- and
- <TT>node->locked</TT>
- are legitimate references.
- <br> <br>
- When an outer structure is used
- in a context that is only legal for
- an unnamed substructure,
- the compiler promotes the reference to the
- unnamed substructure.
- This is true for references to structures and
- for references to pointers to structures.
- This happens in assignment statements and
- in argument passing where prototypes have been
- declared.
- Thus, continuing with the example,
- <DL><DT><DD><TT><PRE>
- lock = node;
- </PRE></TT></DL>
- would assign a pointer to the unnamed
- <TT>Lock</TT>
- in
- the
- <TT>Node</TT>
- to the variable
- <TT>lock</TT>.
- Another example,
- <DL><DT><DD><TT><PRE>
- extern void lock(Lock*);
- 
- func(...)
- {
- 	...
- 	lock(node);
- 	...
- }
- </PRE></TT></DL>
- will pass a pointer to the
- <TT>Lock</TT>
- substructure.
- <br> <br>
- Finally, in places where context is insufficient to identify the unnamed structure,
- the type name (it must be a
- <TT>typedef</TT>)
- of the unnamed structure can be used as an identifier.
- In our example,
- <TT>&node->Lock</TT>
- gives the address of the anonymous
- <TT>Lock</TT>
- structure.
- <H4>3.4 Structure displays
- </H4>
- <br> <br>
- A structure cast followed by a list of expressions in braces is
- an expression with the type of the structure and elements assigned from
- the corresponding list.
- Structures are now almost first-class citizens of the language.
- It is common to see code like this:
- <DL><DT><DD><TT><PRE>
- r = (Rectangle){point1, (Point){x,y+2}};
- </PRE></TT></DL>
- <H4>3.5 Initialization indexes
- </H4>
- <br> <br>
- In initializers of arrays,
- one may place a constant expression
- in square brackets before an initializer.
- This causes the next initializer to assign
- the indicated element.
- For example:
- <DL><DT><DD><TT><PRE>
- enum errors
- {
- 	Etoobig,
- 	Ealarm,
- 	Egreg
- };
- char* errstrings[] =
- {
- 	[Ealarm]	"Alarm call",
- 	[Egreg]		"Panic: out of mbufs",
- 	[Etoobig]	"Arg list too long",
- };
- </PRE></TT></DL>
- In the same way,
- individual structure members may
- be initialized in any order by preceding the initialization with
- <TT>.tagname</TT>.
- Both forms allow an optional
- <TT>=</TT>,
- to be compatible with a proposed
- extension to ANSI C.
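- <br> <br>
- For example (an illustrative fragment, not from the original paper):
- <DL><DT><DD><TT><PRE>
- typedef
- struct point
- {
- 	int x;
- 	int y;
- } Point;
- 
- Point origin =
- {
- 	.y 0,
- 	.x = 0,	/* the = is optional in both forms */
- };
- </PRE></TT></DL>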
- <H4>3.6 External register
- </H4>
- <br> <br>
- The declaration
- <TT>extern</TT>
- <TT>register</TT>
- will dedicate a register to
- a variable on a global basis.
- It can be used only under special circumstances.
- External register variables must be identically
- declared in all modules and
- libraries.
- The feature is not intended for efficiency,
- although it can produce efficient code;
- rather it represents a unique storage class that
- would be hard to get any other way.
- On a shared-memory multi-processor,
- an external register is
- one-per-processor and neither one-per-procedure (automatic)
- nor one-per-system (external).
- It is used for two variables in the Plan 9 kernel,
- <TT>u</TT>
- and
- <TT>m</TT>.
- The variable <TT>u</TT>
- is a pointer to the structure representing the currently running process
- and
- <TT>m</TT>
- is a pointer to the per-machine data structure.
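- <br> <br>
- A sketch of such declarations, which would have to appear identically in every module
- (the type names here are illustrative, not the kernel's actual declarations):
- <DL><DT><DD><TT><PRE>
- extern register struct Proc* u;	/* the currently running process */
- extern register struct Mach* m;	/* the per-processor data structure */
- </PRE></TT></DL>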
- <H4>3.7 Long long
- </H4>
- <br> <br>
- The compilers accept
- <TT>long</TT>
- <TT>long</TT>
- as a basic type meaning 64-bit integer.
- On all of the machines
- this type is synthesized from 32-bit instructions.
- <H4>3.8 Pragma
- </H4>
- <br> <br>
- The compilers accept
- <TT>#pragma</TT>
- <TT>lib</TT>
- <I>libname</I>
- and pass the
- library name string uninterpreted
- to the loader.
- The loader uses the library name to
- find libraries to load.
- If the name contains
- <TT>%O</TT>,
- it is replaced with
- the single character object type of the compiler
- (e.g.,
- <TT>v</TT>
- for the MIPS).
- If the name contains
- <TT>%M</TT>,
- it is replaced with
- the architecture type for the compiler
- (e.g.,
- <TT>mips</TT>
- for the MIPS).
- If the name starts with
- <TT>/</TT>
- it is an absolute pathname;
- if it starts with
- <TT>.</TT>
- then it is searched for in the loader's current directory.
- Otherwise, the name is searched for in
- <TT>/%M/lib</TT>.
- Such
- <TT>#pragma</TT>
- statements in header files guarantee that the correct
- libraries are always linked with a program without the
- need to specify them explicitly at link time.
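- <br> <br>
- For example, the header of a hypothetical library might contain
- <DL><DT><DD><TT><PRE>
- #pragma	lib	"libfoo.a"
- </PRE></TT></DL>
- so that any program including the header is linked with
- <TT>/%M/lib/libfoo.a</TT>
- automatically.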
- <br> <br>
- They also accept
- <TT>#pragma</TT>
- <TT>hjdicks</TT>
- <TT>on</TT>
- (or
- <TT>yes</TT>
- or
- <TT>1</TT>)
- to cause subsequently declared data, until
- <TT>#pragma</TT>
- <TT>hjdicks</TT>
- <TT>off</TT>
- (or
- <TT>no</TT>
- or
- <TT>0</TT>),
- to be laid out in memory tightly packed in successive bytes, disregarding
- the usual alignment rules.
- Accessing such data can cause faults.
- <br> <br>
- Similarly,
- <TT>#pragma</TT>
- <TT>profile</TT>
- <TT>off</TT>
- (or
- <TT>no</TT>
- or
- <TT>0</TT>)
- causes subsequently declared functions, until
- <TT>#pragma</TT>
- <TT>profile</TT>
- <TT>on</TT>
- (or
- <TT>yes</TT>
- or
- <TT>1</TT>),
- to be marked as unprofiled.
- Such functions will not be profiled when
- profiling is enabled for the rest of the program.
- <br> <br>
- Two
- <TT>#pragma</TT>
- statements allow type-checking of
- <TT>print</TT>-like
- functions.
- The first, of the form
- <DL><DT><DD><TT><PRE>
- #pragma varargck argpos error 2
- </PRE></TT></DL>
- tells the compiler that the second argument to
- <TT>error</TT>
- is a
- <TT>print</TT>
- format string (see the manual page
- <A href="/magic/man2html/2/print"><I>print</I>(2))
- </A>that specifies how to format
- <TT>error</TT>'s
- subsequent arguments.
- The second, of the form
- <DL><DT><DD><TT><PRE>
- #pragma varargck type "s" char*
- </PRE></TT></DL>
- says that the
- <TT>print</TT>
- format verb
- <TT>s</TT>
- processes an argument of
- type
- <TT>char*</TT>.
- If the compiler's
- <TT>-F</TT>
- option is enabled, the compiler will use this information
- to report type violations in the arguments to
- <TT>print</TT>,
- <TT>error</TT>,
- and similar routines.
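- <br> <br>
- Put together, a header might declare a print-like routine this way
- (a sketch; the function and its signature are hypothetical):
- <DL><DT><DD><TT><PRE>
- #pragma varargck argpos error 2
- #pragma varargck type "s" char*
- 
- void	error(int code, char *fmt, ...);
- </PRE></TT></DL>
- With
- <TT>-F</TT>,
- a call such as
- <TT>error(1, "bad name %s", 42)</TT>
- would then be flagged, since the
- <TT>s</TT>
- verb expects a
- <TT>char*</TT>.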
- <H4>4 Object module conventions
- </H4>
- <br> <br>
- The overall conventions of the runtime environment
- are important
- to runtime efficiency.
- In this section,
- several of these conventions are discussed.
- <H4>4.1 Register saving
- </H4>
- <br> <br>
- In the Plan 9 compilers,
- the caller of a procedure saves the registers.
- With caller-saves,
- the leaf procedures can use all the
- registers and never save them.
- If you spend a lot of time at the leaves,
- this seems preferable.
- With callee-saves,
- the saving of the registers is done
- in the single point of entry and return.
- If you are interested in space,
- this seems preferable.
- In both,
- there is a degree of uncertainty
- about what registers need to be saved.
- Callee-saved registers make it difficult to
- find variables in registers in debuggers.
- Callee-saved registers also complicate
- the implementation of
- <TT>longjmp</TT>.
- The convincing argument is
- that with caller-saves,
- the decision to registerize a variable
- can include the cost of saving the register
- across calls.
- For a further discussion of caller- vs. callee-saves,
- see the paper by Davidson and Whalley [Dav91].
- <br> <br>
- In the Plan 9 operating system,
- calls to the kernel look like normal procedure
- calls, which means
- the caller
- has saved the registers and the system
- entry does not have to.
- This makes system calls considerably faster.
- Since this is a potential security hole,
- and can lead to non-determinism,
- the system may eventually save the registers
- on entry,
- or more likely clear the registers on return.
- <H4>4.2 Calling convention
- </H4>
- <br> <br>
- Older C compilers maintain a frame pointer, which is at a known constant
- offset from the stack pointer within each function.
- For machines where the stack grows towards zero,
- the argument pointer is at a known constant offset
- from the frame pointer.
- Since the stack grows down in Plan 9,
- the Plan 9 compilers
- keep neither an
- explicit frame pointer nor
- an explicit argument pointer;
- instead they generate addresses relative to the stack pointer.
- <br> <br>
- On some architectures, the first argument to a subroutine is passed in a register.
- <H4>4.3 Functions returning structures
- </H4>
- <br> <br>
- Structures longer than one word are awkward to implement
- since they do not fit in registers and must
- be passed around in memory.
- Functions that return structures
- are particularly clumsy.
- The Plan 9 compilers pass the address of
- the returned structure as the first argument of a
- function that has a structure return value.
- Thus
- <DL><DT><DD><TT><PRE>
- x = f(...)
- </PRE></TT></DL>
- is rewritten as
- <DL><DT><DD><TT><PRE>
- f(&x, ...).
- </PRE></TT></DL>
- This saves a copy and makes the compilation
- much less clumsy.
- A disadvantage is that if you call this
- function without an assignment,
- a dummy location must be invented.
- <br> <br>
- There is also a danger of calling a function
- that returns a structure without declaring
- it as such.
- With ANSI C function prototypes,
- this error need never occur.
- <H4>5 Implementation
- </H4>
- <br> <br>
- The compiler is divided internally into
- four machine-independent passes,
- four machine-dependent passes,
- and an output pass.
- The next nine sections describe each pass in order.
- <H4>5.1 Parsing
- </H4>
- <br> <br>
- The first pass is a YACC-based parser
- [Joh79].
- Declarations are interpreted immediately,
- building a block structured symbol table.
- Executable statements are put into a parse tree
- and collected,
- without interpretation.
- At the end of each procedure,
- the parse tree for the function is
- examined by the other passes of the compiler.
- <br> <br>
- The input stream of the parser is
- a pushdown list of input activations.
- The preprocessor
- expansions of
- macros
- and
- <TT>#include</TT>
- are implemented as pushdowns.
- Thus there is no separate
- pass for preprocessing.
- <H4>5.2 Typing
- </H4>
- <br> <br>
- The next pass distributes typing information
- to every node of the tree.
- Implicit operations on the tree are added,
- such as type promotions and taking the
- address of arrays and functions.
- <H4>5.3 Machine-independent optimization
- </H4>
- <br> <br>
- The next pass performs optimizations
- and transformations of the tree, such as converting
- <TT>&*x</TT>
- and
- <TT>*&x</TT>
- into
- <TT>x</TT>.
- Constant expressions are converted to constants in this pass.
- <H4>5.4 Arithmetic rewrites
- </H4>
- <br> <br>
- This is another machine-independent optimization.
- Subtrees of add, subtract, and multiply of integers are
- rewritten for easier compilation.
- The major transformation is factoring:
- <TT>4+8*a+16*b+5</TT>
- is transformed into
- <TT>9+8*(a+2*b)</TT>.
- Such expressions arise from address
- manipulation and array indexing.
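- <br> <br>
- For instance (an illustrative fragment, not from the paper), a two-dimensional
- array reference contributes a constant offset and two scaled indices that the
- rewrite gathers and factors:
- <DL><DT><DD><TT><PRE>
- long	m[4][8];
- 
- long
- f(int a, int b)
- {
- 	/* the address of m[a][b+1] is m + 32*a + 4*b + 4 with 4-byte longs;
- 	   factoring turns this into m + 4 + 4*(8*a + b) */
- 	return m[a][b+1];
- }
- </PRE></TT></DL>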
- <H4>5.5 Addressability
- </H4>
- <br> <br>
- This is the first of the machine-dependent passes.
- The addressability of a processor is defined as the set of
- expressions that is legal in the address field
- of a machine language instruction.
- The addressability of different processors varies widely.
- At one end of the spectrum are the 68020 and VAX,
- which allow a complex mix of incrementing,
- decrementing,
- indexing, and relative addressing.
- At the other end is the MIPS,
- which allows only registers and constant offsets from the
- contents of a register.
- The addressability can be different for different instructions
- within the same processor.
- <br> <br>
- It is important to the code generator to know when a
- subtree represents an address of a particular type.
- This is done with a bottom-up walk of the tree.
- In this pass, the leaves are labeled with small integers.
- When an internal node is encountered,
- it is labeled by consulting a table indexed by the
- labels on the left and right subtrees.
- For example,
- on the 68020 processor,
- it is possible to address an
- offset from a named location.
- In C, this is represented by the expression
- <TT>*(&name+constant)</TT>.
- This is marked addressable by the following table.
- In the table,
- a node represented by the left column is marked
- with a small integer from the right column.
- Marks of the form
- <TT>A<small><small><sub>i</sub></small></small></TT>
- are addressable while
- marks of the form
- <TT>N<small><small><sub>i</sub></small></small></TT>
- are not addressable.
- <DL><DT><DD><TT><PRE>
- Node Marked
- name A<small><small><sub>1</sub></small></small>
- const A<small><small><sub>2</sub></small></small>
- &A<small><small><sub>1</sub></small></small> A<small><small><sub>3</sub></small></small>
- A<small><small><sub>3</sub></small></small>+A<small><small><sub>1</sub></small></small> N<small><small><sub>1</sub></small></small> (note that this is not addressable)
- *N<small><small><sub>1</sub></small></small> A<small><small><sub>4</sub></small></small>
- </PRE></TT></DL>
- Here there is a distinction between
- a node marked
- <TT>A<small><small><sub>1</sub></small></small></TT>
- and a node marked
- <TT>A<small><small><sub>4</sub></small></small></TT>
- because the address operator of an
- <TT>A<small><small><sub>4</sub></small></small></TT>
- node is not addressable.
- So to extend the table:
- <DL><DT><DD><TT><PRE>
- Node Marked
- &A<small><small><sub>4</sub></small></small> N<small><small><sub>2</sub></small></small>
- N<small><small><sub>2</sub></small></small>+N<small><small><sub>1</sub></small></small> N<small><small><sub>1</sub></small></small>
- </PRE></TT></DL>
- The full addressability of the 68020 is expressed
- in 18 rules like this,
- while the addressability of the MIPS is expressed
- in 11 rules.
- When one ports the compiler,
- this table is usually initialized
- so that leaves are labeled as addressable and nothing else.
- The code produced is poor,
- but porting is easy.
- The table can be extended later.
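- <br> <br>
- In outline, the bottom-up labeling might be implemented as in the following sketch;
- the node structure, the operator and label names, and the table are hypothetical
- and are not the compiler's actual code (the sketch assumes all leaves are names or constants):
- <DL><DT><DD><TT><PRE>
- typedef struct Node Node;
- struct Node
- {
- 	int	op;	/* one of the operators below */
- 	int	mark;	/* addressability label assigned by this pass */
- 	int	su;	/* Sethi-Ullman mark, filled in by sucalc below */
- 	Node	*left;
- 	Node	*right;
- };
- 
- enum	/* a few hypothetical operators */
- {
- 	ONAME,
- 	OCONST,
- 	OADDR,
- 	OADD,
- 	OIND,
- 	OFUNC,
- 	Nop,
- };
- 
- enum	/* labels: A* are addressable, N* are not */
- {
- 	None,
- 	A1,	/* name */
- 	A2,	/* const */
- 	A3,	/* &A1 */
- 	A4,	/* *N1 */
- 	N1,	/* e.g. A3+A1 */
- 	N2,	/* &A4 */
- 	Nmark,
- };
- 
- int	tab[Nop][Nmark][Nmark];	/* machine-dependent, filled in per target */
- 
- int
- label(Node *n)
- {
- 	int l, r;
- 
- 	if(n->op == ONAME)
- 		return n->mark = A1;
- 	if(n->op == OCONST)
- 		return n->mark = A2;
- 	l = label(n->left);
- 	r = n->right ? label(n->right) : None;
- 	return n->mark = tab[n->op][l][r];
- }
- </PRE></TT></DL>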
- <br> <br>
- This pass also rewrites some complex operators
- into procedure calls.
- Examples include 64-bit multiply and divide.
- <br> <br>
- In the same bottom-up pass of the tree,
- the nodes are labeled with a Sethi-Ullman complexity
- [Set70].
- This number is roughly the number of registers required
- to compile the tree on an ideal machine.
- An addressable node is marked 0.
- A function call is marked infinite.
- A unary operator is marked as the
- maximum of 1 and the mark of its subtree.
- A binary operator with equal marks on its subtrees is
- marked with a subtree mark plus 1.
- A binary operator with unequal marks on its subtrees is
- marked with the maximum mark of its subtrees.
- The actual values of the marks are not too important,
- but the relative values are.
- The goal is to compile the harder
- (larger mark)
- subtree first.
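- <br> <br>
- The marking rule can be sketched in the same hypothetical terms
- (again not the compiler's actual code; it assumes the labels of the
- previous sketch have already been assigned):
- <DL><DT><DD><TT><PRE>
- enum { SUINF = 1000 };	/* stands in for ``infinite'' */
- 
- int
- sucalc(Node *n)
- {
- 	int l, r;
- 
- 	if(n->mark == A1 || n->mark == A2 || n->mark == A3 || n->mark == A4)
- 		return n->su = 0;	/* addressable: no registers needed */
- 	if(n->op == OFUNC)
- 		return n->su = SUINF;	/* function call */
- 	if(n->right == 0){		/* unary: max of 1 and the subtree mark */
- 		l = sucalc(n->left);
- 		return n->su = l > 1 ? l : 1;
- 	}
- 	l = sucalc(n->left);
- 	r = sucalc(n->right);
- 	if(l == r)
- 		n->su = l+1;		/* equal marks: one extra register */
- 	else
- 		n->su = l > r ? l : r;
- 	return n->su;
- }
- </PRE></TT></DL>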
- <H4>5.6 Code generation
- </H4>
- <br> <br>
- Code is generated by recursive
- descent.
- The Sethi-Ullman complexity completely guides the
- order.
- The addressability defines the leaves.
- The only difficult part is compiling a tree
- that has two infinite (function call)
- subtrees.
- In this case,
- one subtree is compiled into the return register
- (usually the most convenient place for a function call)
- and then stored on the stack.
- The other subtree is compiled into the return register
- and then the operation is compiled with
- operands from the stack and the return register.
- <br> <br>
- There is a separate boolean code generator that compiles
- conditional expressions.
- This is fundamentally different from compiling an arithmetic expression.
- The result of the boolean code generator is the
- position of the program counter and not an expression.
- The boolean code generator makes extensive use of De Morgan's rule.
- The boolean code generator is an expanded version of that described
- in chapter 8 of Aho, Sethi, and Ullman
- [Aho87].
- <br> <br>
- There is a considerable amount of talk in the literature
- about automating this part of a compiler with a machine
- description.
- Since this code generator is so small
- (less than 500 lines of C)
- and easy,
- it hardly seems worth the effort.
- <H4>5.7 Registerization
- </H4>
- <br> <br>
- Up to now,
- the compiler has operated on syntax trees
- that are roughly equivalent to the original source language.
- The previous pass has produced machine language in an internal
- format.
- The next two passes operate on the internal machine language
- structures.
- The purpose of the next pass is to reintroduce
- registers for heavily used variables.
- <br> <br>
- All of the variables that can be
- potentially registerized within a procedure are
- placed in a table.
- (Suitable variables are any automatic or external
- scalars that do not have their addresses extracted.
- Some constants that are hard to reference are also
- considered for registerization.)
- Four separate data flow equations are evaluated
- over the procedure on all of these variables.
- Two of the equations are the normal set-behind
- and used-ahead
- bits that define the life of a variable.
- The two new bits tell if a variable life
- crosses a function call ahead or behind.
- By examining a variable over its lifetime,
- it is possible to get a cost
- for registerizing.
- Loops are detected and the costs are multiplied
- by three for every level of loop nesting.
- Costs are sorted and the variables
- are replaced by available registers on a greedy basis.
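- <br> <br>
- In rough terms the cost computation looks like this (a sketch with hypothetical
- structures; the real pass works from the data flow bits described above):
- <DL><DT><DD><TT><PRE>
- typedef struct Ref Ref;
- struct Ref
- {
- 	Ref	*next;		/* next use or set of the same variable */
- 	int	loopdepth;	/* nesting level of the enclosing loops */
- };
- 
- long
- regcost(Ref *refs)
- {
- 	long cost, w;
- 	int d;
- 	Ref *r;
- 
- 	cost = 0;
- 	for(r = refs; r != 0; r = r->next){
- 		w = 1;
- 		for(d = r->loopdepth; d > 0; d--)
- 			w *= 3;	/* three for every level of loop nesting */
- 		cost += w;
- 	}
- 	return cost;
- }
- </PRE></TT></DL>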
- <br> <br>
- The 68020 has two different
- types of registers.
- For the 68020,
- two different costs are calculated for
- each variable life and the register type that
- affords the better cost is used.
- Ties are broken by counting the number of available
- registers of each type.
- <br> <br>
- Note that externals are registerized together with automatics.
- This is done by evaluating the semantics of a ``call'' instruction
- differently for externals and automatics.
- Since a call goes outside the local procedure,
- it is assumed that a call references all externals.
- Similarly,
- externals are assumed to be set before an ``entry'' instruction
- and assumed to be referenced after a ``return'' instruction.
- This makes sure that externals are in memory across calls.
- <br> <br>
- The overall results are satisfactory.
- It would be nice to be able to do this processing in
- a machine-independent way,
- but it is impossible to get all of the costs and
- side effects of different choices by examining the parse tree.
- <br> <br>
- Most of the code in the registerization pass is machine-independent.
- The major machine-dependency is in
- examining a machine instruction to ask if it sets or references
- a variable.
- <H4>5.8 Machine code optimization
- </H4>
- <br> <br>
- The next pass walks the machine code
- for opportunistic optimizations.
- For the most part,
- this is highly specific to a particular
- processor.
- One optimization that is performed
- on all of the processors is the
- removal of unnecessary ``move''
- instructions.
- Ironically,
- most of these instructions were inserted by
- the previous pass.
- There are two patterns that are repetitively
- matched and replaced until no more matches are
- found.
- The first tries to remove ``move'' instructions
- by relabeling variables.
- <br> <br>
- When a ``move'' instruction is encountered,
- if the destination variable is set before the
- source variable is referenced,
- then all of the references to the destination
- variable can be renamed to the source and the ``move''
- can be deleted.
- This transformation uses the reverse data flow
- set up in the previous pass.
- <br> <br>
- An example of this pattern is depicted in the following
- table.
- The pattern is in the left column and the
- replacement action is in the right column.
- <DL><DT><DD><TT><PRE>
- MOVE a->b      (remove)
- (sequence with no mention of <TT>a</TT>)
- USE b          USE a
- (sequence with no mention of <TT>a</TT>)
- SET b          SET b
- </PRE></TT></DL>
- <br> <br>
- Experiments have shown that it is marginally
- worthwhile to rename uses of the destination variable
- with uses of the source variable up to
- the first use of the source variable.
- <br> <br>
- The second transform will do relabeling
- without deleting instructions.
- When a ``move'' instruction is encountered,
- if the source variable has been set prior
- to the use of the destination variable
- then all of the references to the source
- variable are replaced by the destination and
- the ``move'' is inverted.
- Typically,
- this transformation will alter two ``move''
- instructions and allow the first transformation
- another chance to remove code.
- This transformation uses the forward data flow
- set up in the previous pass.
- <br> <br>
- Again,
- the following is a depiction of the transformation where
- the pattern is in the left column and the
- rewrite is in the right column.
- <DL><DT><DD><TT><PRE>
- SET a          SET b
- (sequence with no use of <TT>b</TT>)
- USE a          USE b
- (sequence with no use of <TT>b</TT>)
- MOVE a->b      MOVE b->a
- </PRE></TT></DL>
- Iterating these transformations
- will usually get rid of all redundant ``move'' instructions.
- <br> <br>
- A problem with this organization is that the costs
- of registerization calculated in the previous pass
- must depend on how well this pass can detect and remove
- redundant instructions.
- Often,
- a fine candidate for registerization is rejected
- because of the cost of instructions that are later
- removed.
- <H4>5.9 Writing the object file
- </H4>
- <br> <br>
- The last pass walks the internal assembly language
- and writes the object file.
- The object file is reduced in size by about a factor
- of three with simple compression
- techniques.
- The most important aspect of the object file
- format is that it is independent of the compiling machine.
- All integer and floating numbers in the object
- code are converted to known formats and byte
- orders.
- <H4>6 The loader
- </H4>
- <br> <br>
- The loader is a multiple pass program that
- reads object files and libraries and produces
- an executable binary.
- The loader also does some minimal
- optimizations and code rewriting.
- Many of the operations performed by the
- loader are machine-dependent.
- <br> <br>
- The first pass of the loader reads the
- object modules into an internal data
- structure that looks like binary assembly language.
- As the instructions are read,
- code is reordered to remove
- unconditional branch instructions.
- Conditional branch instructions are inverted
- to prevent the insertion of unconditional branches.
- The loader will also make a copy of a few instructions
- to remove an unconditional branch.
- <br> <br>
- The next pass allocates addresses for
- all external data.
- Typical of processors is the MIPS,
- which can reference ±32K bytes from a
- register.
- The loader allocates the register
- <TT>R30</TT>
- as the static pointer.
- The value placed in
- <TT>R30</TT>
- is the base of the data segment plus 32K.
- It is then cheap to reference all data in the
- first 64K of the data segment.
- External variables are allocated to
- the data segment
- with the smallest variables allocated first.
- If all of the data cannot fit into the first
- 64K of the data segment,
- then usually only a few large arrays
- need more expensive addressing modes.
- <br> <br>
- For the MIPS processor,
- the loader makes a pass over the internal
- structures,
- exchanging instructions to try
- to fill ``delay slots'' with useful work.
- If a useful instruction cannot be found
- to fill a delay slot,
- the loader will insert
- ``noop''
- instructions.
- This pass is very expensive and does not
- do a good job.
- About 40% of all instructions are in
- delay slots.
- About 65% of these are useful instructions and
- 35% are ``noops.''
- The vendor-supplied assembler does this job
- more effectively,
- filling about 80%
- of the delay slots with useful instructions.
- <br> <br>
- On the 68020 processor,
- branch instructions come in a variety of
- sizes depending on the relative distance
- of the branch.
- Thus the size of branch instructions
- can be mutually dependent.
- The loader uses a multiple pass algorithm
- to resolve the branch lengths
- [Szy78].
- Initially, all branches are assumed minimal length.
- On each subsequent pass,
- the branches are reassessed
- and expanded if necessary.
- When no more expansions occur,
- the locations of the instructions in
- the text segment are known.
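- <br> <br>
- The algorithm amounts to the following sketch (hypothetical structures;
- <TT>fits</TT>
- and
- <TT>grow</TT>
- are assumed to know the machine's branch encodings):
- <DL><DT><DD><TT><PRE>
- typedef struct Inst Inst;
- struct Inst
- {
- 	Inst	*link;		/* next instruction in the text segment */
- 	Inst	*target;	/* branch target, or 0 */
- 	long	pc;		/* address assigned on this iteration */
- 	int	size;		/* current size of the encoding, in bytes */
- };
- 
- int	fits(long disp, int size);	/* does the displacement fit this encoding? */
- int	grow(int size);			/* next larger encoding for the branch */
- 
- void
- span(Inst *first)
- {
- 	Inst *i;
- 	long pc;
- 	int again;
- 
- 	do{
- 		again = 0;
- 		pc = 0;
- 		for(i = first; i != 0; i = i->link){	/* assign tentative addresses */
- 			i->pc = pc;
- 			pc += i->size;
- 		}
- 		for(i = first; i != 0; i = i->link)
- 			if(i->target && !fits(i->target->pc - i->pc, i->size)){
- 				i->size = grow(i->size);	/* too far: expand */
- 				again = 1;
- 			}
- 	}while(again);
- }
- </PRE></TT></DL>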
- <br> <br>
- On the MIPS processor,
- all instructions are one size.
- A single pass over the instructions will
- determine the locations of all addresses
- in the text segment.
- <br> <br>
- The last pass of the loader produces the
- executable binary.
- A symbol table and other tables are
- produced to help the debugger to
- interpret the binary symbolically.
- <br> <br>
- The loader places absolute source line numbers in the symbol table.
- The name and absolute line number of all
- <TT>#include</TT>
- files are also placed in the
- symbol table so that the debuggers can
- associate object code to source files.
- <H4>7 Performance
- </H4>
- <br> <br>
- The following is a table of the source size of the MIPS
- compiler.
- <DL><DT><DD><TT><PRE>
- lines module
- 509 machine-independent headers
- 1070 machine-independent YACC source
- 6090 machine-independent C source
- 545 machine-dependent headers
- 6532 machine-dependent C source
- 298 loader headers
- 5215 loader C source
- </PRE></TT></DL>
- <br> <br>
- The following table shows timing
- of a test program
- that plays checkers, running on a MIPS R4000.
- The test program is 26 files totaling 12600 lines of C.
- The execution time does not significantly
- depend on library implementation.
- Since no other compiler runs on Plan 9,
- the Plan 9 tests were done with the Plan 9 operating system;
- the other tests were done on the vendor's operating system.
- The hardware was identical in both cases.
- The optimizer in the vendor's compiler
- is reputed to be extremely good.
- <DL><DT><DD><TT><PRE>
- 4.49s Plan 9 <TT>vc</TT> <TT>-N</TT> compile time (opposite of <TT>-O</TT>)
- 1.72s Plan 9 <TT>vc</TT> <TT>-N</TT> load time
- 148.69s Plan 9 <TT>vc</TT> <TT>-N</TT> run time
- 15.07s Plan 9 <TT>vc</TT> compile time (<TT>-O</TT> implicit)
- 1.66s Plan 9 <TT>vc</TT> load time
- 89.96s Plan 9 <TT>vc</TT> run time
- 14.83s vendor <TT>cc</TT> compile time
- 0.38s vendor <TT>cc</TT> load time
- 104.75s vendor <TT>cc</TT> run time
- 43.59s vendor <TT>cc</TT> <TT>-O</TT> compile time
- 0.38s vendor <TT>cc</TT> <TT>-O</TT> load time
- 76.19s vendor <TT>cc</TT> <TT>-O</TT> run time
- 8.19s vendor <TT>cc</TT> <TT>-O3</TT> compile time
- 35.97s vendor <TT>cc</TT> <TT>-O3</TT> load time
- 71.16s vendor <TT>cc</TT> <TT>-O3</TT> run time
- </PRE></TT></DL>
- <br> <br>
- To compare the Intel compiler,
- a program that is about 40% bit manipulation and
- about 60% single precision floating point was
- run on the same 33 MHz 486, once under Windows
- compiled with the Watcom compiler, version 10.0,
- in 16-bit mode and once under
- Plan 9 in 32-bit mode.
- The Plan 9 execution time was 27 sec while the Windows
- execution time was 31 sec.
- <H4>8 Conclusions
- </H4>
- <br> <br>
- The new compilers compile
- quickly,
- load slowly,
- and produce
- medium quality
- object code.
- The compilers are relatively
- portable,
- requiring but a couple of weeks' work to
- produce a compiler for a different computer.
- For Plan 9,
- where we needed several compilers
- with specialized features and
- our own object formats,
- this project was indispensable.
- It is also necessary for us to
- be able to freely distribute our compilers
- with the Plan 9 distribution.
- <br> <br>
- Two problems have come up in retrospect.
- The first has to do with the
- division of labor between compiler and loader.
- Plan 9 runs on multi-processors and as such
- compilations are often done in parallel.
- Unfortunately,
- all compilations must be complete before loading
- can begin.
- The load is then single-threaded.
- With this model,
- any shift of work from compile to load
- results in a significant increase in real time.
- The same is true of libraries that are compiled
- infrequently and loaded often.
- In the future,
- we may try to put some of the loader work
- back into the compiler.
- <br> <br>
- The second problem comes from
- the various optimizations performed over several
- passes.
- Often optimizations in different passes depend
- on each other.
- Iterating the passes could compromise efficiency,
- or even loop.
- We see no real solution to this problem.
- <H4>9 References
- </H4>
- <br> <br>
- [Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
- <I>Compilers: Principles, Techniques, and Tools</I>,
- Addison-Wesley,
- Reading, MA,
- 1987.
- <br> <br>
- [ANSI90] <I>American National Standard for Information Systems -
- Programming Language C</I>, American National Standards Institute, Inc.,
- New York, 1990.
- <br> <br>
- [Dav91] J. W. Davidson and D. B. Whalley,
- ``Methods for Saving and Restoring Register Values across Function Calls'',
- <I>Software-Practice and Experience</I>,
- Vol 21(2), pp. 149-165, February 1991.
- <br> <br>
- [Joh79] S. C. Johnson,
- ``YACC - Yet Another Compiler Compiler'',
- <I>UNIX Programmer's Manual</I>, Seventh Ed., Vol. 2A,
- AT&T Bell Laboratories,
- Murray Hill, NJ,
- 1979.
- <br> <br>
- [Set70] R. Sethi and J. D. Ullman,
- ``The Generation of Optimal Code for Arithmetic Expressions'',
- <I>Journal of the ACM</I>,
- Vol 17(4), pp. 715-728, 1970.
- <br> <br>
- [Szy78] T. G. Szymanski,
- ``Assembling Code for Machines with Span-dependent Instructions'',
- <I>Communications of the ACM</I>,
- Vol 21(4), pp. 300-308, 1978.
- <br> <br>
- <A href=http://www.lucent.com/copyright.html>
- Copyright</A> © 2004 Lucent Technologies Inc. All rights reserved.
- </body></html>