compiler.html 32 KB

1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117
  1. <html>
  2. <title>
  3. data
  4. </title>
  5. <body BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#330088" ALINK="#FF0044">
  6. <H1>Plan 9 C Compilers
  7. </H1>
  8. <DL><DD><I>Ken Thompson<br>
  9. ken@plan9.bell-labs.com<br>
  10. </I></DL>
  11. <DL><DD><H4>ABSTRACT</H4>
  12. <DL>
  13. <DT><DT>&#32;<DD>
  14. NOTE:<I> Originally appeared, in a different form, in
  15. Proceedings of the Summer 1990 UKUUG Conference,
  16. pp. 41-51,
  17. London, 1990.
  18. </I><DT>&#32;<DD></dl>
  19. <br>
  20. This paper describes the overall structure and function of the Plan 9 C compilers.
  21. A more detailed implementation document
  22. for any one of the compilers
  23. is yet to be written.
  24. </DL>
  25. <H4>1 Introduction
  26. </H4>
  27. <br>&#32;<br>
  28. There are many compilers in the series.
  29. Six of the compilers (MIPS 3000, SPARC, Intel 386, Power PC, DEC Alpha, and Motorola 68020)
  30. are considered active and are used to compile
  31. current versions of Plan 9.
  32. Several others (Motorola 68000, Intel 960, ARM 7500, AMD 29000) have had only limited use, such as
  33. to program peripherals or experimental devices.
  34. <H4>2 Structure
  35. </H4>
  36. <br>&#32;<br>
  37. The compiler is a single program that produces an
  38. object file.
  39. Combined in the compiler are the traditional
  40. roles of preprocessor, lexical analyzer, parser, code generator,
  41. local optimizer,
  42. and first half of the assembler.
  43. The object files are binary forms of assembly
  44. language,
  45. similar to what might be passed between
  46. the first and second passes of an assembler.
  47. <br>&#32;<br>
  48. Object files and libraries
  49. are combined by a loader
  50. program to produce the executable binary.
  51. The loader combines the roles of second half
  52. of the assembler, global optimizer, and loader.
  53. The names of the compliers, loaders, and assemblers
  54. are as follows:
  55. <DL><DT><DD><TT><PRE>
  56. SPARC <TT>kc</TT> <TT>kl</TT> <TT>ka</TT>
  57. Power <TT>PC</TT> <TT>qc</TT> <TT>ql</TT>
  58. MIPS <TT>vc</TT> <TT>vl</TT> <TT>va</TT>
  59. Motorola <TT>68000</TT> <TT>1c</TT> <TT>1l</TT>
  60. Motorola <TT>68020</TT> <TT>2c</TT> <TT>2l</TT>
  61. ARM <TT>7500</TT> <TT>5c</TT> <TT>5l</TT>
  62. Intel <TT>960</TT> <TT>6c</TT> <TT>6l</TT>
  63. DEC <TT>Alpha</TT> <TT>7c</TT> <TT>7l</TT>
  64. Intel <TT>386</TT> <TT>8c</TT> <TT>8l</TT>
  65. AMD <TT>29000</TT> <TT>9c</TT> <TT>9l</TT>
  66. </PRE></TT></DL>
  67. There is a further breakdown
  68. in the source of the compilers into
  69. object-independent and
  70. object-dependent
  71. parts.
  72. All of the object-independent parts
  73. are combined into source files in the
  74. directory
  75. <TT>/sys/src/cmd/cc</TT>.
  76. The object-dependent parts are collected
  77. in a separate directory for each compiler,
  78. for example
  79. <TT>/sys/src/cmd/vc</TT>.
  80. All of the code,
  81. both object-independent and
  82. object-dependent,
  83. is machine-independent
  84. and may be cross-compiled and executed on any
  85. of the architectures.
  86. <H4>3 The Language
  87. </H4>
  88. <br>&#32;<br>
  89. The compiler implements ANSI C with some
  90. restrictions and extensions
  91. [ANSI90].
  92. Most of the restrictions are due to
  93. personal preference, while
  94. most of the extensions were to help in
  95. the implementation of Plan 9.
  96. There are other departures from the standard,
  97. particularly in the libraries,
  98. that are beyond the scope of this
  99. paper.
  100. <H4>3.1 Register, volatile, const
  101. </H4>
  102. <br>&#32;<br>
  103. The keyword
  104. <TT>register</TT>
  105. is recognized syntactically
  106. but is semantically ignored.
  107. Thus taking the address of a
  108. <TT>register</TT>
  109. variable is not diagnosed.
  110. The keyword
  111. <TT>volatile</TT>
  112. disables all optimizations, in particular registerization, of the corresponding variable.
  113. The keyword
  114. <TT>const</TT>
  115. generates warnings (if warnings are enabled by the compiler's
  116. <TT>-w</TT>
  117. option) of non-constant use of the variable,
  118. but does not affect the generated code.
  119. <H4>3.2 The preprocessor
  120. </H4>
  121. <br>&#32;<br>
  122. The C preprocessor is probably the
  123. biggest departure from the ANSI standard.
  124. <br>&#32;<br>
  125. The preprocessor built into the Plan 9 compilers does not support
  126. <TT>#if</TT>,
  127. although it does handle
  128. <TT>#ifdef</TT>
  129. and
  130. <TT>#include</TT>.
  131. If it is necessary to be more standard,
  132. the source text can first be run through the separate ANSI C
  133. preprocessor,
  134. <TT>cpp</TT>.
  135. <H4>3.3 Unnamed substructures
  136. </H4>
  137. <br>&#32;<br>
  138. The most important and most heavily used of the
  139. extensions is the declaration of an
  140. unnamed substructure or subunion.
  141. For example:
  142. <DL><DT><DD><TT><PRE>
  143. typedef
  144. struct lock
  145. {
  146. int locked;
  147. } Lock;
  148. typedef
  149. struct node
  150. {
  151. int type;
  152. union
  153. {
  154. double dval;
  155. float fval;
  156. long lval;
  157. };
  158. Lock;
  159. } Node;
  160. Lock* lock;
  161. Node* node;
  162. </PRE></TT></DL>
  163. The declaration of
  164. <TT>Node</TT>
  165. has an unnamed substructure of type
  166. <TT>Lock</TT>
  167. and an unnamed subunion.
  168. One use of this feature allows references to elements of the
  169. subunit to be accessed as if they were in
  170. the outer structure.
  171. Thus
  172. <TT>node-&gt;dval</TT>
  173. and
  174. <TT>node-&gt;locked</TT>
  175. are legitimate references.
  176. <br>&#32;<br>
  177. When an outer structure is used
  178. in a context that is only legal for
  179. an unnamed substructure,
  180. the compiler promotes the reference to the
  181. unnamed substructure.
  182. This is true for references to structures and
  183. to references to pointers to structures.
  184. This happens in assignment statements and
  185. in argument passing where prototypes have been
  186. declared.
  187. Thus, continuing with the example,
  188. <DL><DT><DD><TT><PRE>
  189. lock = node;
  190. </PRE></TT></DL>
  191. would assign a pointer to the unnamed
  192. <TT>Lock</TT>
  193. in
  194. the
  195. <TT>Node</TT>
  196. to the variable
  197. <TT>lock</TT>.
  198. Another example,
  199. <DL><DT><DD><TT><PRE>
  200. extern void lock(Lock*);
  201. func(...)
  202. {
  203. ...
  204. lock(node);
  205. ...
  206. }
  207. </PRE></TT></DL>
  208. will pass a pointer to the
  209. <TT>Lock</TT>
  210. substructure.
  211. <br>&#32;<br>
  212. Finally, in places where context is insufficient to identify the unnamed structure,
  213. the type name (it must be a
  214. <TT>typedef</TT>)
  215. of the unnamed structure can be used as an identifier.
  216. In our example,
  217. <TT>&node-&gt;Lock</TT>
  218. gives the address of the anonymous
  219. <TT>Lock</TT>
  220. structure.
  221. <H4>3.4 Structure displays
  222. </H4>
  223. <br>&#32;<br>
  224. A structure cast followed by a list of expressions in braces is
  225. an expression with the type of the structure and elements assigned from
  226. the corresponding list.
  227. Structures are now almost first-class citizens of the language.
  228. It is common to see code like this:
  229. <DL><DT><DD><TT><PRE>
  230. r = (Rectangle){point1, (Point){x,y+2}};
  231. </PRE></TT></DL>
  232. <H4>3.5 Initialization indexes
  233. </H4>
  234. <br>&#32;<br>
  235. In initializers of arrays,
  236. one may place a constant expression
  237. in square brackets before an initializer.
  238. This causes the next initializer to assign
  239. the indicated element.
  240. For example:
  241. <DL><DT><DD><TT><PRE>
  242. enum errors
  243. {
  244. Etoobig,
  245. Ealarm,
  246. Egreg
  247. };
  248. char* errstrings[] =
  249. {
  250. [Ealarm] "Alarm call",
  251. [Egreg] "Panic: out of mbufs",
  252. [Etoobig] "Arg list too long",
  253. };
  254. </PRE></TT></DL>
  255. In the same way,
  256. individual structures members may
  257. be initialized in any order by preceding the initialization with
  258. <TT>.tagname</TT>.
  259. Both forms allow an optional
  260. <TT>=</TT>,
  261. to be compatible with a proposed
  262. extension to ANSI C.
  263. <H4>3.6 External register
  264. </H4>
  265. <br>&#32;<br>
  266. The declaration
  267. <TT>extern</TT>
  268. <TT>register</TT>
  269. will dedicate a register to
  270. a variable on a global basis.
  271. It can be used only under special circumstances.
  272. External register variables must be identically
  273. declared in all modules and
  274. libraries.
  275. The feature is not intended for efficiency,
  276. although it can produce efficient code;
  277. rather it represents a unique storage class that
  278. would be hard to get any other way.
  279. On a shared-memory multi-processor,
  280. an external register is
  281. one-per-processor and neither one-per-procedure (automatic)
  282. or one-per-system (external).
  283. It is used for two variables in the Plan 9 kernel,
  284. <TT>u</TT>
  285. and
  286. <TT>m</TT>.
  287. <TT>U</TT>
  288. is a pointer to the structure representing the currently running process
  289. and
  290. <TT>m</TT>
  291. is a pointer to the per-machine data structure.
  292. <H4>3.7 Long long
  293. </H4>
  294. <br>&#32;<br>
  295. The compilers accept
  296. <TT>long</TT>
  297. <TT>long</TT>
  298. as a basic type meaning 64-bit integer.
  299. On all of the machines
  300. this type is synthesized from 32-bit instructions.
  301. <H4>3.8 Pragma
  302. </H4>
  303. <br>&#32;<br>
  304. The compilers accept
  305. <TT>#pragma</TT>
  306. <TT>lib</TT>
  307. <I>libname</I>
  308. and pass the
  309. library name string uninterpreted
  310. to the loader.
  311. The loader uses the library name to
  312. find libraries to load.
  313. If the name contains
  314. <TT>%O</TT>,
  315. it is replaced with
  316. the single character object type of the compiler
  317. (e.g.,
  318. <TT>v</TT>
  319. for the MIPS).
  320. If the name contains
  321. <TT>%M</TT>,
  322. it is replaced with
  323. the architecture type for the compiler
  324. (e.g.,
  325. <TT>mips</TT>
  326. for the MIPS).
  327. If the name starts with
  328. <TT>/</TT>
  329. it is an absolute pathname;
  330. if it starts with
  331. <TT>.</TT>
  332. then it is searched for in the loader's current directory.
  333. Otherwise, the name is searched from
  334. <TT>/%M/lib</TT>.
  335. Such
  336. <TT>#pragma</TT>
  337. statements in header files guarantee that the correct
  338. libraries are always linked with a program without the
  339. need to specify them explicitly at link time.
  340. <br>&#32;<br>
  341. They also accept
  342. <TT>#pragma</TT>
  343. <TT>hjdicks</TT>
  344. <TT>on</TT>
  345. (or
  346. <TT>yes</TT>
  347. or
  348. <TT>1</TT>)
  349. to cause subsequently declared data, until
  350. <TT>#pragma</TT>
  351. <TT>hjdicks</TT>
  352. <TT>off</TT>
  353. (or
  354. <TT>no</TT>
  355. or
  356. <TT>0</TT>),
  357. to be laid out in memory tightly packed in successive bytes, disregarding
  358. the usual alignment rules.
  359. Accessing such data can cause faults.
  360. <br>&#32;<br>
  361. Similarly,
  362. <TT>#pragma</TT>
  363. <TT>profile</TT>
  364. <TT>off</TT>
  365. (or
  366. <TT>no</TT>
  367. or
  368. <TT>0</TT>)
  369. causes subsequently declared functions, until
  370. <TT>#pragma</TT>
  371. <TT>profile</TT>
  372. <TT>on</TT>
  373. (or
  374. <TT>yes</TT>
  375. or
  376. <TT>1</TT>),
  377. to be marked as unprofiled.
  378. Such functions will not be profiled when
  379. profiling is enabled for the rest of the program.
  380. <br>&#32;<br>
  381. Two
  382. <TT>#pragma</TT>
  383. statements allow type-checking of
  384. <TT>print</TT>-like
  385. functions.
  386. The first, of the form
  387. <DL><DT><DD><TT><PRE>
  388. #pragma varargck argpos error 2
  389. </PRE></TT></DL>
  390. tells the compiler that the second argument to
  391. <TT>error</TT>
  392. is a
  393. <TT>print</TT>
  394. format string (see the manual page
  395. <A href="/magic/man2html/2/print"><I>print</I>(2))
  396. </A>that specifies how to format
  397. <TT>error</TT>'s
  398. subsequent arguments.
  399. The second, of the form
  400. <DL><DT><DD><TT><PRE>
  401. #pragma varargck type "s" char*
  402. </PRE></TT></DL>
  403. says that the
  404. <TT>print</TT>
  405. format verb
  406. <TT>s</TT>
  407. processes an argument of
  408. type
  409. <TT>char*</TT>.
  410. If the compiler's
  411. <TT>-F</TT>
  412. option is enabled, the compiler will use this information
  413. to report type violations in the arguments to
  414. <TT>print</TT>,
  415. <TT>error</TT>,
  416. and similar routines.
  417. <H4>4 Object module conventions
  418. </H4>
  419. <br>&#32;<br>
  420. The overall conventions of the runtime environment
  421. are important
  422. to runtime efficiency.
  423. In this section,
  424. several of these conventions are discussed.
  425. <H4>4.1 Register saving
  426. </H4>
  427. <br>&#32;<br>
  428. In the Plan 9 compilers,
  429. the caller of a procedure saves the registers.
  430. With caller-saves,
  431. the leaf procedures can use all the
  432. registers and never save them.
  433. If you spend a lot of time at the leaves,
  434. this seems preferable.
  435. With callee-saves,
  436. the saving of the registers is done
  437. in the single point of entry and return.
  438. If you are interested in space,
  439. this seems preferable.
  440. In both,
  441. there is a degree of uncertainty
  442. about what registers need to be saved.
  443. Callee-saved registers make it difficult to
  444. find variables in registers in debuggers.
  445. Callee-saved registers also complicate
  446. the implementation of
  447. <TT>longjmp</TT>.
  448. The convincing argument is
  449. that with caller-saves,
  450. the decision to registerize a variable
  451. can include the cost of saving the register
  452. across calls.
  453. For a further discussion of caller- vs. callee-saves,
  454. see the paper by Davidson and Whalley [Dav91].
  455. <br>&#32;<br>
  456. In the Plan 9 operating system,
  457. calls to the kernel look like normal procedure
  458. calls, which means
  459. the caller
  460. has saved the registers and the system
  461. entry does not have to.
  462. This makes system calls considerably faster.
  463. Since this is a potential security hole,
  464. and can lead to non-determinism,
  465. the system may eventually save the registers
  466. on entry,
  467. or more likely clear the registers on return.
  468. <H4>4.2 Calling convention
  469. </H4>
  470. <br>&#32;<br>
  471. Older C compilers maintain a frame pointer, which is at a known constant
  472. offset from the stack pointer within each function.
  473. For machines where the stack grows towards zero,
  474. the argument pointer is at a known constant offset
  475. from the frame pointer.
  476. Since the stack grows down in Plan 9,
  477. the Plan 9 compilers
  478. keep neither an
  479. explicit frame pointer nor
  480. an explicit argument pointer;
  481. instead they generate addresses relative to the stack pointer.
  482. <br>&#32;<br>
  483. On some architectures, the first argument to a subroutine is passed in a register.
  484. <H4>4.3 Functions returning structures
  485. </H4>
  486. <br>&#32;<br>
  487. Structures longer than one word are awkward to implement
  488. since they do not fit in registers and must
  489. be passed around in memory.
  490. Functions that return structures
  491. are particularly clumsy.
  492. The Plan 9 compilers pass the return address of
  493. a structure as the first argument of a
  494. function that has a structure return value.
  495. Thus
  496. <DL><DT><DD><TT><PRE>
  497. x = f(...)
  498. </PRE></TT></DL>
  499. is rewritten as
  500. <DL><DT><DD><TT><PRE>
  501. f(&amp;x, ...).
  502. </PRE></TT></DL>
  503. This saves a copy and makes the compilation
  504. much less clumsy.
  505. A disadvantage is that if you call this
  506. function without an assignment,
  507. a dummy location must be invented.
  508. <br>&#32;<br>
  509. There is also a danger of calling a function
  510. that returns a structure without declaring
  511. it as such.
  512. With ANSI C function prototypes,
  513. this error need never occur.
  514. <H4>5 Implementation
  515. </H4>
  516. <br>&#32;<br>
  517. The compiler is divided internally into
  518. four machine-independent passes,
  519. four machine-dependent passes,
  520. and an output pass.
  521. The next nine sections describe each pass in order.
  522. <H4>5.1 Parsing
  523. </H4>
  524. <br>&#32;<br>
  525. The first pass is a YACC-based parser
  526. [Joh79].
  527. Declarations are interpreted immediately,
  528. building a block structured symbol table.
  529. Executable statements are put into a parse tree
  530. and collected,
  531. without interpretation.
  532. At the end of each procedure,
  533. the parse tree for the function is
  534. examined by the other passes of the compiler.
  535. <br>&#32;<br>
  536. The input stream of the parser is
  537. a pushdown list of input activations.
  538. The preprocessor
  539. expansions of
  540. macros
  541. and
  542. <TT>#include</TT>
  543. are implemented as pushdowns.
  544. Thus there is no separate
  545. pass for preprocessing.
  546. <H4>5.2 Typing
  547. </H4>
  548. <br>&#32;<br>
  549. The next pass distributes typing information
  550. to every node of the tree.
  551. Implicit operations on the tree are added,
  552. such as type promotions and taking the
  553. address of arrays and functions.
  554. <H4>5.3 Machine-independent optimization
  555. </H4>
  556. <br>&#32;<br>
  557. The next pass performs optimizations
  558. and transformations of the tree, such as converting
  559. <TT>&*x</TT>
  560. and
  561. <TT>*&x</TT>
  562. into
  563. <TT>x</TT>.
  564. Constant expressions are converted to constants in this pass.
  565. <H4>5.4 Arithmetic rewrites
  566. </H4>
  567. <br>&#32;<br>
  568. This is another machine-independent optimization.
  569. Subtrees of add, subtract, and multiply of integers are
  570. rewritten for easier compilation.
  571. The major transformation is factoring:
  572. <TT>4+8*a+16*b+5</TT>
  573. is transformed into
  574. <TT>9+8*(a+2*b)</TT>.
  575. Such expressions arise from address
  576. manipulation and array indexing.
  577. <H4>5.5 Addressability
  578. </H4>
  579. <br>&#32;<br>
  580. This is the first of the machine-dependent passes.
  581. The addressability of a processor is defined as the set of
  582. expressions that is legal in the address field
  583. of a machine language instruction.
  584. The addressability of different processors varies widely.
  585. At one end of the spectrum are the 68020 and VAX,
  586. which allow a complex mix of incrementing,
  587. decrementing,
  588. indexing, and relative addressing.
  589. At the other end is the MIPS,
  590. which allows only registers and constant offsets from the
  591. contents of a register.
  592. The addressability can be different for different instructions
  593. within the same processor.
  594. <br>&#32;<br>
  595. It is important to the code generator to know when a
  596. subtree represents an address of a particular type.
  597. This is done with a bottom-up walk of the tree.
  598. In this pass, the leaves are labeled with small integers.
  599. When an internal node is encountered,
  600. it is labeled by consulting a table indexed by the
  601. labels on the left and right subtrees.
  602. For example,
  603. on the 68020 processor,
  604. it is possible to address an
  605. offset from a named location.
  606. In C, this is represented by the expression
  607. <TT>*(&name+constant)</TT>.
  608. This is marked addressable by the following table.
  609. In the table,
  610. a node represented by the left column is marked
  611. with a small integer from the right column.
  612. Marks of the form
  613. <TT>A<small><small><sub>i</sub></small></small></TT>
  614. are addressable while
  615. marks of the form
  616. <TT>N<small><small><sub>i</sub></small></small></TT>
  617. are not addressable.
  618. <DL><DT><DD><TT><PRE>
  619. Node Marked
  620. name A<small><small><sub>1</sub></small></small>
  621. const A<small><small><sub>2</sub></small></small>
  622. &amp;A<small><small><sub>1</sub></small></small> A<small><small><sub>3</sub></small></small>
  623. A<small><small><sub>3</sub></small></small>+A<small><small><sub>1</sub></small></small> N<small><small><sub>1</sub></small></small> (note that this is not addressable)
  624. *N<small><small><sub>1</sub></small></small> A<small><small><sub>4</sub></small></small>
  625. </PRE></TT></DL>
  626. Here there is a distinction between
  627. a node marked
  628. <TT>A<small><small><sub>1</sub></small></small></TT>
  629. and a node marked
  630. <TT>A<small><small><sub>4</sub></small></small></TT>
  631. because the address operator of an
  632. <TT>A<small><small><sub>4</sub></small></small></TT>
  633. node is not addressable.
  634. So to extend the table:
  635. <DL><DT><DD><TT><PRE>
  636. Node Marked
  637. &amp;A<small><small><sub>4</sub></small></small> N<small><small><sub>2</sub></small></small>
  638. N<small><small><sub>2</sub></small></small>+N<small><small><sub>1</sub></small></small> N<small><small><sub>1</sub></small></small>
  639. </PRE></TT></DL>
  640. The full addressability of the 68020 is expressed
  641. in 18 rules like this,
  642. while the addressability of the MIPS is expressed
  643. in 11 rules.
  644. When one ports the compiler,
  645. this table is usually initialized
  646. so that leaves are labeled as addressable and nothing else.
  647. The code produced is poor,
  648. but porting is easy.
  649. The table can be extended later.
  650. <br>&#32;<br>
  651. This pass also rewrites some complex operators
  652. into procedure calls.
  653. Examples include 64-bit multiply and divide.
  654. <br>&#32;<br>
  655. In the same bottom-up pass of the tree,
  656. the nodes are labeled with a Sethi-Ullman complexity
  657. [Set70].
  658. This number is roughly the number of registers required
  659. to compile the tree on an ideal machine.
  660. An addressable node is marked 0.
  661. A function call is marked infinite.
  662. A unary operator is marked as the
  663. maximum of 1 and the mark of its subtree.
  664. A binary operator with equal marks on its subtrees is
  665. marked with a subtree mark plus 1.
  666. A binary operator with unequal marks on its subtrees is
  667. marked with the maximum mark of its subtrees.
  668. The actual values of the marks are not too important,
  669. but the relative values are.
  670. The goal is to compile the harder
  671. (larger mark)
  672. subtree first.
  673. <H4>5.6 Code generation
  674. </H4>
  675. <br>&#32;<br>
  676. Code is generated by recursive
  677. descent.
  678. The Sethi-Ullman complexity completely guides the
  679. order.
  680. The addressability defines the leaves.
  681. The only difficult part is compiling a tree
  682. that has two infinite (function call)
  683. subtrees.
  684. In this case,
  685. one subtree is compiled into the return register
  686. (usually the most convenient place for a function call)
  687. and then stored on the stack.
  688. The other subtree is compiled into the return register
  689. and then the operation is compiled with
  690. operands from the stack and the return register.
  691. <br>&#32;<br>
  692. There is a separate boolean code generator that compiles
  693. conditional expressions.
  694. This is fundamentally different from compiling an arithmetic expression.
  695. The result of the boolean code generator is the
  696. position of the program counter and not an expression.
  697. The boolean code generator makes extensive use of De Morgan's rule.
  698. The boolean code generator is an expanded version of that described
  699. in chapter 8 of Aho, Sethi, and Ullman
  700. [Aho87].
  701. <br>&#32;<br>
  702. There is a considerable amount of talk in the literature
  703. about automating this part of a compiler with a machine
  704. description.
  705. Since this code generator is so small
  706. (less than 500 lines of C)
  707. and easy,
  708. it hardly seems worth the effort.
  709. <H4>5.7 Registerization
  710. </H4>
  711. <br>&#32;<br>
  712. Up to now,
  713. the compiler has operated on syntax trees
  714. that are roughly equivalent to the original source language.
  715. The previous pass has produced machine language in an internal
  716. format.
  717. The next two passes operate on the internal machine language
  718. structures.
  719. The purpose of the next pass is to reintroduce
  720. registers for heavily used variables.
  721. <br>&#32;<br>
  722. All of the variables that can be
  723. potentially registerized within a procedure are
  724. placed in a table.
  725. (Suitable variables are any automatic or external
  726. scalars that do not have their addresses extracted.
  727. Some constants that are hard to reference are also
  728. considered for registerization.)
  729. Four separate data flow equations are evaluated
  730. over the procedure on all of these variables.
  731. Two of the equations are the normal set-behind
  732. and used-ahead
  733. bits that define the life of a variable.
  734. The two new bits tell if a variable life
  735. crosses a function call ahead or behind.
  736. By examining a variable over its lifetime,
  737. it is possible to get a cost
  738. for registerizing.
  739. Loops are detected and the costs are multiplied
  740. by three for every level of loop nesting.
  741. Costs are sorted and the variables
  742. are replaced by available registers on a greedy basis.
  743. <br>&#32;<br>
  744. The 68020 has two different
  745. types of registers.
  746. For the 68020,
  747. two different costs are calculated for
  748. each variable life and the register type that
  749. affords the better cost is used.
  750. Ties are broken by counting the number of available
  751. registers of each type.
  752. <br>&#32;<br>
  753. Note that externals are registerized together with automatics.
  754. This is done by evaluating the semantics of a ``call'' instruction
  755. differently for externals and automatics.
  756. Since a call goes outside the local procedure,
  757. it is assumed that a call references all externals.
  758. Similarly,
  759. externals are assumed to be set before an ``entry'' instruction
  760. and assumed to be referenced after a ``return'' instruction.
  761. This makes sure that externals are in memory across calls.
  762. <br>&#32;<br>
  763. The overall results are satisfactory.
  764. It would be nice to be able to do this processing in
  765. a machine-independent way,
  766. but it is impossible to get all of the costs and
  767. side effects of different choices by examining the parse tree.
  768. <br>&#32;<br>
  769. Most of the code in the registerization pass is machine-independent.
  770. The major machine-dependency is in
  771. examining a machine instruction to ask if it sets or references
  772. a variable.
  773. <H4>5.8 Machine code optimization
  774. </H4>
  775. <br>&#32;<br>
  776. The next pass walks the machine code
  777. for opportunistic optimizations.
  778. For the most part,
  779. this is highly specific to a particular
  780. processor.
  781. One optimization that is performed
  782. on all of the processors is the
  783. removal of unnecessary ``move''
  784. instructions.
  785. Ironically,
  786. most of these instructions were inserted by
  787. the previous pass.
  788. There are two patterns that are repetitively
  789. matched and replaced until no more matches are
  790. found.
  791. The first tries to remove ``move'' instructions
  792. by relabeling variables.
  793. <br>&#32;<br>
  794. When a ``move'' instruction is encountered,
  795. if the destination variable is set before the
  796. source variable is referenced,
  797. then all of the references to the destination
  798. variable can be renamed to the source and the ``move''
  799. can be deleted.
  800. This transformation uses the reverse data flow
  801. set up in the previous pass.
  802. <br>&#32;<br>
  803. An example of this pattern is depicted in the following
  804. table.
  805. The pattern is in the left column and the
  806. replacement action is in the right column.
  807. <DL><DT><DD><TT><PRE>
  808. MOVE a-&gt;b (remove)
  809. (sequence with no mention of <TT>a</TT>)
  810. USE b USE a
  811. (sequence with no mention of <TT>a</TT>)
  812. SET b SET b
  813. </PRE></TT></DL>
  814. <br>&#32;<br>
  815. Experiments have shown that it is marginally
  816. worthwhile to rename uses of the destination variable
  817. with uses of the source variable up to
  818. the first use of the source variable.
  819. <br>&#32;<br>
  820. The second transform will do relabeling
  821. without deleting instructions.
  822. When a ``move'' instruction is encountered,
  823. if the source variable has been set prior
  824. to the use of the destination variable
  825. then all of the references to the source
  826. variable are replaced by the destination and
  827. the ``move'' is inverted.
  828. Typically,
  829. this transformation will alter two ``move''
  830. instructions and allow the first transformation
  831. another chance to remove code.
  832. This transformation uses the forward data flow
  833. set up in the previous pass.
  834. <br>&#32;<br>
  835. Again,
  836. the following is a depiction of the transformation where
  837. the pattern is in the left column and the
  838. rewrite is in the right column.
  839. <DL><DT><DD><TT><PRE>
  840. SET a SET b
  841. (sequence with no use of <TT>b</TT>)
  842. USE a USE b
  843. (sequence with no use of <TT>b</TT>)
  844. MOVE a-&gt;b MOVE b-&gt;a
  845. </PRE></TT></DL>
  846. Iterating these transformations
  847. will usually get rid of all redundant ``move'' instructions.
  848. <br>&#32;<br>
  849. A problem with this organization is that the costs
  850. of registerization calculated in the previous pass
  851. must depend on how well this pass can detect and remove
  852. redundant instructions.
  853. Often,
  854. a fine candidate for registerization is rejected
  855. because of the cost of instructions that are later
  856. removed.
  857. <H4>5.9 Writing the object file
  858. </H4>
  859. <br>&#32;<br>
  860. The last pass walks the internal assembly language
  861. and writes the object file.
  862. The object file is reduced in size by about a factor
  863. of three with simple compression
  864. techniques.
  865. The most important aspect of the object file
  866. format is that it is independent of the compiling machine.
  867. All integer and floating numbers in the object
  868. code are converted to known formats and byte
  869. orders.
  870. <H4>6 The loader
  871. </H4>
  872. <br>&#32;<br>
  873. The loader is a multiple pass program that
  874. reads object files and libraries and produces
  875. an executable binary.
  876. The loader also does some minimal
  877. optimizations and code rewriting.
  878. Many of the operations performed by the
  879. loader are machine-dependent.
  880. <br>&#32;<br>
  881. The first pass of the loader reads the
  882. object modules into an internal data
  883. structure that looks like binary assembly language.
  884. As the instructions are read,
  885. code is reordered to remove
  886. unconditional branch instructions.
  887. Conditional branch instructions are inverted
  888. to prevent the insertion of unconditional branches.
  889. The loader will also make a copy of a few instructions
  890. to remove an unconditional branch.
  891. <br>&#32;<br>
  892. The next pass allocates addresses for
  893. all external data.
  894. Typical of processors is the MIPS,
  895. which can reference &#177;32K bytes from a
  896. register.
  897. The loader allocates the register
  898. <TT>R30</TT>
  899. as the static pointer.
  900. The value placed in
  901. <TT>R30</TT>
  902. is the base of the data segment plus 32K.
  903. It is then cheap to reference all data in the
  904. first 64K of the data segment.
  905. External variables are allocated to
  906. the data segment
  907. with the smallest variables allocated first.
  908. If all of the data cannot fit into the first
  909. 64K of the data segment,
  910. then usually only a few large arrays
  911. need more expensive addressing modes.
  912. <br>&#32;<br>
  913. For the MIPS processor,
  914. the loader makes a pass over the internal
  915. structures,
  916. exchanging instructions to try
  917. to fill ``delay slots'' with useful work.
  918. If a useful instruction cannot be found
  919. to fill a delay slot,
  920. the loader will insert
  921. ``noop''
  922. instructions.
  923. This pass is very expensive and does not
  924. do a good job.
  925. About 40% of all instructions are in
  926. delay slots.
  927. About 65% of these are useful instructions and
  928. 35% are ``noops.''
  929. The vendor-supplied assembler does this job
  930. more effectively,
  931. filling about 80%
  932. of the delay slots with useful instructions.
  933. <br>&#32;<br>
  934. On the 68020 processor,
  935. branch instructions come in a variety of
  936. sizes depending on the relative distance
  937. of the branch.
  938. Thus the size of branch instructions
  939. can be mutually dependent.
  940. The loader uses a multiple pass algorithm
  941. to resolve the branch lengths
  942. [Szy78].
  943. Initially, all branches are assumed minimal length.
  944. On each subsequent pass,
  945. the branches are reassessed
  946. and expanded if necessary.
  947. When no more expansions occur,
  948. the locations of the instructions in
  949. the text segment are known.
  950. <br>&#32;<br>
  951. On the MIPS processor,
  952. all instructions are one size.
  953. A single pass over the instructions will
  954. determine the locations of all addresses
  955. in the text segment.
  956. <br>&#32;<br>
  957. The last pass of the loader produces the
  958. executable binary.
  959. A symbol table and other tables are
  960. produced to help the debugger to
  961. interpret the binary symbolically.
  962. <br>&#32;<br>
  963. The loader places absolute source line numbers in the symbol table.
  964. The name and absolute line number of all
  965. <TT>#include</TT>
  966. files is also placed in the
  967. symbol table so that the debuggers can
  968. associate object code to source files.
  969. <H4>7 Performance
  970. </H4>
  971. <br>&#32;<br>
  972. The following is a table of the source size of the MIPS
  973. compiler.
  974. <DL><DT><DD><TT><PRE>
  975. lines module
  976. 509 machine-independent headers
  977. 1070 machine-independent YACC source
  978. 6090 machine-independent C source
  979. 545 machine-dependent headers
  980. 6532 machine-dependent C source
  981. 298 loader headers
  982. 5215 loader C source
  983. </PRE></TT></DL>
  984. <br>&#32;<br>
  985. The following table shows timing
  986. of a test program
  987. that plays checkers, running on a MIPS R4000.
  988. The test program is 26 files totaling 12600 lines of C.
  989. The execution time does not significantly
  990. depend on library implementation.
  991. Since no other compiler runs on Plan 9,
  992. the Plan 9 tests were done with the Plan 9 operating system;
  993. the other tests were done on the vendor's operating system.
  994. The hardware was identical in both cases.
  995. The optimizer in the vendor's compiler
  996. is reputed to be extremely good.
  997. <DL><DT><DD><TT><PRE>
  998. 4.49s Plan 9 <TT>vc</TT> <TT>-N</TT> compile time (opposite of <TT>-O</TT>)
  999. 1.72s Plan 9 <TT>vc</TT> <TT>-N</TT> load time
  1000. 148.69s Plan 9 <TT>vc</TT> <TT>-N</TT> run time
  1001. 15.07s Plan 9 <TT>vc</TT> compile time (<TT>-O</TT> implicit)
  1002. 1.66s Plan 9 <TT>vc</TT> load time
  1003. 89.96s Plan 9 <TT>vc</TT> run time
  1004. 14.83s vendor <TT>cc</TT> compile time
  1005. 0.38s vendor <TT>cc</TT> load time
  1006. 104.75s vendor <TT>cc</TT> run time
  1007. 43.59s vendor <TT>cc</TT> <TT>-O</TT> compile time
  1008. 0.38s vendor <TT>cc</TT> <TT>-O</TT> load time
  1009. 76.19s vendor <TT>cc</TT> <TT>-O</TT> run time
  1010. 8.19s vendor <TT>cc</TT> <TT>-O3</TT> compile time
  1011. 35.97s vendor <TT>cc</TT> <TT>-O3</TT> load time
  1012. 71.16s vendor <TT>cc</TT> <TT>-O3</TT> run time
  1013. </PRE></TT></DL>
  1014. <br>&#32;<br>
  1015. To compare the Intel compiler,
  1016. a program that is about 40% bit manipulation and
  1017. about 60% single precision floating point was
  1018. run on the same 33 MHz 486, once under Windows
  1019. compiled with the Watcom compiler, version 10.0,
  1020. in 16-bit mode and once under
  1021. Plan 9 in 32-bit mode.
  1022. The Plan 9 execution time was 27 sec while the Windows
  1023. execution time was 31 sec.
  1024. <H4>8 Conclusions
  1025. </H4>
  1026. <br>&#32;<br>
  1027. The new compilers compile
  1028. quickly,
  1029. load slowly,
  1030. and produce
  1031. medium quality
  1032. object code.
  1033. The compilers are relatively
  1034. portable,
  1035. requiring but a couple of weeks' work to
  1036. produce a compiler for a different computer.
  1037. For Plan 9,
  1038. where we needed several compilers
  1039. with specialized features and
  1040. our own object formats,
  1041. this project was indispensable.
  1042. It is also necessary for us to
  1043. be able to freely distribute our compilers
  1044. with the Plan 9 distribution.
  1045. <br>&#32;<br>
  1046. Two problems have come up in retrospect.
  1047. The first has to do with the
  1048. division of labor between compiler and loader.
  1049. Plan 9 runs on multi-processors and as such
  1050. compilations are often done in parallel.
  1051. Unfortunately,
  1052. all compilations must be complete before loading
  1053. can begin.
  1054. The load is then single-threaded.
  1055. With this model,
  1056. any shift of work from compile to load
  1057. results in a significant increase in real time.
  1058. The same is true of libraries that are compiled
  1059. infrequently and loaded often.
  1060. In the future,
  1061. we may try to put some of the loader work
  1062. back into the compiler.
  1063. <br>&#32;<br>
  1064. The second problem comes from
  1065. the various optimizations performed over several
  1066. passes.
  1067. Often optimizations in different passes depend
  1068. on each other.
  1069. Iterating the passes could compromise efficiency,
  1070. or even loop.
  1071. We see no real solution to this problem.
  1072. <H4>9 References
  1073. </H4>
  1074. <br>&#32;<br>
  1075. [Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
  1076. Compilers - Principles, Techniques, and Tools,
  1077. Addison Wesley,
  1078. Reading, MA,
  1079. 1987.
  1080. <br>&#32;<br>
  1081. [ANSI90] <I>American National Standard for Information Systems -
  1082. Programming Language C</I>, American National Standards Institute, Inc.,
  1083. New York, 1990.
  1084. <br>&#32;<br>
  1085. [Dav91] J. W. Davidson and D. B. Whalley,
  1086. ``Methods for Saving and Restoring Register Values across Function Calls'',
  1087. Software-Practice and Experience,
  1088. Vol 21(2), pp. 149-165, February 1991.
  1089. <br>&#32;<br>
  1090. [Joh79] S. C. Johnson,
  1091. ``YACC - Yet Another Compiler Compiler'',
  1092. UNIX Programmer's Manual, Seventh Ed., Vol. 2A,
  1093. AT&amp;T Bell Laboratories,
  1094. Murray Hill, NJ,
  1095. 1979.
  1096. <br>&#32;<br>
  1097. [Set70] R. Sethi and J. D. Ullman,
  1098. ``The Generation of Optimal Code for Arithmetic Expressions'',
  1099. Journal of the ACM,
  1100. Vol 17(4), pp. 715-728, 1970.
  1101. <br>&#32;<br>
  1102. [Szy78] T. G. Szymanski,
  1103. ``Assembling Code for Machines with Span-dependent Instructions'',
  1104. Communications of the ACM,
  1105. Vol 21(4), pp. 300-308, 1978.
  1106. <br>&#32;<br>
  1107. <A href=http://www.lucent.com/copyright.html>
  1108. Copyright</A> &#169; 2004 Lucent Technologies Inc. All rights reserved.
  1109. </body></html>