  1. <html>
  2. <title>
  3. data
  4. </title>
  5. <body BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#330088" ALINK="#FF0044">
  6. <H1>How to Use the Plan 9 C Compiler
  7. </H1>
  8. <DL><DD><I>Rob Pike<br>
  9. rob@plan9.bell-labs.com<br>
  10. </I></DL>
  11. <H4>Introduction
  12. </H4>
  13. <P>
  14. The C compiler on Plan 9 is a wholly new program; in fact
  15. it was the first piece of software written for what would
  16. eventually become Plan 9 from Bell Labs.
  17. Programmers familiar with existing C compilers will find
  18. a number of differences in both the language the Plan 9 compiler
  19. accepts and in how the compiler is used.
  20. </P>
  21. <P>
  22. The compiler is really a set of compilers, one for each
architecture (MIPS, SPARC, Motorola 68020, Intel 386, etc.)
  24. that accept a dialect of ANSI C and efficiently produce
  25. fairly good code for the target machine.
  26. There is a packaging of the compiler that accepts strict ANSI C for
  27. a POSIX environment, but this document focuses on the
  28. native Plan 9 environment, that in which all the system source and
  29. almost all the utilities are written.
  30. </P>
  31. <H4>Source
  32. </H4>
  33. <P>
  34. The language accepted by the compilers is the core ANSI C language
  35. with some modest extensions,
  36. a greatly simplified preprocessor,
  37. a smaller library that includes system calls and related facilities,
  38. and a completely different structure for include files.
  39. </P>
  40. <P>
  41. Official ANSI C accepts the old (K&amp;R) style of declarations for
  42. functions; the Plan 9 compilers
  43. are more demanding.
Without an explicit command-line flag
  45. (<TT>-B</TT>)
  46. whose use is discouraged, the compilers insist
  47. on new-style function declarations, that is, prototypes for
  48. function arguments.
  49. The function declarations in the libraries' include files are
  50. all in the new style so the interfaces are checked at compile time.
  51. For C programmers who have not yet switched to function prototypes
  52. the clumsy syntax may seem repellent but the payoff in stronger typing
  53. is substantial.
  54. Those who wish to import existing software to Plan 9 are urged
  55. to use the opportunity to update their code.
  56. </P>
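<P>
As a rough illustration of the difference, here is a sketch contrasting the two styles.
The function name
<TT>scale</TT>
is invented for this example and does not come from the Plan 9 source;
the code is plain ANSI C so it can be tried with any compiler.
<DL><DT><DD><TT><PRE>
/*
 * Old (K&amp;R) style: argument types are declared separately,
 * so the compiler cannot check calls against them.
 *
 *	double scale(x, n)
 *	double x;
 *	int n;
 *	{
 *		return x * n;
 *	}
 */

/* New style: the prototype carries the argument types. */
double scale(double x, int n);

double
scale(double x, int n)
{
	return x * n;
}
</PRE></TT></DL>
With the prototype in scope, a call such as
<TT>scale("3", 2)</TT>
can be diagnosed at compile time rather than misbehaving at run time.
</P>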
  57. <P>
  58. The compilers include an integrated preprocessor that accepts the familiar
  59. <TT>#include</TT>,
  60. <TT>#define</TT>
  61. for macros both with and without arguments,
  62. <TT>#undef</TT>,
  63. <TT>#line</TT>,
  64. <TT>#ifdef</TT>,
  65. <TT>#ifndef</TT>,
  66. and
  67. <TT>#endif</TT>.
  68. It
  69. supports neither
  70. <TT>#if</TT>
  71. nor
  72. <TT>##</TT>,
  73. although it does
  74. honor a few
  75. <TT>#pragmas</TT>.
  76. The
  77. <TT>#if</TT>
  78. directive was omitted because it greatly complicates the
  79. preprocessor, is never necessary, and is usually abused.
  80. Conditional compilation in general makes code hard to understand;
  81. the Plan 9 source uses it sparingly.
  82. Also, because the compilers remove dead code, regular
  83. <TT>if</TT>
  84. statements with constant conditions are more readable equivalents to many
  85. <TT>#ifs</TT>.
  86. To compile imported code ineluctably fouled by
  87. <TT>#if</TT>
  88. there is a separate command,
  89. <TT>/bin/cpp</TT>,
  90. that implements the complete ANSI C preprocessor specification.
  91. </P>
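<P>
The point about constant conditions can be sketched as follows.
The names
<TT>Debug</TT>
and
<TT>twice</TT>
are hypothetical, and the example uses Standard I/O only so that it is
portable; it is an illustration, not code from the Plan 9 source.
<DL><DT><DD><TT><PRE>
#include &lt;stdio.h&gt;

enum { Debug = 0 };	/* hypothetical switch; set to 1 to enable tracing */

int
twice(int x)
{
	/* constant condition: when Debug is 0 the call below is dead code */
	if(Debug)
		fprintf(stderr, "twice(%d)\n", x);
	return 2*x;
}
</PRE></TT></DL>
Unlike text hidden by
<TT>#ifdef</TT>,
the tracing statement is still parsed and type-checked in every build.
</P>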
  92. <P>
  93. Include files fall into two groups: machine-dependent and machine-independent.
  94. The machine-independent files occupy the directory
  95. <TT>/sys/include</TT>;
  96. the others are placed in a directory appropriate to the machine, such as
  97. <TT>/mips/include</TT>.
  98. The compiler searches for include files
  99. first in the machine-dependent directory and then
  100. in the machine-independent directory.
  101. At the time of writing there are thirty-one machine-independent include
  102. files and two (per machine) machine-dependent ones:
  103. <TT>&lt;ureg.h&gt;</TT>
  104. and
  105. <TT>&lt;u.h&gt;</TT>.
  106. The first describes the layout of registers on the system stack,
  107. for use by the debugger.
  108. The second defines some
  109. architecture-dependent types such as
  110. <TT>jmp_buf</TT>
  111. for
  112. <TT>setjmp</TT>
  113. and the
  114. <TT>va_arg</TT>
  115. and
  116. <TT>va_list</TT>
  117. macros for handling arguments to variadic functions,
  118. as well as a set of
  119. <TT>typedef</TT>
  120. abbreviations for
  121. <TT>unsigned</TT>
  122. <TT>short</TT>
  123. and so on.
  124. </P>
  125. <P>
  126. Here is an excerpt from
  127. <TT>/68020/include/u.h</TT>:
  128. <DL><DT><DD><TT><PRE>
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;
typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
  140. </PRE></TT></DL>
  141. Plan 9 programs use
  142. <TT>nil</TT>
  143. for the name of the zero-valued pointer.
  144. The type
  145. <TT>vlong</TT>
  146. is the largest integer type available; on most architectures it
  147. is a 64-bit value.
  148. A couple of other types in
  149. <TT>&lt;u.h&gt;</TT>
  150. are
  151. <TT>u32int</TT>,
  152. which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
  153. <TT>mpdigit</TT>,
  154. which is used by the multiprecision math package
  155. <TT>&lt;mp.h&gt;</TT>.
  156. The
  157. <TT>#define</TT>
  158. constants permit an architecture-independent (but compiler-dependent)
  159. implementation of stack-switching using
  160. <TT>setjmp</TT>
  161. and
  162. <TT>longjmp</TT>.
  163. </P>
  164. <P>
  165. Every Plan 9 C program begins
  166. <DL><DT><DD><TT><PRE>
#include &lt;u.h&gt;
  168. </PRE></TT></DL>
  169. because all the other installed header files use the
  170. <TT>typedefs</TT>
  171. declared in
  172. <TT>&lt;u.h&gt;</TT>.
  173. </P>
  174. <P>
  175. In strict ANSI C, include files are grouped to collect related functions
  176. in a single file: one for string functions, one for memory functions,
  177. one for I/O, and none for system calls.
  178. Each include file is protected by an
  179. <TT>#ifdef</TT>
  180. to guarantee its contents are seen by the compiler only once.
  181. Plan 9 takes a different approach. Other than a few include
  182. files that define external formats such as archives, the files in
  183. <TT>/sys/include</TT>
  184. correspond to
  185. <I>libraries.</I>
  186. If a program is using a library, it includes the corresponding header.
  187. The default C library comprises string functions, memory functions, and
  188. so on, largely as in ANSI C, some formatted I/O routines,
  189. plus all the system calls and related functions.
  190. To use these functions, one must
  191. <TT>#include</TT>
  192. the file
  193. <TT>&lt;libc.h&gt;</TT>,
  194. which in turn must follow
  195. <TT>&lt;u.h&gt;</TT>,
  196. to define their prototypes for the compiler.
  197. Here is the complete source to the traditional first C program:
  198. <DL><DT><DD><TT><PRE>
#include &lt;u.h&gt;
#include &lt;libc.h&gt;

void
main(void)
{
	print("hello world\n");
	exits(0);
}
  207. </PRE></TT></DL>
  208. The
  209. <TT>print</TT>
  210. routine and its relatives
  211. <TT>fprint</TT>
  212. and
  213. <TT>sprint</TT>
  214. resemble the similarly-named functions in Standard I/O but are not
  215. attached to a specific I/O library.
  216. In Plan 9
  217. <TT>main</TT>
  218. is not integer-valued; it should call
  219. <TT>exits</TT>,
  220. which takes a string argument (or null; here ANSI C promotes the 0 to a
  221. <TT>char*</TT>).
  222. All these functions are, of course, documented in the Programmer's Manual.
  223. </P>
  224. <P>
  225. To use
  226. <TT>printf</TT>,
  227. <TT>&lt;stdio.h&gt;</TT>
  228. must be included to define the function prototype for
  229. <TT>printf</TT>:
  230. <DL><DT><DD><TT><PRE>
#include &lt;u.h&gt;
#include &lt;libc.h&gt;
#include &lt;stdio.h&gt;

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\n", argv[0], argc);
	exits(0);
}
  240. </PRE></TT></DL>
  241. In practice, Standard I/O is not used much in Plan 9. I/O libraries are
  242. discussed in a later section of this document.
  243. </P>
  244. <P>
  245. There are libraries for handling regular expressions, raster graphics,
  246. windows, and so on, and each has an associated include file.
  247. The manual for each library states which include files are needed.
  248. The files are not protected against multiple inclusion and themselves
  249. contain no nested
  250. <TT>#includes</TT>.
  251. Instead the
  252. programmer is expected to sort out the requirements
  253. and to
  254. <TT>#include</TT>
  255. the necessary files once at the top of each source file. In practice this is
  256. trivial: this way of handling include files is so straightforward
  257. that it is rare for a source file to contain more than half a dozen
  258. <TT>#includes</TT>.
  259. </P>
  260. <P>
  261. The compilers do their own register allocation so the
  262. <TT>register</TT>
  263. keyword is ignored.
  264. For different reasons,
  265. <TT>volatile</TT>
  266. and
  267. <TT>const</TT>
  268. are also ignored.
  269. </P>
  270. <P>
  271. To make it easier to share code with other systems, Plan 9 has a version
  272. of the compiler,
  273. <TT>pcc</TT>,
  274. that provides the standard ANSI C preprocessor, headers, and libraries
  275. with POSIX extensions.
  276. <TT>Pcc</TT>
  277. is recommended only
  278. when broad external portability is mandated. It compiles slower,
  279. produces slower code (it takes extra work to simulate POSIX on Plan 9),
  280. eliminates those parts of the Plan 9 interface
  281. not related to POSIX, and illustrates the clumsiness of an environment
  282. designed by committee.
  283. <TT>Pcc</TT>
  284. is described in more detail in
APE - The ANSI/POSIX Environment,
  286. by Howard Trickey.
  287. </P>
  288. <H4>Process
  289. </H4>
  290. <P>
  291. Each CPU architecture supported by Plan 9 is identified by a single,
  292. arbitrary, alphanumeric character:
  293. <TT>k</TT>
  294. for SPARC,
  295. <TT>q</TT>
  296. for Motorola Power PC 630 and 640,
  297. <TT>v</TT>
  298. for MIPS,
  299. <TT>1</TT>
  300. for Motorola 68000,
  301. <TT>2</TT>
  302. for Motorola 68020 and 68040,
  303. <TT>5</TT>
  304. for Acorn ARM 7500,
  305. <TT>6</TT>
  306. for Intel 960,
  307. <TT>7</TT>
  308. for DEC Alpha,
  309. <TT>8</TT>
  310. for Intel 386, and
  311. <TT>9</TT>
  312. for AMD 29000.
  313. The character labels the support tools and files for that architecture.
  314. For instance, for the 68020 the compiler is
  315. <TT>2c</TT>,
  316. the assembler is
  317. <TT>2a</TT>,
  318. the link editor/loader is
  319. <TT>2l</TT>,
  320. the object files are suffixed
  321. <TT>.2</TT>,
  322. and the default name for an executable file is
  323. <TT>2.out</TT>.
  324. Before we can use the compiler we therefore need to know which
  325. machine we are compiling for.
  326. The next section explains how this decision is made; for the moment
  327. assume we are building 68020 binaries and make the mental substitution for
  328. <TT>2</TT>
  329. appropriate to the machine you are actually using.
  330. </P>
  331. <P>
  332. To convert source to an executable binary is a two-step process.
  333. First run the compiler,
  334. <TT>2c</TT>,
  335. on the source, say
  336. <TT>file.c</TT>,
  337. to generate an object file
  338. <TT>file.2</TT>.
  339. Then run the loader,
  340. <TT>2l</TT>,
  341. to generate an executable
  342. <TT>2.out</TT>
  343. that may be run (on a 680X0 machine):
  344. <DL><DT><DD><TT><PRE>
2c file.c
2l file.2
2.out
  348. </PRE></TT></DL>
  349. The loader automatically links with whatever libraries the program
  350. needs, usually including the standard C library as defined by
  351. <TT>&lt;libc.h&gt;</TT>.
  352. Of course the compiler and loader have lots of options, both familiar and new;
  353. see the manual for details.
  354. The compiler does not generate an executable automatically;
  355. the output of the compiler must be given to the loader.
  356. Since most compilation is done under the control of
  357. <TT>mk</TT>
  358. (see below), this is rarely an inconvenience.
  359. </P>
  360. <P>
  361. The distribution of work between the compiler and loader is unusual.
  362. The compiler integrates preprocessing, parsing, register allocation,
  363. code generation and some assembly.
  364. Combining these tasks in a single program is part of the reason for
  365. the compiler's efficiency.
  366. The loader does instruction selection, branch folding,
  367. instruction scheduling,
  368. and writes the final executable.
  369. There is no separate C preprocessor and no assembler in the usual pipeline.
  370. Instead the intermediate object file
  371. (here a
  372. <TT>.2</TT>
  373. file) is a type of binary assembly language.
  374. The instructions in the intermediate format are not exactly those in
  375. the machine. For example, on the 68020 the object file may specify
  376. a MOVE instruction but the loader will decide just which variant of
the MOVE instruction (MOVE immediate, MOVE quick, MOVE address,
etc.) is most efficient.
  379. </P>
  380. <P>
  381. The assembler,
  382. <TT>2a</TT>,
  383. is just a translator between the textual and binary
  384. representations of the object file format.
  385. It is not an assembler in the traditional sense. It has limited
  386. macro capabilities (the same as the integral C preprocessor in the compiler),
  387. clumsy syntax, and minimal error checking. For instance, the assembler
  388. will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
  389. machine does not actually support; only when the output of the assembler
  390. is passed to the loader will the error be discovered.
  391. The assembler is intended only for writing things that need access to instructions
  392. invisible from C,
  393. such as the machine-dependent
  394. part of an operating system;
  395. very little code in Plan 9 is in assembly language.
  396. </P>
  397. <P>
  398. The compilers take an option
  399. <TT>-S</TT>
  400. that causes them to print on their standard output the generated code
  401. in a format acceptable as input to the assemblers.
  402. This is of course merely a formatting of the
  403. data in the object file; therefore the assembler is just
  404. an
  405. ASCII-to-binary converter for this format.
  406. Other than the specific instructions, the input to the assemblers
  407. is largely architecture-independent; see
  408. ``A Manual for the Plan 9 Assembler'',
  409. by Rob Pike,
  410. for more information.
  411. </P>
  412. <P>
  413. The loader is an integral part of the compilation process.
  414. Each library header file contains a
  415. <TT>#pragma</TT>
  416. that tells the loader the name of the associated archive; it is
  417. not necessary to tell the loader which libraries a program uses.
  418. The C run-time startup is found, by default, in the C library.
  419. The loader starts with an undefined
  420. symbol,
  421. <TT>_main</TT>,
  422. that is resolved by pulling in the run-time startup code from the library.
  423. (The loader undefines
  424. <TT>_mainp</TT>
  425. when profiling is enabled, to force loading of the profiling start-up
  426. instead.)
  427. </P>
  428. <P>
  429. Unlike its counterpart on other systems, the Plan 9 loader rearranges
  430. data to optimize access. This means the order of variables in the
loaded program is unrelated to their order in the source.
  432. Most programs don't care, but some assume that, for example, the
  433. variables declared by
  434. <DL><DT><DD><TT><PRE>
int a;
int b;
  437. </PRE></TT></DL>
  438. will appear at adjacent addresses in memory. On Plan 9, they won't.
  439. </P>
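<P>
A program that really needs two values at neighboring addresses can say so explicitly.
The sketch below, in which the names
<TT>ab</TT>
and
<TT>sum</TT>
are invented for illustration, uses an array, whose elements C guarantees
to be contiguous, instead of relying on where separate variables happen to land.
<DL><DT><DD><TT><PRE>
int ab[2];	/* ab[0] plays the role of a, ab[1] the role of b */

/* Add up n adjacent ints starting at p. */
int
sum(int *p, int n)
{
	int i, s;

	s = 0;
	for(i = 0; i &lt; n; i++)
		s += p[i];
	return s;
}
</PRE></TT></DL>
A call such as
<TT>sum(ab, 2)</TT>
is then well defined however the loader arranges the data segment.
</P>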
  440. <H4>Heterogeneity
  441. </H4>
  442. <P>
When the system starts or a user logs in, the environment is configured
  444. so the appropriate binaries are available in
  445. <TT>/bin</TT>.
  446. The configuration process is controlled by an environment variable,
<TT>$cputype</TT>,
with value such as
<TT>mips</TT>,
<TT>68020</TT>,
<TT>386</TT>,
or
<TT>sparc</TT>.
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
<TT>/mips/lib</TT>
contains the object code libraries for MIPS programs,
<TT>/mips/include</TT>
holds MIPS-specific include files, and
<TT>/mips/bin</TT>
has the MIPS binaries.
These binaries are attached to
<TT>/bin</TT>
at boot time by binding
<TT>/$cputype/bin</TT>
to
<TT>/bin</TT>,
so
<TT>/bin</TT>
always contains the correct files.
</P>
<P>
  475. The MIPS compiler,
  476. <TT>vc</TT>,
  477. by definition
  478. produces object files for the MIPS architecture,
  479. regardless of the architecture of the machine on which the compiler is running.
  480. There is a version of
  481. <TT>vc</TT>
  482. compiled for each architecture:
  483. <TT>/mips/bin/vc</TT>,
  484. <TT>/68020/bin/vc</TT>,
  485. <TT>/sparc/bin/vc</TT>,
  486. and so on,
  487. each capable of producing MIPS object files regardless of the native
  488. instruction set.
  489. If one is running on a SPARC,
  490. <TT>/sparc/bin/vc</TT>
  491. will compile programs for the MIPS;
  492. if one is running on machine
<TT>$cputype</TT>,
<TT>/$cputype/bin/vc</TT>
will compile programs for the MIPS.
</P>
<P>
  498. Because of the bindings that assemble
  499. <TT>/bin</TT>,
  500. the shell always looks for a command, say
  501. <TT>date</TT>,
  502. in
  503. <TT>/bin</TT>
  504. and automatically finds the file
<TT>/$cputype/bin/date</TT>.
Therefore the MIPS compiler is known as just
<TT>vc</TT>;
the shell will invoke
<TT>/bin/vc</TT>
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
<TT>/bin/vc</TT>
is
<I>always</I>
the MIPS compiler.
</P>
<P>
  519. Also, the output of
  520. <TT>vc</TT>
  521. and
  522. <TT>vl</TT>
  523. is completely independent of the machine type on which they are executed:
  524. <TT>.v</TT>
  525. files compiled (with
  526. <TT>vc</TT>)
  527. on a SPARC may be linked (with
  528. <TT>vl</TT>)
  529. on a 386.
  530. (The resulting
  531. <TT>v.out</TT>
  532. will run, of course, only on a MIPS.)
  533. Similarly, the MIPS libraries in
  534. <TT>/mips/lib</TT>
  535. are suitable for loading with
  536. <TT>vl</TT>
  537. on any machine; there is only one set of MIPS libraries, not one
  538. set for each architecture that supports the MIPS compiler.
  539. </P>
  540. <H4>Heterogeneity and <TT>mk</TT>
  541. </H4>
  542. <P>
  543. Most software on Plan 9 is compiled under the control of
  544. <TT>mk</TT>,
  545. a descendant of
  546. <TT>make</TT>
  547. that is documented in the Programmer's Manual.
  548. A convention used throughout the
  549. <TT>mkfiles</TT>
  550. makes it easy to compile the source into binary suitable for any architecture.
  551. </P>
  552. <P>
  553. The variable
<TT>$cputype</TT>
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
<TT>$objtype</TT>,
is used to set which architecture is being
<I>compiled</I>
for.
The value of
<TT>$objtype</TT>
can be used by a
<TT>mkfile</TT>
to configure the compilation environment.
</P>
<P>
  568. In each machine's root directory there is a short
  569. <TT>mkfile</TT>
  570. that defines a set of macros for the compiler, loader, etc.
  571. Here is
  572. <TT>/mips/mkfile</TT>:
  573. <DL><DT><DD><TT><PRE>
&lt;/sys/src/mkfile.proto
CC=vc
LD=vl
O=v
AS=va
  579. </PRE></TT></DL>
  580. The line
  581. <DL><DT><DD><TT><PRE>
&lt;/sys/src/mkfile.proto
  583. </PRE></TT></DL>
  584. causes
  585. <TT>mk</TT>
  586. to include the file
  587. <TT>/sys/src/mkfile.proto</TT>,
  588. which contains general definitions:
  589. <DL><DT><DD><TT><PRE>
#
# common mkfile parameters shared by all architectures
#
OS=v486xq7
CPUS=mips 386 power alpha
CFLAGS=-FVw
LEX=lex
YACC=yacc
MK=/bin/mk
  599. </PRE></TT></DL>
  600. <TT>CC</TT>
  601. is obviously the compiler,
  602. <TT>AS</TT>
  603. the assembler, and
  604. <TT>LD</TT>
  605. the loader.
  606. <TT>O</TT>
  607. is the suffix for the object files and
  608. <TT>CPUS</TT>
  609. and
  610. <TT>OS</TT>
  611. are used in special rules described below.
  612. </P>
  613. <P>
  614. Here is a
  615. <TT>mkfile</TT>
  616. to build the installed source for
  617. <TT>sam</TT>:
  618. <DL><DT><DD><TT><PRE>
&lt;/$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O
$O.out: $OBJ
	$LD $OBJ
install: $O.out
	cp $O.out /$objtype/bin/sam
installall:
	for(objtype in $CPUS) mk install
%.$O: %.c
	$CC $CFLAGS $stem.c
$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h
clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
</PRE></TT></DL>
(The actual
<TT>mkfile</TT>
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
<TT>mk</TT>
to include the contents of
<TT>/$objtype/mkfile</TT>
in the current
<TT>mkfile</TT>.
If
<TT>$objtype</TT>
is
<TT>mips</TT>,
this inserts the MIPS macro definitions into the
<TT>mkfile</TT>.
In this case the rule for
<TT>$O.out</TT>
uses the MIPS tools to build
<TT>v.out</TT>.
The
<TT>%.$O</TT>
rule in the file uses
<TT>mk</TT>'s
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
<TT>rc</TT>,
without further translation.
See the
<TT>mk</TT>
manual if any of this is unfamiliar.)
Because the default rule builds
<TT>$O.out</TT>
rather than
<TT>sam</TT>,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
</P>
<P>
  679. The rest of the
  680. <TT>mkfile</TT>
  681. should be easy to follow; notice how the rules for
  682. <TT>clean</TT>
  683. and
  684. <TT>installall</TT>
  685. (that is, install versions for all architectures) use other macros
  686. defined in
<TT>/$objtype/mkfile</TT>.
In Plan 9,
<TT>mkfiles</TT>
for commands conventionally contain rules to
<TT>install</TT>
(compile and install the version for
<TT>$objtype</TT>),
<TT>installall</TT>
(compile and install for all
<TT>$objtypes</TT>),
and
<TT>clean</TT>
(remove all object files, binaries, etc.).
</P>
<P>
  702. The
  703. <TT>mkfile</TT>
  704. is easy to use. To build a MIPS binary,
  705. <TT>v.out</TT>:
  706. <DL><DT><DD><TT><PRE>
% objtype=mips
% mk
  709. </PRE></TT></DL>
  710. To build and install a MIPS binary:
  711. <DL><DT><DD><TT><PRE>
% objtype=mips
% mk install
  714. </PRE></TT></DL>
  715. To build and install all versions:
  716. <DL><DT><DD><TT><PRE>
% mk installall
  718. </PRE></TT></DL>
  719. These conventions make cross-compilation as easy to manage
  720. as traditional native compilation.
  721. Plan 9 programs compile and run without change on machines from
  722. large multiprocessors to laptops. For more information about this process, see
  723. ``Plan 9 Mkfiles'',
  724. by Bob Flandrena.
  725. </P>
  726. <H4>Portability
  727. </H4>
  728. <P>
  729. Within Plan 9, it is painless to write portable programs, programs whose
  730. source is independent of the machine on which they execute.
  731. The operating system is fixed and the compiler, headers and libraries
  732. are constant so most of the stumbling blocks to portability are removed.
  733. Attention to a few details can avoid those that remain.
  734. </P>
  735. <P>
  736. Plan 9 is a heterogeneous environment, so programs must
  737. <I>expect</I>
  738. that external files will be written by programs on machines of different
  739. architectures.
  740. The compilers, for instance, must handle without confusion
  741. object files written by other machines.
  742. The traditional approach to this problem is to pepper the source with
  743. <TT>#ifdefs</TT>
  744. to turn byte-swapping on and off.
  745. Plan 9 takes a different approach: of the handful of machine-dependent
  746. <TT>#ifdefs</TT>
  747. in all the source, almost all are deep in the libraries.
  748. Instead programs read and write files in a defined format,
  749. either (for low volume applications) as formatted text, or
  750. (for high volume applications) as binary in a known byte order.
  751. If the external data were written with the most significant
  752. byte first, the following code reads a 4-byte integer correctly
  753. regardless of the architecture of the executing machine (assuming
  754. an unsigned long holds 4 bytes):
<DL><DT><DD><TT><PRE>
ulong
getlong(void)
{
    ulong l;

    l = (getchar()&amp;0xFF)&lt;&lt;24;
    l |= (getchar()&amp;0xFF)&lt;&lt;16;
    l |= (getchar()&amp;0xFF)&lt;&lt;8;
    l |= (getchar()&amp;0xFF)&lt;&lt;0;
    return l;
}
</PRE></TT></DL>
  767. Note that this code does not `swap' the bytes; instead it just reads
  768. them in the correct order.
  769. Variations of this code will handle any binary format
  770. and also avoid problems
  771. involving how structures are padded, how words are aligned,
  772. and other impediments to portability.
  773. Be aware, though, that extra care is needed to handle floating point data.
  774. </P>
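<P>
The same idea works in both directions and on memory buffers as well as streams.
The following is a minimal sketch in portable C, not code from the Plan 9 source; the names
<TT>putlong_be</TT>
and
<TT>getlong_be</TT>
are hypothetical, and
<TT>uint32_t</TT>
from
<TT>&lt;stdint.h&gt;</TT>
is used so the 4-byte assumption is explicit.
</P>

```c
#include <stdint.h>

/* Hypothetical helper: store v in p[0..3], most significant byte first. */
void
putlong_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Hypothetical helper: fetch the value stored by putlong_be. */
uint32_t
getlong_be(const unsigned char *p)
{
    uint32_t v;

    v = (uint32_t)p[0] << 24;
    v |= (uint32_t)p[1] << 16;
    v |= (uint32_t)p[2] << 8;
    v |= (uint32_t)p[3];
    return v;
}
```

<P>
Neither routine inspects the byte order of the executing machine; the shifts
operate on values, not on memory layout, so the external format is the same everywhere.
</P>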
  775. <P>
  776. Efficiency hounds will argue that this method is unnecessarily slow and clumsy
  777. when the executing machine has the same byte order (and padding and alignment)
  778. as the data.
  779. The CPU cost of I/O processing
  780. is rarely the bottleneck for an application, however,
  781. and the gain in simplicity of porting and maintaining the code greatly outweighs
  782. the minor speed loss from handling data in this general way.
  783. This method is how the Plan 9 compilers, the window system, and even the file
  784. servers transmit data between programs.
  785. </P>
  786. <P>
  787. To port programs beyond Plan 9, where the system interface is more variable,
  788. it is probably necessary to use
  789. <TT>pcc</TT>
  790. and hope that the target machine supports ANSI C and POSIX.
  791. </P>
  792. <H4>I/O
  793. </H4>
  794. <P>
  795. The default C library, defined by the include file
  796. <TT>&lt;libc.h&gt;</TT>,
  797. contains no buffered I/O package.
  798. It does have several entry points for printing formatted text:
  799. <TT>print</TT>
  800. outputs text to the standard output,
  801. <TT>fprint</TT>
  802. outputs text to a specified integer file descriptor, and
  803. <TT>sprint</TT>
  804. places text in a character array.
  805. To access library routines for buffered I/O, a program must
  806. explicitly include the header file associated with an appropriate library.
  807. </P>
  808. <P>
  809. The recommended I/O library, used by most Plan 9 utilities, is
  810. <TT>bio</TT>
  811. (buffered I/O), defined by
  812. <TT>&lt;bio.h&gt;</TT>.
  813. There also exists an implementation of ANSI Standard I/O,
  814. <TT>stdio</TT>.
  815. </P>
  816. <P>
  817. <TT>Bio</TT>
  818. is small and efficient, particularly for buffer-at-a-time or
  819. line-at-a-time I/O.
  820. Even for character-at-a-time I/O, however, it is significantly faster than
  821. the Standard I/O library,
  822. <TT>stdio</TT>.
  823. Its interface is compact and regular, although it lacks a few conveniences.
  824. The most noticeable is that one must explicitly define buffers for standard
  825. input and output;
  826. <TT>bio</TT>
  827. does not predefine them. Here is a program to copy input to output a byte
  828. at a time using
  829. <TT>bio</TT>:
<DL><DT><DD><TT><PRE>
#include &lt;u.h&gt;
#include &lt;libc.h&gt;
#include &lt;bio.h&gt;

Biobuf bin;     /* buffer for standard input */
Biobuf bout;    /* buffer for standard output */

void
main(void)
{
    int c;

    Binit(&amp;bin, 0, OREAD);
    Binit(&amp;bout, 1, OWRITE);
    while((c=Bgetc(&amp;bin)) != Beof)
        Bputc(&amp;bout, c);
    exits(0);
}
</PRE></TT></DL>
  846. For peak performance, we could replace
  847. <TT>Bgetc</TT>
  848. and
  849. <TT>Bputc</TT>
  850. by their equivalent in-line macros
  851. <TT>BGETC</TT>
  852. and
  853. <TT>BPUTC</TT>
  854. but
  855. the performance gain would be modest.
  856. For more information on
  857. <TT>bio</TT>,
  858. see the Programmer's Manual.
  859. </P>
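<P>
For comparison, here is a sketch of the same byte-at-a-time copy written against
ANSI Standard I/O, where the library supplies the buffering implicitly.
It is portable C rather than Plan 9 code, and the function name
<TT>copybytes</TT>
is hypothetical.
</P>

```c
#include <stdio.h>

/* Copy in to out one byte at a time; returns the number of bytes copied. */
long
copybytes(FILE *in, FILE *out)
{
    int c;
    long n;

    n = 0;
    while((c = getc(in)) != EOF){
        putc(c, out);
        n++;
    }
    return n;
}
```

<P>
A caller would typically invoke
<TT>copybytes(stdin, stdout)</TT>.
</P>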
  860. <P>
Perhaps the most dramatic difference between the I/O interface of Plan 9 and that of
other systems is that text is not ASCII.
  863. The format for
  864. text in Plan 9 is a byte-stream encoding of 16-bit characters.
  865. The character set is based on the Unicode Standard and is backward compatible with
  866. ASCII:
  867. characters with value 0 through 127 are the same in both sets.
  868. The 16-bit characters, called
  869. <I>runes</I>
  870. in Plan 9, are encoded using a representation called
  871. UTF,
  872. an encoding that is becoming accepted as a standard.
  873. (ISO calls it UTF-8;
  874. throughout Plan 9 it's just called
  875. UTF.)
  876. UTF
  877. defines multibyte sequences to
  878. represent character values from 0 to 65535.
  879. In
  880. UTF,
  881. character values up to 127 decimal, 7F hexadecimal, represent themselves,
  882. so straight
  883. ASCII
  884. files are also valid
  885. UTF.
  886. Also,
  887. UTF
  888. guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
  889. will appear only when they represent themselves, so programs that read bytes
  890. looking for plain ASCII characters will continue to work.
  891. Any program that expects a one-to-one correspondence between bytes and
  892. characters will, however, need to be modified.
  893. An example is parsing file names.
  894. File names, like all text, are in
  895. UTF,
  896. so it is incorrect to search for a character in a string by
  897. <TT>strchr(filename,</TT>
  898. <TT>c)</TT>
  899. because the character might have a multi-byte encoding.
  900. The correct method is to call
  901. <TT>utfrune(filename,</TT>
  902. <TT>c)</TT>,
  903. defined in
  904. <A href="/magic/man2html/2/rune"><I>rune</I>(2),
  905. </A>which interprets the file name as a sequence of encoded characters
  906. rather than bytes.
  907. In fact, even when you know the character is a single byte
  908. that can represent only itself,
  909. it is safer to use
  910. <TT>utfrune</TT>
  911. because that assumes nothing about the character set
  912. and its representation.
  913. </P>
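<P>
To make the hazard concrete, the following sketch in portable C searches a
UTF
string for a 16-bit character value by encoding the value and then searching for the
resulting byte sequence.
It is an illustration only, not the library's implementation of
<TT>utfrune</TT>;
the names
<TT>encode16</TT>
and
<TT>findchar</TT>
are hypothetical.
</P>

```c
#include <string.h>

/* Encode a 16-bit character value as UTF into buf; returns the byte count (1 to 3). */
int
encode16(char *buf, unsigned int c)
{
    if(c < 0x80){
        buf[0] = (char)c;
        return 1;
    }
    if(c < 0x800){
        buf[0] = (char)(0xC0 | (c >> 6));
        buf[1] = (char)(0x80 | (c & 0x3F));
        return 2;
    }
    buf[0] = (char)(0xE0 | ((c >> 12) & 0x0F));
    buf[1] = (char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (char)(0x80 | (c & 0x3F));
    return 3;
}

/* Return a pointer to the first occurrence of character c in the UTF string s, or NULL. */
const char*
findchar(const char *s, unsigned int c)
{
    char pat[4];
    int n;

    if(c == 0)
        return s + strlen(s);   /* like strchr, match the terminator */
    n = encode16(pat, c);
    pat[n] = '\0';
    return strstr(s, pat);
}
```

<P>
Given the file name
<TT>"abc\xC3\xBF.c"</TT>,
<TT>findchar(name, 0xFF)</TT>
points at the two-byte encoding of &yuml;, whereas
<TT>strchr(name, 0xBF)</TT>
wrongly reports a match in the middle of that encoding.
</P>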
  914. <P>
  915. The library defines several symbols relevant to the representation of characters.
  916. Any byte with unsigned value less than
  917. <TT>Runesync</TT>
  918. will not appear in any multi-byte encoding of a character.
  919. <TT>Utfrune</TT>
  920. compares the character being searched against
  921. <TT>Runesync</TT>
  922. to see if it is sufficient to call
  923. <TT>strchr</TT>
  924. or if the byte stream must be interpreted.
  925. Any byte with unsigned value less than
  926. <TT>Runeself</TT>
  927. is represented by a single byte with the same value.
  928. Finally, when errors are encountered converting
  929. to runes from a byte stream, the library returns the rune value
  930. <TT>Runeerror</TT>
  931. and advances a single byte. This permits programs to find runes
  932. embedded in binary data.
  933. </P>
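<P>
The error rule can be sketched as follows in portable C.
This is not the library's code: the function names are hypothetical, and the constants
<TT>Selfval</TT>
and
<TT>Badrune</TT>
are stand-ins chosen for the example, not the library's actual values of
<TT>Runeself</TT>
and
<TT>Runeerror</TT>.
The decoder handles only the 16-bit range described in the text.
</P>

```c
enum
{
    Selfval = 0x80,     /* assumed: bytes below this value stand for themselves */
    Badrune = 0xFFFD    /* assumed stand-in for the library's error rune */
};

/*
 * Decode one character from s into *r and return the number of bytes consumed.
 * Malformed input yields Badrune and consumes exactly one byte.
 */
int
decode16(const unsigned char *s, unsigned int *r)
{
    unsigned int c;

    c = s[0];
    if(c < Selfval){
        *r = c;
        return 1;
    }
    if((c & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80){
        *r = ((c & 0x1F) << 6) | (s[1] & 0x3F);
        if(*r >= 0x80)          /* reject overlong forms */
            return 2;
    }else if((c & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80){
        *r = ((c & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        if(*r >= 0x800)         /* reject overlong forms */
            return 3;
    }
    *r = Badrune;
    return 1;
}

/* Count the characters in a NUL-terminated byte string; each bad byte counts as one. */
int
countchars(const unsigned char *s)
{
    unsigned int r;
    int n;

    for(n = 0; *s != 0; n++)
        s += decode16(s, &r);
    return n;
}
```

<P>
Because a bad byte costs only one step, the loop in
<TT>countchars</TT>
resynchronizes at the next well-formed character instead of discarding the rest of the input.
</P>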
  934. <P>
  935. <TT>Bio</TT>
  936. includes routines
  937. <TT>Bgetrune</TT>
  938. and
  939. <TT>Bputrune</TT>
  940. to transform the external byte stream
  941. UTF
  942. format to and from
  943. internal 16-bit runes.
  944. Also, the
  945. <TT>%s</TT>
  946. format to
  947. <TT>print</TT>
  948. accepts
  949. UTF;
  950. <TT>%c</TT>
  951. prints a character after narrowing it to 8 bits.
  952. The
  953. <TT>%S</TT>
  954. format prints a null-terminated sequence of runes;
  955. <TT>%C</TT>
  956. prints a character after narrowing it to 16 bits.
  957. For more information, see the Programmer's Manual, in particular
  958. <A href="/magic/man2html/6/utf"><I>utf</I>(6)
  959. </A>and
  960. <A href="/magic/man2html/2/rune"><I>rune</I>(2),
  961. </A>and the paper,
  962. ``Hello world, or
  963. &#922;&#945;&#955;&#951;&#956;&#941;&#961;&#945; &#954;&#972;&#963;&#956;&#949;, or
  964. &#12371;&#12435;&#12395;&#12385;&#12399; &#19990;&#30028;'',
  965. by Rob Pike and
  966. Ken Thompson;
  967. there is not room for the full story here.
  968. </P>
  969. <P>
  970. These issues affect the compiler in several ways.
  971. First, the C source is in
  972. UTF.
  973. ANSI says C variables are formed from
  974. ASCII
  975. alphanumerics, but comments and literal strings may contain any characters
  976. encoded in the native encoding, here
  977. UTF.
  978. The declaration
  979. <DL><DT><DD><TT><PRE>
  980. char *cp = "abc&yuml;";
  981. </PRE></TT></DL>
  982. initializes the variable
  983. <TT>cp</TT>
  984. to point to an array of bytes holding the
  985. UTF
  986. representation of the characters
  987. <TT>abc&yuml;.</TT>
  988. The type
  989. <TT>Rune</TT>
  990. is defined in
  991. <TT>&lt;u.h&gt;</TT>
  992. to be
  993. <TT>ushort</TT>,
  994. which is also the `wide character' type in the compiler.
  995. Therefore the declaration
  996. <DL><DT><DD><TT><PRE>
  997. Rune *rp = L"abc&yuml;";
  998. </PRE></TT></DL>
  999. initializes the variable
  1000. <TT>rp</TT>
  1001. to point to an array of unsigned short integers holding the 16-bit
  1002. values of the characters
  1003. <TT>abc&yuml;</TT>.
  1004. Note that in both these declarations the characters in the source
  1005. that represent
  1006. <TT>abc&yuml;</TT>
  1007. are the same; what changes is how those characters are represented
  1008. in memory in the program.
  1009. The following two lines:
  1010. <DL><DT><DD><TT><PRE>
  1011. print("%s\n", "abc&yuml;");
  1012. print("%S\n", L"abc&yuml;");
  1013. </PRE></TT></DL>
  1014. produce the same
  1015. UTF
  1016. string on their output, the first by copying the bytes, the second
  1017. by converting from runes to bytes.
  1018. </P>
  1019. <P>
  1020. In C, character constants are integers but narrowed through the
  1021. <TT>char</TT>
  1022. type.
  1023. The Unicode character
  1024. <TT>&yuml;</TT>
  1025. has value 255, so if the
  1026. <TT>char</TT>
  1027. type is signed,
  1028. the constant
  1029. <TT>'&yuml;'</TT>
  1030. has value -1 (which is equal to EOF).
  1031. On the other hand,
  1032. <TT>L'&yuml;'</TT>
  1033. narrows through the wide character type,
  1034. <TT>ushort</TT>,
  1035. and therefore has value 255.
  1036. </P>
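<P>
The two narrowings can be imitated with explicit casts.
This sketch assumes an 8-bit signed
<TT>char</TT>,
a 16-bit
<TT>unsigned short</TT>,
and a two's complement machine (ISO C leaves the out-of-range signed conversion
implementation-defined); the function names are hypothetical.
</P>

```c
/* Value of a character constant after narrowing through a signed 8-bit char. */
int
narrowchar(int v)
{
    return (signed char)v;
}

/* Value of a wide character constant after narrowing through an unsigned 16-bit type. */
int
narrowwide(int v)
{
    return (unsigned short)v;
}
```

<P>
Under those assumptions
<TT>narrowchar(255)</TT>
yields -1 while
<TT>narrowwide(255)</TT>
yields 255, matching the two constants above.
</P>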
  1037. <P>
  1038. Finally, although it's not ANSI C, the Plan 9 C compilers
  1039. assume any character with value above
  1040. <TT>Runeself</TT>
  1041. is an alphanumeric,
  1042. so &#945; is a legal, if non-portable, variable name.
  1043. </P>
  1044. <H4>Arguments
  1045. </H4>
  1046. <P>
  1047. Some macros are defined
  1048. in
  1049. <TT>&lt;libc.h&gt;</TT>
  1050. for parsing the arguments to
  1051. <TT>main()</TT>.
  1052. They are described in
  1053. <A href="/magic/man2html/2/arg"><I>arg</I>(2)
  1054. </A>but are fairly self-explanatory.
  1055. There are four macros:
  1056. <TT>ARGBEGIN</TT>
  1057. and
  1058. <TT>ARGEND</TT>
  1059. are used to bracket a hidden
  1060. <TT>switch</TT>
  1061. statement within which
  1062. <TT>ARGC</TT>
  1063. returns the current option character (rune) being processed and
  1064. <TT>ARGF</TT>
  1065. returns the argument to the option, as in the loader option
  1066. <TT>-o</TT>
  1067. <TT>file</TT>.
  1068. Here, for example, is the code at the beginning of
  1069. <TT>main()</TT>
  1070. in
  1071. <TT>ramfs.c</TT>
  1072. (see
  1073. <A href="/magic/man2html/1/ramfs"><I>ramfs</I>(1))
  1074. </A>that cracks its arguments:
<DL><DT><DD><TT><PRE>
void
main(int argc, char *argv[])
{
    char *defmnt;
    int p[2];
    int mfd[2];
    int stdio = 0;

    defmnt = "/tmp";
    ARGBEGIN{
    case 'i':
        defmnt = 0;
        stdio = 1;
        mfd[0] = 0;
        mfd[1] = 1;
        break;
    case 's':
        defmnt = 0;
        break;
    case 'm':
        defmnt = ARGF();
        break;
    default:
        usage();
    }ARGEND
</PRE></TT></DL>
  1101. </P>
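<P>
For readers without the macros at hand, the following hand-written loop in portable C
handles the same three options.
It is a sketch of the behavior described above, not the expansion of
<TT>ARGBEGIN</TT>
and
<TT>ARGEND</TT>;
the type
<TT>Opts</TT>
and the function
<TT>crack</TT>
are hypothetical, option characters are treated as single bytes rather than runes,
and errors are recorded in a flag instead of calling
<TT>usage()</TT>.
</P>

```c
#include <stddef.h>

typedef struct Opts Opts;
struct Opts
{
    const char *defmnt; /* mount point, or NULL when none is wanted */
    int stdio;          /* set by -i */
    int bad;            /* set on an unknown option or a missing argument */
};

/* Crack -i, -s and -m dir; returns the index of the first non-option argument. */
int
crack(int argc, char *argv[], Opts *o)
{
    int i, c;
    const char *p;

    o->defmnt = "/tmp";
    o->stdio = 0;
    o->bad = 0;
    for(i = 1; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++){
        p = argv[i]+1;
        while(*p != '\0'){
            c = *p++;
            switch(c){
            case 'i':
                o->defmnt = NULL;
                o->stdio = 1;
                break;
            case 's':
                o->defmnt = NULL;
                break;
            case 'm':
                /* the argument is the rest of this word, or else the next word */
                if(*p != '\0')
                    o->defmnt = p;
                else if(i+1 < argc)
                    o->defmnt = argv[++i];
                else
                    o->bad = 1;
                p = "";     /* the rest of the word has been consumed */
                break;
            default:
                o->bad = 1;
            }
        }
    }
    return i;
}
```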
  1102. <H4>Extensions
  1103. </H4>
  1104. <P>
  1105. The compiler has several extensions to ANSI C, all of which are used
  1106. extensively in the system source.
  1107. First,
  1108. <I>structure</I>
  1109. <I>displays</I>
  1110. permit
  1111. <TT>struct</TT>
  1112. expressions to be formed dynamically.
  1113. Given these declarations:
<DL><DT><DD><TT><PRE>
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
    int x, y;
};

struct Rectangle
{
    Point min, max;
};

Point p, q, add(Point, Point);
Rectangle r;
int x, y;
</PRE></TT></DL>
  1129. this assignment may appear anywhere an assignment is legal:
  1130. <DL><DT><DD><TT><PRE>
  1131. r = (Rectangle){add(p, q), (Point){x, y+3}};
  1132. </PRE></TT></DL>
  1133. The syntax is the same as for initializing a structure but with
  1134. a leading cast.
  1135. </P>
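<P>
The C99 revision of the standard adopted a construct with the same syntax under the name
<I>compound literal</I>,
so the assignment above can be tried with any C99 compiler.
The sketch below is self-contained; the function
<TT>mkrect</TT>
and the body of
<TT>add</TT>
are supplied for the example and are not from the Plan 9 source.
</P>

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
    int x, y;
};

struct Rectangle
{
    Point min, max;
};

/* Component-wise sum of two points, itself returned as a display. */
Point
add(Point a, Point b)
{
    return (Point){a.x + b.x, a.y + b.y};
}

/* Build a rectangle on the fly, using the assignment shown in the text. */
Rectangle
mkrect(Point p, Point q, int x, int y)
{
    Rectangle r;

    r = (Rectangle){add(p, q), (Point){x, y+3}};
    return r;
}
```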
  1136. <P>
  1137. If an
  1138. <I>anonymous</I>
  1139. <I>structure</I>
  1140. or
  1141. <I>union</I>
  1142. is declared within another structure or union, the members of the internal
  1143. structure or union are addressable without prefix in the outer structure.
  1144. This feature eliminates the clumsy naming of nested structures and,
  1145. particularly, unions.
  1146. For example, after these declarations,
<DL><DT><DD><TT><PRE>
struct Lock
{
    int locked;
};

struct Node
{
    int type;
    union{
        double dval;
        float fval;
        long lval;
    }; /* anonymous union */
    struct Lock; /* anonymous structure */
} *node;

void lock(struct Lock*);
</PRE></TT></DL>
  1164. one may refer to
  1165. <TT>node-&gt;type</TT>,
  1166. <TT>node-&gt;dval</TT>,
  1167. <TT>node-&gt;fval</TT>,
  1168. <TT>node-&gt;lval</TT>,
  1169. and
  1170. <TT>node-&gt;locked</TT>.
  1171. Moreover, the address of a
  1172. <TT>struct</TT>
  1173. <TT>Node</TT>
  1174. may be used without a cast anywhere that the address of a
  1175. <TT>struct</TT>
  1176. <TT>Lock</TT>
  1177. is used, such as in argument lists.
  1178. The compiler automatically promotes the type and adjusts the address.
  1179. Thus one may invoke
  1180. <TT>lock(node)</TT>.
  1181. </P>
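<P>
For comparison, here is a sketch of how much of this carries over to standard C.
C11 accepts the untagged anonymous union, but it has no equivalent of the tagged anonymous
<TT>struct Lock;</TT>
member or of the automatic pointer promotion, so the lock must be a named member and its
address passed explicitly.
The member name
<TT>lk</TT>
and the function
<TT>setlong</TT>
are hypothetical.
</P>

```c
typedef struct Lock Lock;
typedef struct Node Node;

struct Lock
{
    int locked;
};

struct Node
{
    int type;
    union{
        double dval;
        float fval;
        long lval;
    };          /* anonymous union: standard since C11 */
    Lock lk;    /* named member: standard C cannot embed a tagged struct anonymously */
};

void
lock(Lock *l)
{
    l->locked = 1;
}

/* Store a long in the node and take its lock. */
void
setlong(Node *n, long v)
{
    n->type = 'l';
    n->lval = v;        /* no union name needed */
    lock(&n->lk);       /* Plan 9 C would accept lock(n) here */
}
```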
  1182. <P>
  1183. Anonymous structures and unions may be accessed by type name
  1184. if (and only if) they are declared using a
  1185. <TT>typedef</TT>
  1186. name.
  1187. For example, using the above declaration for
  1188. <TT>Point</TT>,
  1189. one may declare
<DL><DT><DD><TT><PRE>
struct
{
    int type;
    Point;
} p;
</PRE></TT></DL>
  1197. and refer to
  1198. <TT>p.Point</TT>.
  1199. </P>
  1200. <P>
  1201. In the initialization of arrays, a number in square brackets before an
  1202. element sets the index for the initialization. For example, to initialize
  1203. some elements in
  1204. a table of function pointers indexed by
  1205. ASCII
  1206. character,
<DL><DT><DD><TT><PRE>
void percent(void), slash(void);

void (*func[128])(void) =
{
    ['%'] percent,
    ['/'] slash,
};
</PRE></TT></DL>
  1215. </P>
<P>
A similar syntax allows one to initialize structure elements:
<DL><DT><DD><TT><PRE>
Point p =
{
    .y 100,
    .x 200
};
</PRE></TT></DL>
These initialization syntaxes were later added to ANSI C, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
</P>
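<P>
In the standard form with the equals sign, the two examples look like this.
The sketch is self-contained C99; the counter
<TT>hits</TT>,
the bodies of
<TT>percent</TT>
and
<TT>slash</TT>,
the variable
<TT>origin</TT>,
and the function
<TT>dispatch</TT>
are invented for the example.
</P>

```c
#include <stddef.h>

typedef struct Point Point;
struct Point
{
    int x, y;
};

int hits;   /* counts calls, so the example has an observable effect */

void
percent(void)
{
    hits += 1;
}

void
slash(void)
{
    hits += 10;
}

/* Elements not named in the initializer are null pointers. */
void (*func[128])(void) =
{
    ['%'] = percent,
    ['/'] = slash,
};

/* Members may be named in any order. */
Point origin =
{
    .y = 100,
    .x = 200
};

/* Call the handler for character c, if any; returns 1 when one was found. */
int
dispatch(int c)
{
    if(c < 0 || c >= 128 || func[c] == NULL)
        return 0;
    func[c]();
    return 1;
}
```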
  1228. <P>
  1229. Finally, the declaration
  1230. <DL><DT><DD><TT><PRE>
  1231. extern register reg;
  1232. </PRE></TT></DL>
  1233. (<I>this</I>
  1234. appearance of the register keyword is not ignored)
  1235. allocates a global register to hold the variable
  1236. <TT>reg</TT>.
  1237. External registers must be used carefully: they need to be declared in
  1238. <I>all</I>
  1239. source files and libraries in the program to guarantee the register
  1240. is not allocated temporarily for other purposes.
  1241. Especially on machines with few registers, such as the i386,
  1242. it is easy to link accidentally with code that has already usurped
  1243. the global registers and there is no diagnostic when this happens.
  1244. Used wisely, though, external registers are powerful.
  1245. The Plan 9 operating system uses them to access per-process and
  1246. per-machine data structures on a multiprocessor. The storage class they provide
  1247. is hard to create in other ways.
  1248. </P>
  1249. <H4>The compile-time environment
  1250. </H4>
  1251. <P>
  1252. The code generated by the compilers is `optimized' by default:
  1253. variables are placed in registers and peephole optimizations are
  1254. performed.
  1255. The compiler flag
  1256. <TT>-N</TT>
  1257. disables these optimizations.
  1258. Registerization is done locally rather than throughout a function:
  1259. whether a variable occupies a register or
  1260. the memory location identified in the symbol
  1261. table depends on the activity of the variable and may change
  1262. throughout the life of the variable.
  1263. The
  1264. <TT>-N</TT>
  1265. flag is rarely needed;
  1266. its main use is to simplify debugging.
  1267. There is no information in the symbol table to identify the
  1268. registerization of a variable, so
  1269. <TT>-N</TT>
  1270. guarantees the variable is always where the symbol table says it is.
  1271. </P>
  1272. <P>
  1273. Another flag,
  1274. <TT>-w</TT>,
  1275. turns
  1276. <I>on</I>
  1277. warnings about portability and problems detected in flow analysis.
  1278. Most code in Plan 9 is compiled with warnings enabled;
  1279. these warnings plus the type checking offered by function prototypes
  1280. provide most of the support of the Unix tool
  1281. <TT>lint</TT>
  1282. more accurately and with less chatter.
  1283. Two of the warnings,
  1284. `used and not set' and `set and not used', are almost always accurate but
  1285. may be triggered spuriously by code with invisible control flow,
  1286. such as in routines that call
  1287. <TT>longjmp</TT>.
  1288. The compiler statements
  1289. <DL><DT><DD><TT><PRE>
  1290. SET(v1);
  1291. USED(v2);
  1292. </PRE></TT></DL>
  1293. decorate the flow graph to silence the compiler.
  1294. Either statement accepts a comma-separated list of variables.
  1295. Use them carefully: they may silence real errors.
  1296. For the common case of unused parameters to a function,
  1297. leaving the name off the declaration silences the warnings.
  1298. That is, listing the type of a parameter but giving it no
  1299. associated variable name does the trick.
  1300. </P>
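<P>
When the same source must also pass through other compilers, a cast to
<TT>void</TT>
is the usual portable way to mark a parameter as deliberately unused, since ISO C
accepted unnamed parameters in function definitions only in C23.
The macro
<TT>MARKUSED</TT>
and the function
<TT>twice</TT>
below are hypothetical; the macro is not the Plan 9 compiler's
<TT>USED</TT>
and has no effect on flow analysis.
</P>

```c
/* Hypothetical portable stand-in: evaluates x and discards the result. */
#define MARKUSED(x) ((void)(x))

/* A callback whose signature is fixed by its caller but which ignores its second argument. */
int
twice(int n, void *unused)
{
    MARKUSED(unused);
    return 2*n;
}
```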
  1301. <H4>Debugging
  1302. </H4>
  1303. <P>
  1304. There are two debuggers available on Plan 9.
  1305. The first, and older, is
  1306. <TT>db</TT>,
  1307. a revision of Unix
  1308. <TT>adb</TT>.
  1309. The other,
  1310. <TT>acid</TT>,
  1311. is a source-level debugger whose commands are statements in
  1312. a true programming language.
  1313. <TT>Acid</TT>
  1314. is the preferred debugger, but since it
  1315. borrows some elements of
  1316. <TT>db</TT>,
  1317. notably the formats for displaying values, it is worth knowing a little bit about
  1318. <TT>db</TT>.
  1319. </P>
  1320. <P>
  1321. Both debuggers support multiple architectures in a single program; that is,
  1322. the programs are
  1323. <TT>db</TT>
  1324. and
  1325. <TT>acid</TT>,
  1326. not for example
  1327. <TT>vdb</TT>
  1328. and
  1329. <TT>vacid</TT>.
  1330. They also support cross-architecture debugging comfortably:
  1331. one may debug a 68020 binary on a MIPS.
  1332. </P>
  1333. <P>
  1334. Imagine a program has crashed mysteriously:
  1335. <DL><DT><DD><TT><PRE>
  1336. % X11/X
  1337. Fatal server bug!
  1338. failed to create default stipple
  1339. X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
  1340. %
  1341. </PRE></TT></DL>
  1342. When a process dies on Plan 9 it hangs in the `broken' state
  1343. for debugging.
  1344. Attach a debugger to the process by naming its process id:
  1345. <DL><DT><DD><TT><PRE>
  1346. % acid 106
  1347. /proc/106/text:mips plan 9 executable
  1348. /sys/lib/acid/port
  1349. /sys/lib/acid/mips
  1350. acid:
  1351. </PRE></TT></DL>
  1352. The
  1353. <TT>acid</TT>
  1354. function
  1355. <TT>stk()</TT>
  1356. reports the stack traceback:
  1357. <DL><DT><DD><TT><PRE>
  1358. acid: stk()
  1359. At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
  1360. abort() /sys/src/ape/lib/ap/stdio/abort.c:4
  1361. called from FatalError+#4e
  1362. /sys/src/X/mit/server/dix/misc.c:421
  1363. FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
  1364. s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
  1365. /sys/src/X/mit/server/dix/misc.c:416
  1366. called from gnotscreeninit+#4ce
  1367. /sys/src/X/mit/server/ddx/gnot/gnot.c:792
  1368. gnotscreeninit(snum=#0, sc=#80db0)
  1369. /sys/src/X/mit/server/ddx/gnot/gnot.c:766
  1370. called from AddScreen+#16e
  1371. /n/bootes/sys/src/X/mit/server/dix/main.c:610
  1372. AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
  1373. /sys/src/X/mit/server/dix/main.c:530
  1374. called from InitOutput+0x80
  1375. /sys/src/X/mit/server/ddx/brazil/brddx.c:522
  1376. InitOutput(argc=0x00000001,argv=0x7fffffe4)
  1377. /sys/src/X/mit/server/ddx/brazil/brddx.c:511
  1378. called from main+0x294
  1379. /sys/src/X/mit/server/dix/main.c:225
  1380. main(argc=0x00000001,argv=0x7fffffe4)
  1381. /sys/src/X/mit/server/dix/main.c:136
  1382. called from _main+0x24
  1383. /sys/src/ape/lib/ap/mips/main9.s:8
  1384. </PRE></TT></DL>
  1385. The function
  1386. <TT>lstk()</TT>
  1387. is similar but
  1388. also reports the values of local variables.
  1389. Note that the traceback includes full file names; this is a boon to debugging,
  1390. although it makes the output much noisier.
  1391. </P>
  1392. <P>
  1393. To use
  1394. <TT>acid</TT>
  1395. well you will need to learn its input language; see the
  1396. ``Acid Manual'',
  1397. by Phil Winterbottom,
  1398. for details. For simple debugging, however, the information in the manual page is
  1399. sufficient. In particular, it describes the most useful functions
  1400. for examining a process.
  1401. </P>
  1402. <P>
  1403. The compiler does not place
  1404. information describing the types of variables in the executable,
  1405. but a compile-time flag provides crude support for symbolic debugging.
  1406. The
  1407. <TT>-a</TT>
  1408. flag to the compiler suppresses code generation
  1409. and instead emits source text in the
  1410. <TT>acid</TT>
  1411. language to format and display data structure types defined in the program.
  1412. The easiest way to use this feature is to put a rule in the
  1413. <TT>mkfile</TT>:
<DL><DT><DD><TT><PRE>
syms: main.$O
    $CC -a main.c &gt; syms
</PRE></TT></DL>
Then from within
<TT>acid</TT>,
<DL><DT><DD><TT><PRE>
acid: include("sourcedirectory/syms")
</PRE></TT></DL>
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
<A href="/magic/man2html/1/2c"><I>2c</I>(1)).
</A>This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
<TT>rect</TT>
is a global variable of type
<TT>Rectangle</TT>,
one may execute
<DL><DT><DD><TT><PRE>
Rectangle(*rect)
</PRE></TT></DL>
to display it.
The
<TT>*</TT>
(indirection) operator is necessary because
of the way
<TT>acid</TT>
works: each global symbol in the program is defined as a variable by
<TT>acid</TT>,
with value equal to the
<I>address</I>
of the symbol.
</P>
<P>
  1451. Another common technique is to write by hand special
  1452. <TT>acid</TT>
  1453. code to define functions to aid debugging, initialize the debugger, and so on.
  1454. Conventionally, this is placed in a file called
  1455. <TT>acid</TT>
  1456. in the source directory; it has a line
  1457. <DL><DT><DD><TT><PRE>
  1458. include("sourcedirectory/syms");
  1459. </PRE></TT></DL>
  1460. to load the compiler-produced symbols. One may edit the compiler output directly but
  1461. it is wiser to keep the hand-generated
  1462. <TT>acid</TT>
  1463. separate from the machine-generated.
  1464. </P>
  1465. <P>
  1466. To make things simple, the default rules in the system
  1467. <TT>mkfiles</TT>
  1468. include entries to make
  1469. <TT>foo.acid</TT>
  1470. from
  1471. <TT>foo.c</TT>,
  1472. so one may use
  1473. <TT>mk</TT>
  1474. to automate the production of
  1475. <TT>acid</TT>
  1476. definitions for a given C source file.
  1477. </P>
  1478. <P>
There is much more to say here. See the
  1480. <TT>acid</TT>
  1481. manual page, the reference manual, or the paper
  1482. ``Acid: A Debugger Built From A Language'',
  1483. also by Phil Winterbottom.
  1484. </P>
  1485. <br>&#32;<br>
  1486. <A href=http://www.lucent.com/copyright.html>
  1487. Copyright</A> &#169; 2004 Lucent Technologies Inc. All rights reserved.
  1488. </body></html>