real.ms 22 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611
  1. .FP palatino
  2. .de ]C
  3. \&[\\$1]\\$2
  4. ..
  5. .if \nZ=0 .so real.ref
  6. .EQ
  7. delim $$
  8. .EN
  9. .TL
  10. Real Inferno
  11. .AU
  12. .I "Eric Grosse"
  13. .AI
  14. .I "Lucent Technologies, Bell Labs"
  15. .I "Murray Hill NJ 07974 USA"
  16. .I "ehg@bell-labs.com"
  17. .\"date{19 Aug 1996, minor revisions 7 Jan 1998}
  18. .FS
  19. Previously appeared in R.F. Boisvert (editor),
  20. .ft I
  21. The Quality of Numerical Software: Assessment and
  22. Enhancement: Proceedings of the IFIP TC2/WG2.5
  23. Working Conference on the Quality of Numerical
  24. Software, Oxford,
  25. United Kingdom, 8-12 July 1996,
  26. .ft R
  27. Chapman Hall,
  28. London,
  29. 1997 (pp. 270-279).
  30. .FE
  31. .AB
  32. Inferno is an operating system well suited to applications that need to be
  33. portable, graphical, and networked. This paper describes the fundamental
  34. floating point facilities of the system, including: tight rules on
  35. expression evaluation, binary/decimal conversion, exceptions and rounding,
  36. and the elementary function library.
  37. .AE
  38. .PP
  39. Although the focus of Inferno is interactive media, its portability across
  40. hardware and operating platforms, its relative simplicity, and its strength
  41. in distributed computing make it attractive for advanced scientific
  42. computing as well. Since the appearance of a new operating system is a
  43. relatively uncommon event, this is a special opportunity for numerical
  44. analysts to voice their opinion about what fundamental facilities they need.
  45. The purpose of this short paper is to describe numerical aspects of the
  46. initial release of Inferno, and to invite comment before the tyranny of
  47. backward compatibility makes changes impossible.
  48. .PP
  49. An overview of Inferno is given by Dorward et. al.
  50. .]C "Inferno" ,
  51. but for our immediate purposes it may suffice to say that Inferno plays the
  52. role of a traditional operating system (with compilers, process control,
  53. networking, graphics, and so on) but can run either on bare hardware or on
  54. top of another operating system like Windows95 or Unix. Programs for
  55. .I "Inferno"
  56. are written in the language
  57. .I "Limbo"
  58. and compiled to
  59. machine-independent object files for the
  60. .I "Dis"
  61. virtual
  62. machine, which is then implemented with runtime compilation for best
  63. performance. Files are accessible over networks using the
  64. .I "Styx"
  65. protocol; together with the presentation of most system resources as files
  66. and the manipulation of file namespaces, this permits integration of a
  67. collection of machines into a team. Limbo looks somewhat like a mixture of C
  68. and Pascal, augmented by modules (to cope with the namespace and dynamic
  69. loading needs of large programs) and by a channel facility for convenient
  70. (coarse-grain) parallel programing. Array references are bounds-checked and
  71. memory is garbage collected.
  72. .PP
  73. The rest of this paper covers the fundamental floating point environment
  74. provided by the Limbo compiler and
  75. .I "math"
  76. module, the ``elementary
  77. functions,'' and finally some comments on why particular definitions were
  78. chosen or why certain facilities were included or excluded. This discussion
  79. assumes the reader is familiar with scientific computing in general and the
  80. IEEE floating point standard in particular.
  81. .NH 1
  82. Floating point
  83. .PP
  84. In Limbo, arithmetic on literal and named constants is evaluated at compile
  85. time with all exceptions ignored. Arithmetic on variables is left by the
  86. compiler to runtime, even if data path analysis shows the value to be a
  87. compile time constant. This implies that tools generating Limbo source must
  88. do their own simplification, and not expect the compiler to change $x/x$
  89. into $1$, or $-(y-x)$ into $x-y$, or even $x-0$ into $x$. Negation $-x$
  90. changes the sign of $x$; note that this not the same as $0-x$ if $x=0$.
  91. .PP
  92. The compiler may perform subexpression elimination and other forms of code
  93. motion, but not across calls to the mode and status functions. It respects
  94. parentheses. The evaluation order of $a+b+c$ follows the parse tree and is
  95. therefore the same as for $(a+b)+c$. These are the same rules as for Fortran
  96. and C.
  97. .PP
  98. Contracted multiply-add instructions (with a single rounding) are not
  99. generated by the compiler, though they may be used in the native
  100. .SM BLAS
  101. libraries. All arithmetic follows the IEEE floating point standard
  102. .]C "IEEEfp" ,
  103. except that denormalized numbers may not be supported; see the
  104. discussion in section 3.
  105. .PP
  106. The most important numerical development at the language level recently has
  107. been accurate binary/decimal conversion
  108. .]C "Clinger"
  109. .]C "Gay"
  110. .]C "SteeleWhite" .
  111. Thus printing a real using
  112. .CW "\%g"
  113. and reading
  114. it on a different machine guarantees recovering identical bits. (Limbo uses
  115. the familiar
  116. .I "printf"
  117. syntax of C, but checks argument types against
  118. the format string at compile time, in keeping with its attempt to help the
  119. programmer by stringent type checking.) A good
  120. .I "strtod/dtoa"
  121. is,
  122. unfortunately, 1700 lines of source (15kbytes compiled), though with modest
  123. average runtime penalty. This code must be used in the compiler so that
  124. coefficients are accurately transferred to bytecodes. Smaller, faster, but
  125. sloppier, runtimes will also be provided when mandated by limited memory and
  126. specialized use. However, programmers may assume the features described in
  127. this paper are present in all Inferno systems intended for general computing.
  128. .PP
  129. Each thread has a floating point control word (governing rounding mode and
  130. whether a particular floating point exception causes a trap) and a floating
  131. point status word (storing accumulated exception flags). Functions
  132. .I "FPcontrol"
  133. and
  134. .I "FPstatus"
  135. copy bits to the control or status word, in
  136. positions specified by a mask, returning previous values of the bits.
  137. .I "getFPcontrol"
  138. and
  139. .I "getFPstatus"
  140. return the words unchanged.
  141. .PP
  142. The constants
  143. .I "INVAL, ZDIV, OVFL, UNFL, INEX"
  144. are non-overlapping
  145. single-bit masks used to compose arguments or return values. They stand for
  146. the five IEEE exceptions:
  147. .IP •
  148. ``invalid operation'' ($0/0$,$0 * infinity $,$ infinity - infinity $,$sqrt{-1}$)
  149. .IP •
  150. ``division by zero'' ($1/0$),
  151. .IP •
  152. ``overflow'' ($1.8e308$)
  153. .IP •
  154. ``underflow'' ($1.1e-308$)
  155. .IP •
  156. ``inexact'' ($.3*.3$).
  157. .PP
  158. The constants
  159. .I "RND_NR, RND_NINF, RND_PINF, RND_Z"
  160. are distinct
  161. bit patterns for ``round to nearest even'', ``round toward $-{infinity} $'',
  162. ``round toward $+{infinity} $'', ``round toward $0$'', any of which can be set
  163. or extracted from the floating point control word using
  164. .I "RND_MASK" .
  165. For example,
  166. .IP •
  167. to arrange for the program to tolerate underflow,
  168. .I "FPcontrol(0,UNFL)."
  169. .IP •
  170. to check and clear the inexact flag,
  171. .I "FPstatus(0,INEX)."
  172. .IP •
  173. to set directed rounding,
  174. .I "FPcontrol(RND_PINF,RND_MASK)."
  175. .PP
  176. By default,
  177. .I "INEX"
  178. is quiet and
  179. .I "OVFL, UNFL, ZDIV,"
  180. and
  181. .I "INVAL"
  182. are fatal. By default, rounding is to nearest even, and library
  183. functions are entitled to assume this. Functions that wish to use quiet
  184. overflow, underflow, or zero-divide should either set and restore the
  185. control register themselves or clearly document that the caller must do so.
  186. The ``default'' mentioned here is what a Limbo program gets if started in a
  187. fresh environment. Threads inherit floating point control and status from
  188. their parent at the time of spawning and therefore one can spawn a ``round
  189. toward 0'' shell and re-run a program to effortlessly look for rounding
  190. instabilities in a program.
  191. .NH 1
  192. Elementary functions
  193. .PP
  194. The constants
  195. .I "Infinity, NaN, MachEps, Pi, Degree"
  196. are defined. Since
  197. Inferno has thorough support of Unicode, it was tempting to name these $infinity $, $ε $, $π $, and °, but people (or rather, their
  198. favorite text editing tools) may not be ready yet for non-\s-2ASCII\s0
  199. source text.
  200. .I "Infinity"
  201. and
  202. .I "NaN"
  203. are the positive infinity
  204. and quiet not-a-number of the IEEE standard, double precision.
  205. .I MachEps
  206. is $2 sup {-52}$, the unit in the last place of the mantissa $1.0$.
  207. The value of
  208. .I "Pi"
  209. is the nearest machine number to the
  210. mathematical value $π $.
  211. .I "Degree"
  212. is
  213. $"Pi" / 180$.
  214. .PP
  215. Three useful functions
  216. .I "fdim, fmax, fmin"
  217. are adopted from the
  218. Numerical C extensions
  219. .]C "NumerC" .
  220. The unusual one of these, often
  221. denoted $(x-y) sub {+}$, is defined by $roman "fdim" ( x , y )=x-y$ if $x > y$, else $0$. The compiler may turn these into efficient machine instruction sequences,
  222. possibly even branch-free, rather than function calls. There are two almost
  223. redundant mod functions:
  224. .I "remainder(x,y)"
  225. is as defined by the IEEE
  226. standard (with result $roman "|" r roman "|" <= y/2$);
  227. .I "fmod(x,y)"
  228. is $x roman "mod" y$,
  229. computed in exact arithmetic with $0<= r<y$. Limbo has a ``tuple'' type,
  230. which is the natural return value in the call $(i,f)= roman "modf" ( x )$ to
  231. break $x$ into integer and fractional parts. The function
  232. .I "rint"
  233. rounds to an integer, following the rounding mode specified in the floating
  234. point control word.
  235. .PP
  236. For a good-quality, freely-available elementary function library,
  237. .I "math"
  238. uses the IEEE subset of
  239. .I "fdlibm"
  240. .]C "fdlibm" .
  241. Of course, a
  242. conforming implementation may use entirely different source, but must take
  243. care with accuracy and with special arguments. There are the customary
  244. power, trigonometric, Bessel, and erf functions, and specialized versions $roman "expm1"( x )=e sup x - 1$, $roman "log1p" ( x )=log ( 1 + x )$. An additional function
  245. $roman "pow10"( n ) = 10 sup n$ is defined; in the default implementation this is
  246. just fdlibm's $roman "pow" ( 10. , n )$ but it is provided so that separate
  247. trade-offs of accuracy and simplicity can be made
  248. .]C "MacIlroy" .
  249. .I "fdlibm"
  250. uses extra precise argument reduction, so the computed $sin (n*Pi)$
  251. is small but nonzero. If demands warrant, degree versions of the
  252. trigonometric functions will be added, but for now the style $sin (45*Degree)$ is used.
  253. The library also provides IEEE functions
  254. .I "ilogb, scalbn, copysign, finite, isnan,"
  255. and
  256. .I "nextafter" .
  257. .PP
  258. The functions
  259. .I "dot, norm1, norm2, iamax, gemm"
  260. are adopted from the
  261. .SM BLAS
  262. .]C "blas"
  263. to get tuned linear algebra kernels for
  264. each architecture, possibly using extra-precise accumulators. These are
  265. defined by $sum {{x sub i}{y sub i}}$, $sum roman | {x sub i} roman | $, $ sqrt{sum { x sub {i sup 2}}} $, $i$ such
  266. that $roman | {x sub i} roman | = roman max $, and $C= alpha AB + beta C$ with optional transposes on $A$
  267. and $B$. Since Limbo has only one floating-point type, there is no need here
  268. for a precision prefix. Limbo array slices permit the calling sequences to
  269. be more readable than in Fortran77 or C, though restricted to unit stride.
  270. This encourages better cache performance anyway. The matrix multiply
  271. function
  272. .I "gemm"
  273. remains general stride (and is the foundation for
  274. other operations
  275. .]C "Kagstrom" ).
  276. .PP
  277. Limbo is like C in providing singly-subscripted arrays with indexing
  278. starting at 0. Although Limbo offers arrays of arrays, as in C, for
  279. scientific work a better choice is to adopt the style of linearizing
  280. subscripts using Fortran storage order. This promotes easier exchange of
  281. data with other applications and reuses effort in organizing loops to
  282. achieve good locality. In previous language work
  283. .]C "pine" ,
  284. we implemented
  285. a C preprocessor that allowed the programmer to choose a convenient origin
  286. (such as 1) and have it compiled into 0 for the base language; because we
  287. passed arrays as dope vectors, we were even able to allow different origins
  288. for the same array in calling and called functions. The main lesson we
  289. learned from that experience, however, was that permutations become a
  290. nightmare when there is anything but dogmatic adherence to a single origin.
  291. So for an $m$ by $n$ matrix $A$, the programmer should use loops with $0<=
  292. i<m$ and $0<= j<n$ and access $A[i+m*j]$.
  293. .PP
  294. For interoperability with foreign file formats and for saving main memory in
  295. selected applications, functions are provided for copying bits between and
  296. reals and 32-bit or 64-bit IEEE-format values.
  297. .PP
  298. Finally,
  299. .I "math"
  300. provides a tuned quicksort function
  301. .I "sort(x,p)"
  302. where
  303. .I "x"
  304. is a real array and
  305. .I "p"
  306. is an int array representing
  307. a 0-origin permutation. This function leaves the contents of
  308. .I "x"
  309. untouched and rearranges
  310. .I "p"
  311. so that $x[{p sub i}]<= x[p sub {i+1}]$. This is
  312. usually what one wants to do: sort an array of abstract data types based on
  313. some key, but without the need to actually swap large chunks of memory.
  314. .NH 1
  315. Rationale
  316. .PP
  317. This section discusses why certain numerical features were included or not.
  318. .NH 2
  319. Rounding modes and accumulated exceptions
  320. .PP
  321. Directed rounding is only needed in a very few places in scientific
  322. computing, but in those places it is indispensable. Accumulated floating
  323. point exceptions are even more useful. User trap handling is a harder
  324. problem, and may be worth leaving for later, possibly with a default
  325. ``retrospective diagnostics'' log
  326. .]C "Kahan" .
  327. .PP
  328. Note that the exception masks must be architecture independent, since they
  329. reside in the Limbo bytecodes, and therefore the implementation involves a
  330. small amount of bit fiddling. Still, it is efficient enough to encourage
  331. use. It would be difficult to port to a processor that only had static
  332. rounding modes in instruction opcodes rather than the dynamic model
  333. specified in section 2 of the IEEE standard. Fortunately, the Alpha
  334. does provide both models.
  335. .NH 2
  336. Sudden underflow
  337. .PP
  338. Some processor vendors make supporting gradual underflow just too hard. (One
  339. must struggle upon the system trap to reconstruct exactly which instruction
  340. was executing and what the state of the registers was. On the MIPS, it is
  341. said to be 30 pages of assembler.) So Inferno supports denormalized numbers
  342. only if the hardware makes this easy. Providing underflow that is correct
  343. but very slow, as some systems do, is not necessarily doing the user a favor.
  344. .PP
  345. To determine portably if a particular system offers gradual underflow, mask
  346. off UNFL and do trial arithmetic.
  347. .NH 2
  348. Speed
  349. .PP
  350. Computers with slow (software) gradual underflow usually provide a fast
  351. flush-to-0 alternative. This often suffices, though there are important
  352. examples where it forces an uglier and slower coding style. A worse
  353. situation is if the hardware uses system traps for Infinity and NaN
  354. arithmetic. The resulting slowdown will make otherwise excellent and natural
  355. algorithms run slowly
  356. .]C "Demmel" .
  357. Sadly, even some x86 implementations
  358. that do non-finite arithmetic in hardware, do it relatively slowly.
  359. .PP
  360. We considered providing syntax to declare a certain program scope within
  361. which precise IEEE behavior was required, and relaxing the rules outside
  362. such scopes.
  363. (The numerical C extensions
  364. .]C "NumerC"
  365. use pragma
  366. for this purpose.)
  367. These scope declarations would need to be in the
  368. bytecodes, since significant optimization may be attempted by the runtime
  369. compiler. After some discussion, and with some trepidation, it was agreed
  370. that instead all compilers would be required to preserve the same result and
  371. status as for an unoptimized version.
  372. .NH 2
  373. Comparison
  374. .PP
  375. The standard C operators
  376. .CW ==
  377. .CW !=
  378. .CW "<"
  379. .CW "<="
  380. .CW ">"
  381. .CW ">="
  382. are the only comparisons provided, and they behave exactly
  383. like the ``math'' part of Table 4 of the IEEE standard. Programs interested
  384. in handling NaN data should test explicitly. This seems to be the way most
  385. people program and leads to code more understandable to nonexperts. It is
  386. true that with more operators one can correctly write code that propagates
  387. NaNs to a successful conclusion\-but that support has been left for later.
  388. NaN(''tag'') should be added at that same time.
  389. .NH 2
  390. Precision
  391. .PP
  392. All implementations run exclusively in IEEE double precision. If the
  393. hardware has extra-precise accumulators, the round-to-double mode is set
  394. automatically and not changeable, in keeping with Limbo's design to have
  395. only one floating point type. Extended precision hardware, if available, may
  396. be used by the built-in elementary function and
  397. .SM BLAS
  398. libraries.
  399. Also, we contemplate adding a dotsharp function that would use a very long
  400. accumulator for very precise inner products, independent of the order of
  401. vector elements
  402. .]C "kulisch" .
  403. But reference implementations that use only
  404. double precision, avoid contracted multiply-add, and evaluate in the order 1
  405. up to n will always be available for strict portability.
  406. .PP
  407. At the time the decision was made to restrict the system to 64-bit floating
  408. point, Limbo integers were almost exclusively 32-bit and the consistency
  409. argument to have a single real type was compelling. Now that Limbo has more
  410. integer types the decision might be reconsidered. But so many engineers
  411. needlessly struggle with programs run in short precision, that offering it
  412. may do as much harm as good. On most modern computers used for general
  413. purpose scientific computing, 64-bit floating point arithmetic is as fast as
  414. 32-bit, except for the memory traffic. In cases where the shorter precision
  415. would suffice and memory is a crucial concern, the programmer should
  416. consider carefully scaled fixed point or specialized compression. To
  417. efficiently interoperate with data files that use the short format,
  418. programmers may use the provided realbits32 function. While there are surely
  419. appropriate uses for a first-class 32-bit real type, for now we follow
  420. Kahan's sarcastic motto ``why use lead when gold will do?''
  421. .NH 2
  422. BLAS
  423. .PP
  424. The few
  425. .SM BLAS
  426. in the core library were chosen for readability and,
  427. in case of gemm, for optimization beyond what a reasonable compiler would
  428. attempt. We expect that compilers will (soon) be good enough that the
  429. difference between compiling $y+=a*x$ and calling daxpy is small. Also, as
  430. mentioned above, dot and gemm might reasonably use combined multiply-add or
  431. a long accumulator in some optional implementations.
  432. .NH 2
  433. $GAMMA ( x )$
  434. .PP
  435. To avoid confusion with the C math library, which defined
  436. .I "gamma"
  437. as $ln GAMMA $, we offer only
  438. .I "lgamma"
  439. for now. This function and
  440. .I "modf"
  441. return an (int,real) tuple rather than assigning through an
  442. integer pointer, in keeping with Limbo's design. The opportunity has been
  443. taken to drop some obsolete functions like
  444. .I "frexp" .
  445. Other functions
  446. are unchanged from the C math library.
  447. .NH 2
  448. Future
  449. .PP
  450. A prototype preprocessor has been written to allow the scientific programmer
  451. to write $A[i,j]$ for an $A$ that was created as a $Matrix(m,n)$ and to have
  452. the subscript linearization done automatically. Here $Matrix$ is an Limbo
  453. abstract data type containing a real array and integers $m$, $n$, and column
  454. stride $lda$ used as in typical Fortran calling sequences.
  455. .PP
  456. The Limbo compiler is soon expected to implement the type
  457. .I "complex" .
  458. .PP
  459. Higher level numerical libraries will also be provided, and although that
  460. topic is beyond the scope of this paper, opinions about what should come
  461. first would be welcome.
  462. .PP
  463. Distributed computing has not been mentioned here because it involves
  464. relatively few considerations specific to floating point computation.
  465. However, it may be worth noting that in the default environment (with
  466. underflow trapped, so that presence or absence of denormalized numbers is
  467. not significant) programs run independently on heterogeneous machines
  468. nevertheless get precisely identical results, even with respect to thread
  469. scheduling. This implies that certain communication steps can be avoided,
  470. and that regression testing is considerably simplified.
  471. .PP
  472. Please direct comments on these numerical aspects of Inferno to Eric Grosse.
  473. More general technical comments can be directed to Vita Nuova
  474. .CW comments@vitanuova.com ). (
  475. I am grateful to Joe Darcy, Berkeley,
  476. to David Gay, Bell Labs, to David Hook, University of Melbourne,
  477. and to participants of the IFIP WG2.5 Working
  478. Conference on Quality of Numerical Software for insightful comments on a
  479. first draft of this paper.
  480. .\"the principal developers of Inferno: Sean Dorward, Rob Pike, Dave Presotto, Howard Trickey, and Phil Winterbottom.
  481. .SH
  482. Trademarks
  483. .LP
  484. Inferno, Limbo, and Dis are trademarks of Vita Nuova Holdings Limited.
  485. Unix is a trademark of Unix Systems Laboratories.
  486. Windows95 is a trademark of Microsoft.
  487. .EQ
  488. delim off
  489. .EN
  490. .SH
  491. References
  492. .nr PS -1
  493. .nr VS -1
  494. .LP
  495. .nr [N 0 1
  496. .de ]N
  497. .IP \\n+([N.
  498. .if \nZ>0 .tm \\$1 \\n([N
  499. ..
  500. ....
  501. ....
  502. .]N "Inferno"
  503. S. Dorward,
  504. R. Pike,
  505. D.\ L. Presotto,
  506. D.\ M. Ritchie,
  507. H. Trickey,
  508. P. Winterbottom,
  509. ``The Inferno Operating System'',
  510. .I "Bell Labs Technical Journal" ,
  511. Vol. 2,
  512. No. 1,
  513. Winter 1997, pp. 5-18.
  514. Reprinted in this volume.
  515. .]N "Clinger"
  516. W.\ D. Clinger.
  517. ``How to read floating point numbers accurately.
  518. In \fIProceedings of the ACM SIGPLAN'90 Conference on Programming
  519. Language Design and Implementation\fP, pages 92-101, 1990.
  520. ....
  521. ....
  522. .]N "Demmel"
  523. James\ W. Demmel and Xiaoye Li.
  524. Faster numerical algorithms via exception handling.
  525. In Jr. Earl\ Swartzlander, Mary\ Jane Irwin, and Graham Jullien,
  526. editors, \fIProceedings: 11th Symposium on Computer Arithmetic\fP. IEEE
  527. Computer Society Press, 1993.
  528. ....
  529. ....
  530. .]N "blas"
  531. Jack\ J. Dongarra, Jeremy\ Du Croz, Sven Hammarling, and Richard\ J. Hanson.
  532. Algorithm 656: An extended set of Basic Linear Algebra Subprograms.
  533. \fIACM Trans. on Mathematical Software\fP, 14(1):18-32, March 1988.
  534. ....
  535. ....
  536. .]N "Gay"
  537. D.\ M. Gay.
  538. Correctly rounded binary-decimal and decimal-binary conversions.
  539. Numerical Analysis Manuscript No. 90-10, AT&T Bell Laboratories,
  540. Murray Hill, NJ, 1990.
  541. freely redistributable, available at
  542. .CW http://netlib.bell-labs.com/netlib/fp/ .
  543. ....
  544. ....
  545. .]N "pine"
  546. E.\ H. Grosse and W.\ M. Coughran, Jr.
  547. The pine programming language.
  548. Numerical Analysis Manuscript 83-4, AT&T Bell Laboratories, 1983.
  549. .br
  550. .CW ftp://cm.bell-labs.com/cm/cs/doc/92/pine.ps.Z .
  551. ....
  552. ....
  553. .]N "IEEEfp"
  554. IEEE.
  555. Standard for binary floating-point arithmetic.
  556. Technical Report Std 754-1985, ANSI, 1985.
  557. ....
  558. ....
  559. .]N "Kagstrom"
  560. Bo\ Kagstrom, Per Ling, and Charles Van\ Loan.
  561. Portable high performance GEMM-based Level 3 BLAS.
  562. In R.\ F.\ Sincovec et\ al., editor, \fIParallel Processing for
  563. Scientific Computing\fP, pages 339-346. SIAM Publications, 1993.
  564. .CW /netlib/blas/ .
  565. ....
  566. ....
  567. .]N "Kahan"
  568. W.\ Kahan.
  569. Lecture notes on the status of IEEE Standard 754 for binary
  570. floating-point arithmetic.
  571. Technical report, Univ. Calif. Berkeley, May 23 1995.
  572. Work in Progress.
  573. ....
  574. ....
  575. .]N "kulisch"
  576. U.\ Kulisch and W.L. Miranker.
  577. \fIComputer arithmetic in theory and practice.\fP
  578. Academic Press, 1980.
  579. ....
  580. ....
  581. .]N "MacIlroy"
  582. M.\ D. McIlroy.
  583. Mass produced software components.
  584. In Peter Naur and Brian Randell, editors, \fISoftware Engineering\fP,
  585. pages 138-155, 1969.
  586. Garmisch, Germany, October 1968.
  587. ....
  588. ....
  589. .]N "fdlibm"
  590. Kwok\ C. Ng.
  591. .CW fdlibm :
  592. C math library for machines that support ieee 754
  593. floating-point.
  594. freely redistributable; available at
  595. .CW http://netlib.bell-labs.com/netlib/fdlibm/ ,
  596. March 1995.
  597. ....
  598. ....
  599. .]N "SteeleWhite"
  600. G.\ L. Steele and J.\ L. White.
  601. How to print floating point numbers accurately.
  602. In \fIProceedings of the ACM SIGPLAN'90 Conference on Programming
  603. Language Design and Implementation\fP, pages 112-126, 1990.
  604. ....
  605. ....
  606. .]N "NumerC"
  607. X3J11.1.
  608. Chapter 5, floating-point C extensions.
  609. Technical report, ANSI, March 29 1995.
  610. .nr PS +1
  611. .nr VS +1