.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.B -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.B FS
is null, the input line is split into one field per character.
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
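.PP
As a minimal sketch of associative arrays with multiple subscripts
(the names
.BR seen ,
.B k
and
.B part
are illustrative only),
the following program counts each distinct pair of first and second fields
and recovers the two constituents of every subscript by splitting it on
.BR SUBSEP .
The order in which the subscripts are visited is not specified.
.PP
.EX
{ seen[$1,$2]++ }
END {
	for (k in seen) {
		split(k, part, SUBSEP)
		print part[1], part[2], seen[k]
	}
}
.EE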
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.I expr
is omitted or is a null string, all open files are flushed.
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.B atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument, the length of
.B $0
is returned.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If
.I n
is omitted, it is taken to be the length of
.I s
from
.IR m .
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
A
.L &
character in
.I t
will be replaced by the sub-string of
.I s
matched by
.IR r ;
it may be escaped with
.L \e
to substitute a literal
.LR & .
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
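.PP
A small sketch of
.B gsub
with
.L &
in the replacement text
(the string and the variable names
.B s
and
.B n
are made up for illustration).
Each run of digits is wrapped in angle brackets,
so the program prints
.LR "2 a<1>b<22>" .
.PP
.EX
BEGIN {
	s = "a1b22"
	n = gsub(/[0-9]+/, "<&>", s)	# & stands for the matched text
	print n, s
}
.EE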
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
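.PP
A hedged sketch of reading the output of a command with
.BR getline :
the command
.B date
is only an assumed example of a program available on the system,
and the variable name
.B now
is illustrative.
Comparing the return value with 0 ends the loop on an error
as well as at end of file.
.PP
.EX
BEGIN {
	while (("date" | getline now) > 0)
		print "today is", now
	close("date")
}
.EE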
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.B NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
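.PP
For instance, a sketch of a recursive function with one local variable
(the names
.BR fact ,
.B n
and
.B r
are illustrative, not built in).
The caller passes a single argument, so
.B r
starts out null on every call.
The program prints 120.
.PP
.EX
function fact(n,    r) {	# r is an excess parameter, hence local
	r = (n <= 1) ? 1 : n * fact(n - 1)
	return r
}
BEGIN { print fact(5) }
.EE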
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.
  576. .I split
  577. function with an empty string as final argument now copes
  578. with UTF in the string being split.