.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
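.PP
As a hedged illustration of the two kinds of assignment,
the sketch below assumes hypothetical input files
.B a.txt
and
.BR b.txt ,
a hypothetical program file
.BR prog ,
and a hypothetical variable
.BR tag .
The
.B -v
assignment is already in effect in
.BR BEGIN ;
the operand assignment takes effect only when the argument list reaches it,
that is, after
.B a.txt
has been read.
.EX
# assumed invocation:
#	awk -v tag=first -f prog a.txt tag=second b.txt
BEGIN	{ print "in BEGIN, tag is", tag }	# tag is first
	{ print FILENAME, tag }	# first for a.txt, second for b.txt
.EE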
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
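.PP
A minimal sketch of field access, assuming the hypothetical input line
.LR "alpha beta gamma"
and the default field separator:
.EX
{ print NF }		# 3
{ print $1, $NF }	# alpha gamma
{ print $(NF-1) }	# beta
.EE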
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
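.PP
As a brief sketch of these forms (the patterns and the variable
.B n
are illustrative only):
.EX
/error/			# no action: matching lines are printed
{ n++ }			# no pattern: runs on every line
NF > 3 { print $1 }; END { print n }	# separated by a semicolon
.EE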
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
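.PP
The following sketch, with hypothetical array names
.B count
and
.BR seen ,
illustrates string subscripts, multiple subscripts,
and the way context selects a numeric or string value:
.EX
{ count[$1]++ }		# subscript is the string value of $1
{ seen[$1, $2] = 1 }	# stored under $1 SUBSEP $2
END {
	for (k in count)
		print k, count[k]	# order of traversal is unspecified
	x = "3" + 4		# numeric context: x is 7
	y = 3 " apples"		# concatenation: y is "3 apples"
	print x, y, 2 ^ 10	# 7 3 apples 1024
}
.EE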
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.IR expr
is omitted or is a null string, all open files are flushed.
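.PP
A sketch of output redirection, assuming input lines whose second field is numeric;
the file name
.B names
and the use of
.B sort
as the receiving command are examples only:
.EX
{
	print $1 > "names"		# literal file name
	print $0 >> ("log." FILENAME)	# parenthesized expression
	printf "%-10s %6.2f\en", $1, $2 | "sort"
}
END {
	close("names")
	close("sort")	# closing the pipe lets sort finish
}
.EE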
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument, the length of
.B $0
is returned.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
If
.I n
is omitted, it is taken to be the length of
.I s
from
.IR m .
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
A
.L &
character in
.I t
will be replaced by the sub-string of
.I s
matched by
.IR r ;
it may be escaped with
.L \e
to substitute a literal
.LR & .
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
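.PP
A short sketch exercising several of these functions on a fixed string;
the variable names are arbitrary:
.EX
BEGIN {
	s = "hello, world"
	print length(s)		# 12
	print substr(s, 1, 5)	# hello
	print substr(s, 8)	# world
	print index(s, "wor")	# 8
	match(s, /o+/)
	print RSTART, RLENGTH	# 5 1
	n = split("a:b:c", part, ":")
	print n, part[2]	# 3 b
	t = s
	n = gsub(/o/, "0", t)
	print n, t		# 2 hell0, w0rld
	sub(/world/, "[&]", s)
	print s			# hello, [world]
	print sprintf("%03d", 7), toupper("awk"), int(3.9)	# 007 AWK 3
}
.EE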
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
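.PP
A sketch of the file and pipe forms, assuming a hypothetical file
.B notes
and using
.B date
only as an example command.
Testing the result against 0 ends the loop on an error as well as at end of file.
.EX
BEGIN {
	while ((getline line < "notes") > 0)
		n++
	close("notes")
	print n + 0, "lines in notes"

	"date" | getline now	# first line of the command's output
	close("date")
	print "now:", now
}
.EE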
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
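.PP
A sketch contrasting an isolated regular expression with the match operators;
the variable
.B pat
is a hypothetical name holding a string used as a regular expression:
.EX
BEGIN			{ pat = "^[A-Z]" }
/^#/			{ next }	# isolated regexp, applied to $0
$1 ~ /^[0-9]+$/		{ print "number:", $1 }
$2 ~ pat		{ print "capitalized:", $2 }
$2 !~ pat		{ print "other:", $2 }
.EE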
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
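.PP
A sketch of relational expressions used as patterns;
the arrays
.B ok
and
.B seen
and the counter
.B bad
are hypothetical:
.EX
BEGIN		{ ok["red"]; ok["green"] }	# a reference creates the element
$1 in ok	{ print "known:", $1 }
!($1 in ok)	{ bad++ }
$2 >= 10 && $2 < 20	{ print "in range:", $0 }
($1, $2) in seen	{ print "repeated:", $0 }
		{ seen[$1, $2] = 1 }
END		{ print bad + 0, "unknown" }
.EE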
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
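.PP
A sketch showing a few of these variables at work;
the chosen values of
.B OFS
and
.B OFMT
are arbitrary:
.EX
BEGIN		{ OFS = ":"; OFMT = "%.2f" }
FNR == 1	{ print "file", FILENAME }	# first record of each file
		{ print NR, FNR, NF }		# values joined by OFS
END		{ print 3.14159 }		# printed through OFMT as 3.14
.EE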
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
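.PP
A sketch of these rules with hypothetical functions
.B fact
and
.BR fill :
.B fact
is recursive, and
.B fill
receives its array by reference and uses the excess parameter
.B i
as a local variable.
.EX
function fact(n) {
	return n <= 1 ? 1 : n * fact(n - 1)
}
function fill(a, n,    i) {	# i is local; a is the caller's array
	for (i = 1; i <= n; i++)
		a[i] = fact(i)
}
BEGIN {
	fill(tab, 5)
	print tab[5]	# 120
}
.EE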
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN { # Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
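.PP
A sketch of both idioms; the variable names are arbitrary:
.EX
BEGIN {
	a = "10"; b = "9"
	print (a < b)		# 1: both are strings, compared as strings
	print (a+0 < b+0)	# 0: forced to numbers, 10 is not less than 9
	n = 42
	print ((n "") < "5")	# 1: n forced to a string, "42" sorts before "5"
}
.EE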
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.
  547. .I split
  548. function with an empty string as final argument now copes
  549. with UTF in the string being split.