.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.B -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
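.PP
As a rough illustration of the three kinds of assignment described above,
the following sketch assumes a POSIX shell; the function name
show_assign, the variable names and the sample data are hypothetical.

```shell
# Hypothetical helper: reads colon-separated lines on standard input.
show_assign() {
    # -F sets the field separator; -v assigns before the program starts;
    # the operand tag=late is assigned only when awk reaches it in the
    # argument list, so it is still unset inside BEGIN.
    awk -F: -v early=yes '
        BEGIN { print "BEGIN:", early, "[" tag "]" }
        { print $2, early, tag }' tag=late -
}
printf 'a:b\nc:d\n' | show_assign
```

The BEGIN line should show an empty value between the brackets,
while the two input lines see both variables.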
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
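.PP
A small sketch of field splitting, assuming a POSIX shell;
the function name ends and the sample input are hypothetical.
It contrasts the default white-space splitting with an explicit separator.

```shell
# Hypothetical helper: print the field count, then the first and last fields.
# Any arguments are passed to awk as options.
ends() {
    awk "$@" '{ print NF, $1, $NF }'
}
# Default splitting: leading blanks and runs of blanks yield no empty fields.
printf '  alpha beta   gamma\n' | ends
# Explicit separator: the empty field between the two commas is counted.
printf 'alpha,,gamma\n' | ends -F,
```

Both calls are expected to report three fields, for different reasons.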
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
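.PP
The multiple-subscript convention can be sketched as follows,
assuming a POSIX shell with sort available;
the function name pair_counts, the array names and the input are hypothetical.

```shell
# Hypothetical helper: count identical (first field, second field) pairs.
pair_counts() {
    awk '
        { count[$1, $2]++ }                 # subscript is $1 SUBSEP $2
        END {
            for (key in count) {
                split(key, part, SUBSEP)    # recover the two constituents
                print part[1], part[2], count[key]
            }
        }' | sort                           # for-in order is unspecified
}
printf 'ann ls\nbob cat\nann ls\n' | pair_counts
```

Because the constituents are joined into one string,
splitting the key on SUBSEP is one way to get them back.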
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.IR expr
is omitted or is a null string, all open files are flushed.
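.PP
A minimal sketch of output redirection and close, assuming a POSIX shell
with mktemp available; the function name by_key, the file naming scheme
and the sample data are hypothetical.

```shell
# Hypothetical helper: append each record to a file named after its
# first field, inside the directory given as the first argument.
by_key() {
    awk -v dir="$1" '{
        out = dir "/" $1 ".txt"             # file name held in a variable
        printf "%s=%03d\n", $1, $2 >> out   # append formatted output
        close(out)                          # keep at most one file open
    }'
}
tmp=$(mktemp -d)
printf 'a 1\nb 2\na 3\n' | by_key "$tmp"
cat "$tmp/a.txt"
rm -rf "$tmp"
```

Closing after each record is only one possible design;
it trades speed for a bounded number of open files.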
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument, the length of
.B $0
is returned.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
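.PP
Several of the string functions listed above can be tried together.
This is only a sketch, assuming a POSIX shell;
the function name strdemo and the sample line are hypothetical.

```shell
# Hypothetical helper: apply several string built-ins to each input line.
strdemo() {
    awk '{
        s = $0
        print index(s, "wor")          # position of a fixed string
        print substr(s, 7, 5)          # 5 characters from position 7
        if (match(s, /o+/))            # sets RSTART and RLENGTH
            print RSTART, RLENGTH
        n = gsub(/o/, "0", s)          # edits s in place, returns the count
        print n, s
        k = split(s, w, ",")           # w[1] .. w[k]
        print k, toupper(w[2])
    }'
}
printf 'hello,world\n' | strdemo
```

Note that gsub modifies its third argument,
so the later split sees the edited string.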
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
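.PP
Because getline can return \-1, a loop that tests only for a non-zero
result may never terminate on error.
The sketch below, assuming a POSIX shell, compares the result with 0;
the function name number_cmd and the command it runs are hypothetical.

```shell
# Hypothetical helper: number the lines produced by the command in $1.
number_cmd() {
    awk -v cmd="$1" 'BEGIN {
        # 1 = a line was read, 0 = end of file, -1 = error
        while ((cmd | getline line) > 0)
            printf "%d\t%s\n", ++n, line
        close(cmd)
    }'
}
number_cmd 'echo x; echo y'
```

The parentheses around the pipe expression avoid any doubt
about how the comparison groups.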
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
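.PP
A range pattern and a match operator can be combined in one rule.
The following is a sketch under the assumption of a POSIX shell;
the function name between and the sample input are hypothetical.

```shell
# Hypothetical helper: print numbered lines from a line beginning with
# "start" through the next line beginning with "stop", leaving out
# lines whose second field begins with "#".
between() {
    awk '/^start/, /^stop/ { if ($2 !~ /^#/) print NR ": " $0 }'
}
printf 'skip\nstart a\nx #no\nstop b\nskip\n' | between
```

The range includes both of its end lines,
so lines 2 and 4 of the sample input are printed.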
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
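.PP
The difference between NR and FNR shows up only with more than one input file.
A small sketch, assuming a POSIX shell with mktemp available;
the function name where and the file names one and two are hypothetical.

```shell
# Hypothetical helper: run awk on files "one" and "two" in directory $1
# and show FILENAME, FNR, NR and the record itself.
where() {
    ( cd "$1" && awk '{ print FILENAME, FNR, NR, $0 }' one two )
}
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n' > "$tmp/two"
where "$tmp"
rm -rf "$tmp"
```

FNR restarts at 1 for the second file while NR keeps counting.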
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
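.PP
The excess-parameter idiom and array passing can be sketched as follows,
assuming a POSIX shell; the names sum_fields, fill and seen are hypothetical.

```shell
# Hypothetical helper: sum the fields of each input line.
sum_fields() {
    awk '
        # i and total are excess parameters, used only as locals.
        function fill(arr, n,    i, total) {
            for (i = 1; i <= n; i++) {
                arr[i] = $i            # the array is passed by reference
                total += $i
            }
            return total
        }
        {
            i = "global"               # untouched by the local i in fill
            t = fill(seen, NF)
            print t, seen[NF], i
        }'
}
printf '1 2 3\n10 20\n' | sum_fields
```

The extra spaces before the excess parameters are a common convention
for marking them as locals, not a language requirement.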
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN { # Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.
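.PP
The two conversion idioms mentioned first in this section can be compared
side by side. This is a sketch only, assuming a POSIX shell;
the function name cmp_both is hypothetical.

```shell
# Hypothetical helper: compare two values as strings and as numbers.
cmp_both() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Concatenating "" forces a string comparison.
        print ((a "") < (b "") ? "string: less" : "string: not less")
        # Adding 0 forces a numeric comparison.
        print ((a + 0) < (b + 0) ? "number: less" : "number: not less")
    }'
}
cmp_both 10 9
```

As strings, 10 sorts before 9 because the comparison is character by character;
as numbers it does not.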