.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.BI -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
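.PP
As a small illustration of field access (a sketch that assumes the default
white-space separator), the following program prints the number of fields,
the first field and the last field of each input line:
.EX
{ print NF, $1, $NF }
.EE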
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
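.PP
For instance, the following sketch combines the three forms;
the pattern
.B /error/
and the text printed are only illustrative:
.EX
/error/                           # no action: print matching lines
NR == 1 { print "first:", $0 }    # pattern and action
{ n++ }                           # no pattern: runs for every line
END { print n + 0, "lines read" }
.EE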
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
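.PP
A minimal sketch of associative arrays and multiple subscripts;
the array names
.BR count ,
.B seen
and
.B part
are illustrative only:
.EX
{
    for (i = 1; i <= NF; i++)
        count[$i]++             # subscript is the word itself
    seen[NR, NF] = 1            # stored under NR SUBSEP NF
}
END {
    for (w in count)
        print w, count[w]       # traversal order is unspecified
    for (k in seen) {
        split(k, part, SUBSEP)  # recover the two subscripts
        print "line", part[1], "had", part[2], "fields"
    }
}
.EE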
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
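.PP
The following sketch shows the three kinds of redirection;
the output file names and the use of
.IR sort (1)
are assumptions made for the example:
.EX
{
    print $0 > ($1 ".out")      # parenthesized expression as file name
    print NR, $0 >> "all.log"   # append to a literal file name
    print $1 | "sort"           # every line goes to the same pipe
}
END {
    close("sort")               # close the pipe so sort sees end of input
}
.EE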
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
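.PP
A short sketch exercising several of the string functions above;
the sample string and the variable names are illustrative only:
.EX
BEGIN {
    s = "hello, world"
    print length(s)                 # 12
    print substr(s, 1, 5)           # hello
    print index(s, "world")         # 8
    n = split("a:b:c", parts, ":")
    print n, parts[2]               # 3 b
    t = s
    gsub(/o/, "0", t)               # replace every o in the copy t
    print t                         # hell0, w0rld
    if (match(s, /w[a-z]+/))
        print RSTART, RLENGTH       # 8 5
    print toupper(s)                # HELLO, WORLD
}
.EE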
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
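.PP
Because an error yields \-1, which counts as true, a read loop is usually
written to test for a result greater than 0.
A minimal sketch, in which the file name
.B notes.txt
is hypothetical:
.EX
BEGIN {
    file = "notes.txt"              # hypothetical file name
    while ((getline line < file) > 0)
        n++                         # one iteration per record read
    close(file)
    print n + 0, "lines in", file
}
.EE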
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
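.PP
As an illustration of how several of these variables interact
(a sketch; the choice of a colon for
.B OFS
is arbitrary):
.EX
BEGIN { OFS = ":" }                     # separator used by print
FNR == 1 { print "file", FILENAME }     # first record of each file
{ print NR, FNR, NF }                   # overall count, per-file count, fields
.EE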
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
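.PP
A minimal sketch of the excess-parameter idiom; the function name
.B max
and its arguments are invented for the example, and the extra spacing
before the local names is only a common convention:
.EX
function max(a, n,    i, m) {   # a: array (by reference), n: scalar
    m = a[1]                    # i and m are locals: never passed by callers
    for (i = 2; i <= n; i++)
        if (a[i] > m)
            m = a[i]
    return m
}
BEGIN {
    n = split("3 9 4", v)       # v[1], v[2], v[3]
    print max(v, n)
}
.EE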
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN { # Simulate echo(1)
for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
printf "\en"
exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
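.PP
A short sketch of both idioms; the variable
.B x
is illustrative only:
.EX
BEGIN {
    x = "10"
    print (x + 0 > 9)       # numeric comparison: prints 1
    print (x "" > "9")      # string comparison: prints 0
}
.EE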
.br
The scope rules for variables in functions are a botch;
the syntax is worse.