.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI -F fs
]
[
.BI -v
.I var=value
]
[
.BI -mr n
]
[
.BI -mf n
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B -v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
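.PP
A sketch of this option handling, assuming a POSIX shell and an
.I awk
that accepts the standard
.B \-F
and
.B \-v
options; the variable names
.I sep
and
.I v
are arbitrary.
The first command should print \fLa-c\fP.
The second should print \fL1 x\fP, because the operand \fLv=1\fP is executed just before the standard input (\fL-\fP) is opened.
.EX
echo 'a:b:c' | awk -F: -v sep=- '{ print $1 sep $3 }'
echo x | awk '{ print v, $0 }' v=1 -
.EE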
.PP
An input line is normally made up of fields separated by white space
or, if it has been changed, by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
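.PP
With the default field separator, leading and trailing blanks are ignored and a run of blanks counts as one separator.
The following command is a sketch that assumes a POSIX shell.
It prints the number of fields (the variable \fLNF\fP, described below), the second field and the last field.
The expected output is \fL3 beta gamma\fP.
.EX
echo '  alpha  beta gamma ' | awk '{ print NF, $2, $NF }'
.EE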
.PP
To compensate for inadequate implementation of storage management,
the
.B \-mr
option can be used to set the maximum size of the input record,
and the
.B \-mf
option to set the maximum number of fields.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
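.PP
A sketch assuming a POSIX shell.
The program has two statements separated by a semicolon.
The first has a pattern but no action, so it prints the matching line.
The second has an action but no pattern, so it runs for every line.
The command should print \fL5 apples\fP and then \fLapples\fP.
.EX
echo '5 apples' | awk '$1 > 3; { print $2 }'
.EE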
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
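.PP
The following sketch exercises several of these statements.
It assumes a POSIX shell, and the names
.IR n ,
.IR s ,
.IR i ,
.I a
and
.I k
are arbitrary.
The
.B for
loop skips 2 and stops at 4, and only the element \fLy\fP survives the
.BR delete .
The expected output is therefore \fL5 13\fP followed by \fLy\fP.
.EX
awk 'BEGIN {
	do { n++ } while (n < 5)	# n counts up to 5
	for (i = 1; i <= 4; i++) {
		if (i == 2) continue	# skip 2
		if (i == 4) break	# stop before appending 4
		s = s i
	}
	print n, s
	a["x"] = 1; a["y"] = 2
	delete a["x"]
	for (k in a) print k	# only y remains
}'
.EE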
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string,
whose numeric value is 0.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
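.PP
A sketch of string and numeric values, multiple subscripts and the
.B in
operator (described below), assuming a POSIX shell.
The subscript
.I k
holds \fLx\fP and \fL1\fP joined by
.BR SUBSEP ,
and
.B split
separates them again.
The expected output is \fL7 34\fP, \fLfound\fP and \fLx 1 first\fP, one per line.
.EX
awk 'BEGIN {
	print "3" + 4, 3 4	# arithmetic, then concatenation
	a["x", 1] = "first"
	if (("x", 1) in a) print "found"
	for (k in a) {
		split(k, parts, SUBSEP)	# undo the concatenation
		print parts[1], parts[2], a[k]
	}
}'
.EE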
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present, or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
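.PP
A sketch assuming a POSIX shell and a
.I sort
command on the search path.
Both
.B print
statements write to one pipe because they name the same command string.
.B close
waits for
.I sort
to finish, so its output appears before the final line.
The expected output is \fLapple\fP, \fLbanana\fP and \fLdone\fP, one per line.
.EX
awk 'BEGIN {
	print "banana" | "sort"	# same string, same pipe
	print "apple" | "sort"
	close("sort")	# wait for sort to finish
	print "done"
}'
.EE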
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m ,
counted from 1;
if
.I n
is omitted, the substring extends to the end of
.IR s .
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
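.PP
A sketch that exercises several of these functions, assuming a POSIX shell.
It avoids
.BR utf ,
which is specific to this implementation, and
.BR rand ,
whose output varies.
Worked through by hand, the output should be:
\fL12 hello 8\fP, \fL5 1\fP, \fL3 c\fP, \fL2 hell0, w0rld\fP, \fLWORLD 003.1\fP and \fL-3 4 1\fP, one per line.
.EX
awk 'BEGIN {
	s = "hello, world"
	print length(s), substr(s, 1, 5), index(s, "world")
	if (match(s, /o+/)) print RSTART, RLENGTH	# first "o" is at 5
	n = split("a:b:c", parts, ":")
	print n, parts[3]
	t = s
	c = gsub(/o/, "0", t)	# t is modified in place
	print c, t
	print toupper(substr(s, 8)), sprintf("%05.1f", 3.14159)
	print int(-3.7), sqrt(16), exp(0)
}'
.EE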
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
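.PP
A sketch of the pipe form, assuming a POSIX shell.
The command string is kept in a variable so that exactly the same string can be passed to
.BR close .
The loop stops when
.B getline
returns 0, and the last line read is left in
.IR line ,
so the command should print \fL2 two\fP.
.EX
awk 'BEGIN {
	cmd = "echo one; echo two"
	while ((cmd | getline line) > 0)	# 1 per line, 0 at end
		n++
	close(cmd)
	print n, line
}'
.EE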
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
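.PP
A sketch assuming a POSIX shell.
It uses an isolated constant regular expression, then a string used as a regular expression with \fL~\fP, then \fL!~\fP.
All three patterns match, so the output should be \fLstatic\fP, \fLdynamic: disk\fP and \fLno warning\fP.
.EX
echo 'error: disk full' | awk '
	/^error/ { print "static" }
	$2 ~ "^di" { print "dynamic:", $2 }
	$0 !~ /warn/ { print "no warning" }'
.EE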
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through the next occurrence of the second, inclusive.
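.PP
A sketch assuming a POSIX shell.
It shows that a range includes both boundary lines.
It also shows that a range whose second pattern never matches runs to the end of the input.
The output should be \fLstart\fP, \fLb\fP, \fLstop\fP, \fLstart\fP and \fLd\fP, one per line.
.EX
(echo a; echo start; echo b; echo stop; echo c; echo start; echo d) |
	awk '/start/, /stop/'
.EE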
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
  408. and
  409. .B END
  410. do not combine with other patterns.
  411. .PP
  412. Variable names with special meanings:
  413. .TF FILENAME
  414. .TP
  415. .B CONVFMT
  416. conversion format used when converting numbers
  417. (default
  418. .BR "%.6g" )
  419. .TP
  420. .B FS
  421. regular expression used to separate fields; also settable
  422. by option
  423. .BI \-F fs\f1.
  424. .TP
  425. .BR NF
  426. number of fields in the current record
  427. .TP
  428. .B NR
  429. ordinal number of the current record
  430. .TP
  431. .B FNR
  432. ordinal number of the current record in the current file
  433. .TP
  434. .B FILENAME
  435. the name of the current input file
  436. .TP
  437. .B RS
  438. input record separator (default newline)
  439. .TP
  440. .B OFS
  441. output field separator (default blank)
  442. .TP
  443. .B ORS
  444. output record separator (default newline)
  445. .TP
  446. .B OFMT
  447. output format for numbers (default
  448. .BR "%.6g" )
  449. .TP
  450. .B SUBSEP
  451. separates multiple subscripts (default 034)
  452. .TP
  453. .B ARGC
  454. argument count, assignable
  455. .TP
  456. .B ARGV
  457. argument array, assignable;
  458. non-null members are taken as file names
  459. .TP
  460. .B ENVIRON
  461. array of environment variables; subscripts are names.
  462. .PD
  463. .PP
  464. Functions may be defined (at the position of a pattern-action statement) thus:
  465. .IP
  466. .L
  467. function foo(a, b, c) { ...; return x }
  468. .PP
  469. Parameters are passed by value if scalar and by reference if array name;
  470. functions may be called recursively.
  471. Parameters are local to the function; all other variables are global.
  472. Thus local variables may be created by providing excess parameters in
  473. the function definition.
  474. .SH EXAMPLES
  475. .TP
  476. .L
  477. length($0) > 72
  478. Print lines longer than 72 characters.
  479. .TP
  480. .L
  481. { print $2, $1 }
  482. Print first two fields in opposite order.
  483. .PP
  484. .EX
  485. BEGIN { FS = ",[ \et]*|[ \et]+" }
  486. { print $2, $1 }
  487. .EE
  488. .ns
  489. .IP
  490. Same, with input fields separated by comma and/or blanks and tabs.
  491. .PP
  492. .EX
  493. { s += $1 }
  494. END { print "sum is", s, " average is", s/NR }
  495. .EE
  496. .ns
  497. .IP
  498. Add up first column, print sum and average.
  499. .TP
  500. .L
  501. /start/, /stop/
  502. Print all lines between start/stop pairs.
  503. .PP
  504. .EX
  505. BEGIN { # Simulate echo(1)
  506. for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
  507. printf "\en"
  508. exit }
  509. .EE
  510. .SH SOURCE
  511. .B /sys/src/cmd/awk
  512. .SH SEE ALSO
  513. .IR sed (1),
  514. .IR regexp (6),
  515. .br
  516. A. V. Aho, B. W. Kernighan, P. J. Weinberger,
  517. .I
  518. The AWK Programming Language,
  519. Addison-Wesley, 1988. ISBN 0-201-07981-X
  520. .SH BUGS
  521. There are no explicit conversions between numbers and strings.
  522. To force an expression to be treated as a number add 0 to it;
  523. to force it to be treated as a string concatenate
  524. \&\fL""\fP to it.
  525. .br
  526. The scope rules for variables in functions are a botch;
  527. the syntax is worse.