123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560 |
- .TH AWK 1
- .SH NAME
- awk \- pattern-directed scanning and processing language
- .SH SYNOPSIS
- .B awk
- [
- .B -F
- .I fs
- ]
- [
- .B -d
- ]
- [
- .BI -mf
- .I n
- ]
- [
- .B -mr
- .I n
- ]
- [
- .B -safe
- ]
- [
- .B -v
- .I var=value
- ]
- [
- .B -f
- .I progfile
- |
- .I prog
- ]
- [
- .I file ...
- ]
- .SH DESCRIPTION
- .I Awk
- scans each input
- .I file
- for lines that match any of a set of patterns specified literally in
- .I prog
- or in one or more files
- specified as
- .B -f
- .IR progfile .
- With each pattern
- there can be an associated action that will be performed
- when a line of a
- .I file
- matches the pattern.
- Each line is matched against the
- pattern portion of every pattern-action statement;
- the associated action is performed for each matched pattern.
- The file name
- .L -
- means the standard input.
- Any
- .IR file
- of the form
- .I var=value
- is treated as an assignment, not a file name,
- and is executed at the time it would have been opened if it were a file name.
- The option
- .B -v
- followed by
- .I var=value
- is an assignment to be done before the program
- is executed;
- any number of
- .B -v
- options may be present.
- .B -F
- .IR fs
- option defines the input field separator to be the regular expression
- .IR fs .
- .PP
- An input line is normally made up of fields separated by white space,
- or by regular expression
- .BR FS .
- The fields are denoted
- .BR $1 ,
- .BR $2 ,
- \&..., while
- .B $0
- refers to the entire line.
- If
- .BR FS
- is null, the input line is split into one field per character.
- .PP
- To compensate for inadequate implementation of storage management,
- the
- .B -mr
- option can be used to set the maximum size of the input record,
- and the
- .B -mf
- option to set the maximum number of fields.
- .PP
- The
- .B -safe
- option causes
- .I awk
- to run in
- ``safe mode,''
- in which it is not allowed to
- run shell commands or open files
- and the environment is not made available
- in the
- .B ENVIRON
- variable.
- .PP
- A pattern-action statement has the form
- .IP
- .IB pattern " { " action " }
- .PP
- A missing
- .BI { " action " }
- means print the line;
- a missing pattern always matches.
- Pattern-action statements are separated by newlines or semicolons.
- .PP
- An action is a sequence of statements.
- A statement can be one of the following:
- .PP
- .EX
- .ta \w'\fLdelete array[expression]'u
- if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
- while(\fI expression \fP)\fI statement\fP
- for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
- for(\fI var \fPin\fI array \fP)\fI statement\fP
- do\fI statement \fPwhile(\fI expression \fP)
- break
- continue
- {\fR [\fP\fI statement ... \fP\fR] \fP}
- \fIexpression\fP #\fR commonly\fP\fI var = expression\fP
- print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
- printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
- return\fR [ \fP\fIexpression \fP\fR]\fP
- next #\fR skip remaining patterns on this input line\fP
- nextfile #\fR skip rest of this file, open next, start at top\fP
- delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
- delete\fI array\fP #\fR delete all elements of array\fP
- exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
- .EE
- .DT
- .PP
- Statements are terminated by
- semicolons, newlines or right braces.
- An empty
- .I expression-list
- stands for
- .BR $0 .
- String constants are quoted \&\fL"\ "\fR,
- with the usual C escapes recognized within.
- Expressions take on string or numeric values as appropriate,
- and are built using the operators
- .B + \- * / % ^
- (exponentiation), and concatenation (indicated by white space).
- The operators
- .B
- ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
- are also available in expressions.
- Variables may be scalars, array elements
- (denoted
- .IB x [ i ] )
- or fields.
- Variables are initialized to the null string.
- Array subscripts may be any string,
- not necessarily numeric;
- this allows for a form of associative memory.
- Multiple subscripts such as
- .B [i,j,k]
- are permitted; the constituents are concatenated,
- separated by the value of
- .BR SUBSEP .
- .PP
- The
- .B print
- statement prints its arguments on the standard output
- (or on a file if
- .BI > file
- or
- .BI >> file
- is present or on a pipe if
- .BI | cmd
- is present), separated by the current output field separator,
- and terminated by the output record separator.
- .I file
- and
- .I cmd
- may be literal names or parenthesized expressions;
- identical string values in different statements denote
- the same open file.
- The
- .B printf
- statement formats its expression list according to the format
- (see
- .IR fprintf (2)) .
- The built-in function
- .BI close( expr )
- closes the file or pipe
- .IR expr .
- The built-in function
- .BI fflush( expr )
- flushes any buffered output for the file or pipe
- .IR expr .
- If
- .IR expr
- is omitted or is a null string, all open files are flushed.
- .PP
- The mathematical functions
- .BR exp ,
- .BR log ,
- .BR sqrt ,
- .BR sin ,
- .BR cos ,
- and
- .BR atan2
- are built in.
- Other built-in functions:
- .TF length
- .TP
- .B length
- If its argument is a string, the string's length is returned.
- If its argument is an array, the number of subscripts in the array is returned.
- If no argument, the length of
- .B $0
- is returned.
- .TP
- .B rand
- random number on (0,1)
- .TP
- .B srand
- sets seed for
- .B rand
- and returns the previous seed.
- .TP
- .B int
- truncates to an integer value
- .TP
- .B utf
- converts its numerical argument, a character number, to a
- .SM UTF
- string
- .TP
- .BI substr( s , " m" , " n\fL)
- the
- .IR n -character
- substring of
- .I s
- that begins at position
- .IR m
- counted from 1.
- .TP
- .BI index( s , " t" )
- the position in
- .I s
- where the string
- .I t
- occurs, or 0 if it does not.
- .TP
- .BI match( s , " r" )
- the position in
- .I s
- where the regular expression
- .I r
- occurs, or 0 if it does not.
- The variables
- .B RSTART
- and
- .B RLENGTH
- are set to the position and length of the matched string.
- .TP
- .BI split( s , " a" , " fs\fL)
- splits the string
- .I s
- into array elements
- .IB a [1]\f1,
- .IB a [2]\f1,
- \&...,
- .IB a [ n ]\f1,
- and returns
- .IR n .
- The separation is done with the regular expression
- .I fs
- or with the field separator
- .B FS
- if
- .I fs
- is not given.
- An empty string as field separator splits the string
- into one array element per character.
- .TP
- .BI sub( r , " t" , " s\fL)
- substitutes
- .I t
- for the first occurrence of the regular expression
- .I r
- in the string
- .IR s .
- If
- .I s
- is not given,
- .B $0
- is used.
- .TP
- .B gsub
- same as
- .B sub
- except that all occurrences of the regular expression
- are replaced;
- .B sub
- and
- .B gsub
- return the number of replacements.
- .TP
- .BI sprintf( fmt , " expr" , " ...\fL)
- the string resulting from formatting
- .I expr ...
- according to the
- .I printf
- format
- .I fmt
- .TP
- .BI system( cmd )
- executes
- .I cmd
- and returns its exit status
- .TP
- .BI tolower( str )
- returns a copy of
- .I str
- with all upper-case characters translated to their
- corresponding lower-case equivalents.
- .TP
- .BI toupper( str )
- returns a copy of
- .I str
- with all lower-case characters translated to their
- corresponding upper-case equivalents.
- .PD
- .PP
- The ``function''
- .B getline
- sets
- .B $0
- to the next input record from the current input file;
- .B getline
- .BI < file
- sets
- .B $0
- to the next record from
- .IR file .
- .B getline
- .I x
- sets variable
- .I x
- instead.
- Finally,
- .IB cmd " | getline
- pipes the output of
- .I cmd
- into
- .BR getline ;
- each call of
- .B getline
- returns the next line of output from
- .IR cmd .
- In all cases,
- .B getline
- returns 1 for a successful input,
- 0 for end of file, and \-1 for an error.
- .PP
- Patterns are arbitrary Boolean combinations
- (with
- .BR "! || &&" )
- of regular expressions and
- relational expressions.
- Regular expressions are as in
- .IR regexp (6).
- Isolated regular expressions
- in a pattern apply to the entire line.
- Regular expressions may also occur in
- relational expressions, using the operators
- .BR ~
- and
- .BR !~ .
- .BI / re /
- is a constant regular expression;
- any string (constant or variable) may be used
- as a regular expression, except in the position of an isolated regular expression
- in a pattern.
- .PP
- A pattern may consist of two patterns separated by a comma;
- in this case, the action is performed for all lines
- from an occurrence of the first pattern
- though an occurrence of the second.
- .PP
- A relational expression is one of the following:
- .IP
- .I expression matchop regular-expression
- .br
- .I expression relop expression
- .br
- .IB expression " in " array-name
- .br
- .BI ( expr , expr,... ") in " array-name
- .PP
- where a
- .I relop
- is any of the six relational operators in C,
- and a
- .I matchop
- is either
- .B ~
- (matches)
- or
- .B !~
- (does not match).
- A conditional is an arithmetic expression,
- a relational expression,
- or a Boolean combination
- of these.
- .PP
- The special patterns
- .B BEGIN
- and
- .B END
- may be used to capture control before the first input line is read
- and after the last.
- .B BEGIN
- and
- .B END
- do not combine with other patterns.
- .PP
- Variable names with special meanings:
- .TF FILENAME
- .TP
- .B CONVFMT
- conversion format used when converting numbers
- (default
- .BR "%.6g" )
- .TP
- .B FS
- regular expression used to separate fields; also settable
- by option
- .BI \-F fs\f1.
- .TP
- .BR NF
- number of fields in the current record
- .TP
- .B NR
- ordinal number of the current record
- .TP
- .B FNR
- ordinal number of the current record in the current file
- .TP
- .B FILENAME
- the name of the current input file
- .TP
- .B RS
- input record separator (default newline)
- .TP
- .B OFS
- output field separator (default blank)
- .TP
- .B ORS
- output record separator (default newline)
- .TP
- .B OFMT
- output format for numbers (default
- .BR "%.6g" )
- .TP
- .B SUBSEP
- separates multiple subscripts (default 034)
- .TP
- .B ARGC
- argument count, assignable
- .TP
- .B ARGV
- argument array, assignable;
- non-null members are taken as file names
- .TP
- .B ENVIRON
- array of environment variables; subscripts are names.
- .PD
- .PP
- Functions may be defined (at the position of a pattern-action statement) thus:
- .IP
- .L
- function foo(a, b, c) { ...; return x }
- .PP
- Parameters are passed by value if scalar and by reference if array name;
- functions may be called recursively.
- Parameters are local to the function; all other variables are global.
- Thus local variables may be created by providing excess parameters in
- the function definition.
- .SH EXAMPLES
- .TP
- .L
- length($0) > 72
- Print lines longer than 72 characters.
- .TP
- .L
- { print $2, $1 }
- Print first two fields in opposite order.
- .PP
- .EX
- BEGIN { FS = ",[ \et]*|[ \et]+" }
- { print $2, $1 }
- .EE
- .ns
- .IP
- Same, with input fields separated by comma and/or blanks and tabs.
- .PP
- .EX
- { s += $1 }
- END { print "sum is", s, " average is", s/NR }
- .EE
- .ns
- .IP
- Add up first column, print sum and average.
- .TP
- .L
- /start/, /stop/
- Print all lines between start/stop pairs.
- .PP
- .EX
- BEGIN { # Simulate echo(1)
- for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
- printf "\en"
- exit }
- .EE
- .SH SOURCE
- .B /sys/src/cmd/awk
- .SH SEE ALSO
- .IR sed (1),
- .IR regexp (6),
- .br
- A. V. Aho, B. W. Kernighan, P. J. Weinberger,
- .I
- The AWK Programming Language,
- Addison-Wesley, 1988. ISBN 0-201-07981-X
- .SH BUGS
- There are no explicit conversions between numbers and strings.
- To force an expression to be treated as a number add 0 to it;
- to force it to be treated as a string concatenate
- \&\fL""\fP to it.
- .br
- The scope rules for variables in functions are a botch;
- the syntax is worse.
- .br
- UTF is not always dealt with correctly,
- though
- .I awk
- does make an attempt to do so.
- The
- .I split
- function with an empty string as final argument now copes
- with UTF in the string being split.
|