- .TH SORT 1
- .SH NAME
- sort \- sort and/or merge files
- .SH SYNOPSIS
- .B sort
- [
- .BI -cmuMbdfginrwt x
- ]
- [
- .BI + pos1
- [
- .BI - pos2
- ] ...
- ] ...
- [
- .B -k
- .I pos1
- [
- .I ,pos2
- ]
- ] ...
- .br
- \h'0.5in'
- [
- .B -o
- .I output
- ]
- [
- .B -T
- .I dir
- \&...
- ]
- [
- .I option
- \&...
- ]
- [
- .I file
- \&...
- ]
- .SH DESCRIPTION
- .I Sort\^
- sorts
- lines of all the
- .I files
- together and writes the result on
- the standard output.
- If no input files are named, the standard input is sorted.
- .PP
- The default sort key is an entire line.
- Default ordering is
- lexicographic by runes.
- The ordering is affected globally by the following options,
- one or more of which may appear.
- .TP
- .B -M
- Compare as months.
- The first three
- non-white space characters
- of the field
- are folded
- to upper case
- and compared
- so that
- .L JAN
- precedes
- .LR FEB ,
- etc.
- Invalid fields
- compare low to
- .LR JAN .
- .TP
- .B -b
- Ignore leading white space (spaces and tabs) in field comparisons.
- .TP
- .B -d
- `Phone directory' order:
- only letters,
- accented letters,
- digits and white space
- are significant in comparisons.
- .TP
- .B -f
- Fold lower case
- letters onto upper case.
- Accented characters are folded to their
- non-accented upper case form.
- .TP
- .B -i
- Ignore characters outside the
- .SM ASCII
- range 040-0176
- in non-numeric comparisons.
- .TP
- .B -w
- Like
- .BR -i ,
- but ignore only tabs and spaces.
- .TP
- .B -n
- An initial numeric string,
- consisting of optional white space,
- optional plus or minus sign,
- and zero or more digits with optional decimal point,
- is sorted by arithmetic value.
- .TP
- .B -g
- Numbers, like
- .B -n
- but with optional
- .BR e -style
- exponents, are sorted by value.
- .TP
- .B -r
- Reverse the sense of comparisons.
- .TP
- .BI -t x\^
- Use
- .I x
- as the field separator (the `tab character').
- .PP
- The notation
- .BI + "pos1\| " - pos2\^
- restricts a sort key to a field beginning at
- .I pos1\^
- and ending just before
- .IR pos2 .
- .I Pos1\^
- and
- .I pos2\^
- each have the form
- .IB m . n\f1,
- optionally followed by one or more of the flags
- .BR Mbdfginr ,
- where
- .I m\^
- is the number of fields to skip from the beginning of the line and
- .I n\^
- is the number of further characters to skip.
- If any flags are present they override all the global
- ordering options for this key.
- A missing
- .BI \&. n\^
- means
- .BR \&.0 ;
- a missing
- .BI - pos2\^
- means the end of the line.
- Under the
- .BI -t x\^
- option, fields are strings separated by
- .IR x ;
- otherwise fields are
- non-empty strings separated by white space.
- White space before a field
- is part of the field, except under option
- .BR -b .
- A
- .B b
- flag may be attached independently to
- .I pos1
- and
- .IR pos2 .
- .PP
- The notation
- .B -k
- .IR pos1 [, pos2 ]
- is how POSIX
- .I sort
- defines fields:
- .I pos1
- and
- .I pos2
- have the same format but different meanings.
- The value of
- .I m\^
- counts fields from 1 rather than from 0,
- and a missing
- .BI \&. n\^
- in
- .I pos2
- means the end of the field.
- .PP
- When there are multiple sort keys, later keys
- are compared only after all earlier keys
- compare equal.
- Lines that otherwise compare equal are ordered
- with all bytes significant.
- .PP
- These option arguments are also understood:
- .TP \w'\fL-z\fIrecsize\fLXX'u
- .B -c
- Check that the single input file is sorted according to the ordering rules;
- give no output unless the file is out of sort.
- .TP
- .B -m
- Merge; assume the input files are already sorted.
- .TP
- .B -u
- Suppress all but one in each
- set of equal lines.
- Ignored bytes
- and bytes outside keys
- do not participate in
- this comparison.
- .TP
- .B -o
- The next argument is the name of an output file
- to use instead of the standard output.
- This file may be the same as one of the inputs.
- .TP
- .BI -T dir
- Put temporary files in
- .I dir
- rather than in
- .BR /tmp .
- .ne 4
- .SH EXAMPLES
- .TP
- .L sort -u +0f +0 list
- Print in alphabetical order all the unique spellings
- in a list of words
- where capitalized words differ from uncapitalized.
- .TP
- .L sort -t: +1 /adm/users
- Print the users file
- sorted by user name
- (the second colon-separated field).
- .TP
- .L sort -umM dates
- Print the first instance of each month in an already sorted file.
- Options
- .B -um
- with just one input file make the choice of a
- unique representative from a set of equal lines predictable.
- .TP
- .L
- grep -n '^' input | sort -t: +1f +0n | sed 's/[0-9]*://'
- A stable sort: input lines that compare equal will
- come out in their original order.
- .SH FILES
- .BI /tmp/sort. <pid>.<ordinal>
- .SH SOURCE
- .B /sys/src/cmd/sort.c
- .SH SEE ALSO
- .IR uniq (1),
- .IR look (1)
- .SH DIAGNOSTICS
- .I Sort
- comments and exits with non-null status for various trouble
- conditions and for disorder discovered under option
- .BR -c .
- .SH BUGS
- An external null character can be confused
- with an internally generated end-of-field character.
- As a result, a field that is a prefix of a longer field
- may fail to sort before it.
- .PP
- Some of the options, e.g.
- .BR -M ,
- are hopelessly provincial.