.TH SORT 1
.SH NAME
sort \- sort and/or merge files
.SH SYNOPSIS
.B sort
[
.BI -cmuMbdf\&inrwt x
]
[
.BI + pos1
[
.BI - pos2
] ...
] ...
[
.B -k
.I pos1
[
.I ,pos2
]
] ...
.br
\h'0.5in'
[
.B -o
.I output
]
[
.B -T
.I dir
\&...
]
[
.I option
\&...
]
[
.I file
\&...
]
.SH DESCRIPTION
.I Sort\^
sorts
lines of all the
.I files
together and writes the result on
the standard output.
If no input files are named, the standard input is sorted.
.PP
The default sort key is an entire line.
Default ordering is
lexicographic by runes.
The ordering is affected globally by the following options,
one or more of which may appear.
.TP
.B -M
Compare as months.
The first three
non-white space characters
of the field
are folded
to upper case
and compared
so that
.L JAN
precedes
.LR FEB ,
etc.
Invalid fields
compare low to
.LR JAN .
.TP
.B -b
Ignore leading white space (spaces and tabs) in field comparisons.
.TP
.B -d
`Phone directory' order:
only letters,
accented letters,
digits and white space
are significant in comparisons.
.TP
.B -f
Fold lower case
letters onto upper case.
Accented characters are folded to their
non-accented upper case form.
.TP
.B -i
Ignore characters outside the
.SM ASCII
range 040-0176
in non-numeric comparisons.
.TP
.B -w
Like
.BR -i ,
but ignore only tabs and spaces.
.TP
.B -n
An initial numeric string,
consisting of optional white space,
optional plus or minus sign,
and zero or more digits with optional decimal point,
is sorted by arithmetic value.
.TP
.B -g
Numbers, like
.B -n
but with optional
.BR e -style
exponents, are sorted by value.
.TP
.B -r
Reverse the sense of comparisons.
.TP
.BI -t x\^
`Tab character' separating fields is
.IR x .
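.PP
As an illustration of the global ordering options, here is a minimal sketch.
It assumes a POSIX-style sort run from a Bourne-compatible shell, which is
not necessarily this implementation; the sample numbers and the variable
names are made up, and LC_ALL=C is set only so that the default comparison
is byte-wise on such systems.

```shell
# Hypothetical sample data: four numbers, one per line.
nums=$(printf '%s\n' 10 9 2.5 -1)

# Default ordering compares the lines as text, character by character.
lexical=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# -n compares the leading numeric string by arithmetic value.
numeric=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)

# -r reverses the sense of the comparison; here combined with -n.
reversed=$(printf '%s\n' "$nums" | LC_ALL=C sort -nr)

printf 'lexical:  %s\n' "$(echo $lexical)"    # -1 10 2.5 9
printf 'numeric:  %s\n' "$(echo $numeric)"    # -1 2.5 9 10
printf 'reversed: %s\n' "$(echo $reversed)"   # 10 9 2.5 -1
```

The text ordering puts 10 before 2.5 because only the first differing
character matters; with -n the whole numeric string is evaluated.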
.PP
The notation
.BI + "pos1\| " - pos2\^
restricts a sort key to a field beginning at
.I pos1\^
and ending just before
.IR pos2 .
.I Pos1\^
and
.I pos2\^
each have the form
.IB m . n\f1,
optionally followed by one or more of the flags
.BR Mbdfginr ,
where
.I m\^
tells a number of fields to skip from the beginning of the line and
.I n\^
tells a number of characters to skip further.
If any flags are present they override all the global
ordering options for this key.
A missing
.BI \&. n\^
means
.BR \&.0 ;
a missing
.BI - pos2\^
means the end of the line.
Under the
.BI -t x\^
option, fields are strings separated by
.IR x ;
otherwise fields are
non-empty strings separated by white space.
White space before a field
is part of the field, except under option
.BR -b .
A
.B b
flag may be attached independently to
.I pos1
and
.IR pos2 .
.PP
The notation
.B -k
.IR pos1 [, pos2 ]
is how POSIX
.I sort
defines fields:
.I pos1
and
.I pos2
have the same format but different meanings.
The value of
.I m\^
is origin 1 instead of origin 0
and a missing
.BI \&. n\^
in
.I pos2
is the end of the field.
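.PP
The following sketch shows a key restricted to a single field using the
POSIX notation. It assumes a POSIX-style sort, which is not necessarily this
implementation; the colon-separated records and the variable names are
hypothetical. By the rules above, the same key in the older notation would
be written +1 -2 (or +1n -2 with the numeric flag).

```shell
# Hypothetical records: name:number:tag
data=$(printf '%s\n' 'carol:30:x' 'alice:4:y' 'bob:12:z')

# -k 2,2 : the key starts at field 2 and ends at the end of field 2.
# With no flags the field is compared as text, so "12" < "30" < "4".
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 2,2)

# The n flag on the key makes that one field compare by arithmetic value.
by_num=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 2,2n)

printf '%s\n' "$by_text"   # bob, carol, alice
printf '%s\n' "$by_num"    # alice, bob, carol
```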
.PP
When there are multiple sort keys, later keys
are compared only after all earlier keys
compare equal.
Lines that otherwise compare equal are ordered
with all bytes significant.
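.PP
A small sketch of two keys used together, assuming a POSIX-style sort
(not necessarily this implementation) and made-up input: the second key is
consulted only among lines whose first key compares equal, and its flags
apply to that key alone.

```shell
# Hypothetical input: a letter and a number separated by a space.
pairs=$(printf '%s\n' 'b 2' 'a 2' 'a 1' 'b 1')

# Key 1: field 1 as text, ascending.
# Key 2: field 2 numerically, reversed (flags n and r on this key only).
in_order=$(printf '%s\n' "$pairs" | LC_ALL=C sort -k 1,1 -k 2,2nr)

printf '%s\n' "$in_order"   # a 2, a 1, b 2, b 1
```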
.PP
These option arguments are also understood:
.TP \w'\fL-z\fIrecsize\fLXX'u
.B -c
Check that the single input file is sorted according to the ordering rules;
give no output unless the file is out of sort.
.TP
.B -m
Merge; assume the input files are already sorted.
.TP
.B -u
Suppress all but one in each
set of equal lines.
Ignored bytes
and bytes outside keys
do not participate in
this comparison.
.TP
.B -o
The next argument is the name of an output file
to use instead of the standard output.
This file may be the same as one of the inputs.
.TP
.BI -T dir
Put temporary files in
.I dir
rather than in
.BR /tmp .
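.PP
A brief sketch of -u and -c, assuming a POSIX-style sort and shell where a
failed check is reported as a non-zero exit status (not necessarily this
implementation); the input words and variable names are hypothetical.

```shell
# -u keeps one line from each set of equal lines.
unique=$(printf '%s\n' b a b c a | LC_ALL=C sort -u)
printf '%s\n' "$unique"   # a, b, c

# -c only checks the order; the diagnostic is discarded here and the
# exit status alone tells whether the input was already sorted.
if printf '%s\n' a c b | LC_ALL=C sort -c 2>/dev/null
then
	checked=sorted
else
	checked=disordered
fi
echo "$checked"   # disordered
```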
.ne 4
.SH EXAMPLES
.TP
.L sort -u +0f +0 list
Print in alphabetical order all the unique spellings
in a list of words
where capitalized words differ from uncapitalized.
.TP
.L sort -t: +1 /adm/users
Print the users file
sorted by user name
(the second colon-separated field).
.TP
.L sort -umM dates
Print the first instance of each month in an already sorted file.
Options
.B -um
with just one input file make the choice of a
unique representative from a set of equal lines predictable.
.TP
.L
grep -n '^' input | sort -t: +1f +0n | sed 's/[0-9]*://'
A stable sort: input lines that compare equal will
come out in their original order.
.SH FILES
.BI /tmp/sort. <pid>.<ordinal>
.SH SOURCE
.B /sys/src/cmd/sort.c
.SH SEE ALSO
.IR uniq (1),
.IR look (1)
.SH DIAGNOSTICS
.I Sort
comments and exits with non-null status for various trouble
conditions and for disorder discovered under option
.BR -c .
.SH BUGS
An external null character can be confused
with an internally generated end-of-field character.
The result can make a sub-field not sort
less than a longer field.
.PP
Some of the options, e.g.
.B -i
and
.BR -M ,
are hopelessly provincial.