.TH SORT 1
.SH NAME
sort \- sort and/or merge files
.SH SYNOPSIS
.B sort
[
.BI -cmuMbdf\&inrwt x
]
[
.BI + pos1
[
.BI - pos2
] ...
] ...
[
.B -k
.I pos1
[
.I ,pos2
]
] ...
[
.B -o
.I output
]
[
.B -T
.I dir
\&...
]
[
.I option
\&...
]
[
.I file
\&...
]
.SH DESCRIPTION
.I Sort\^
sorts
lines of all the
.I files
together and writes the result on
the standard output.
If no input files are named, the standard input is sorted.
.PP
The default sort key is an entire line.
Default ordering is
lexicographic by runes.
The ordering is affected globally by the following options,
one or more of which may appear.
.TP
.B -M
Compare as months.
The first three
non-white space characters
of the field
are folded
to upper case
and compared
so that
.L JAN
precedes
.LR FEB ,
etc.
Invalid fields
compare low to
.LR JAN .
.TP
.B -b
Ignore leading white space (spaces and tabs) in field comparisons.
.TP
.B -d
`Phone directory' order:
only letters,
accented letters,
digits and white space
are significant in comparisons.
.TP
.B -f
Fold lower case
letters onto upper case.
Accented characters are folded to their
non-accented upper case form.
.TP
.B -i
Ignore characters outside the
.SM ASCII
octal range 040-0176
in non-numeric comparisons.
.TP
.B -w
Like
.BR -i ,
but ignore only tabs and spaces.
.TP
.B -n
An initial numeric string,
consisting of optional white space,
optional plus or minus sign,
and zero or more digits with optional decimal point,
is sorted by arithmetic value.
.TP
.B -g
Numbers, like
.B -n
but with optional
.BR e -style
exponents, are sorted by value.
.TP
.B -r
Reverse the sense of comparisons.
.TP
.BI -t x\^
`Tab character' separating fields is
.IR x .
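.PP
As a rough sketch of how the global ordering options change the result,
the commands below sort the same three lines with the default ordering,
with
.BR -n ,
and with
.BR -nr .
The sketch assumes a POSIX-style sort on the path (GNU sort, for instance),
which may differ from this implementation in details.
The scratch directory, the file name nums, and the LC_ALL=C setting
(which pins byte-wise default ordering on such systems)
are assumptions made for the illustration.

```shell
# Assumption: a POSIX-style sort is on the path; LC_ALL=C pins byte order.
export LC_ALL=C
dir=$(mktemp -d)                  # hypothetical scratch directory
cat > "$dir/nums" <<'EOF'
10
9
-3
EOF
lexical=$(sort "$dir/nums")       # default: whole lines compared as strings
numeric=$(sort -n "$dir/nums")    # -n: initial numeric string, by value
reversed=$(sort -nr "$dir/nums")  # -r: reverse the sense of comparisons
echo "default:" $lexical          # default: -3 10 9
echo "-n:" $numeric               # -n: -3 9 10
echo "-nr:" $reversed             # -nr: 10 9 -3
rm -r "$dir"
```

.PP
The default order puts 10 before 9 because the lines are compared
character by character; only
.B -n
treats them as numbers.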
.PP
The notation
.BI + "pos1\| " - pos2\^
restricts a sort key to a field beginning at
.I pos1\^
and ending just before
.IR pos2 .
.I Pos1\^
and
.I pos2\^
each have the form
.IB m . n\f1,
optionally followed by one or more of the flags
.BR Mbdfginr ,
where
.I m\^
is the number of fields to skip from the beginning of the line and
.I n\^
is the number of further characters to skip.
If any flags are present they override all the global
ordering options for this key.
A missing
.BI \&. n\^
means
.BR \&.0 ;
a missing
.BI - pos2\^
means the end of the line.
Under the
.BI -t x\^
option, fields are strings separated by
.IR x ;
otherwise fields are
non-empty strings separated by white space.
White space before a field
is part of the field, except under option
.BR -b .
A
.B b
flag may be attached independently to
.I pos1
and
.IR pos2 .
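.PP
The effect of leading white space on a key can be sketched as follows.
The example assumes a POSIX-style sort (such as GNU sort) and therefore
spells the key as
.B -k
.IR 2 ,
which on such systems should select the same key as
.BR +1 ;
the scratch directory, the file name padded, and LC_ALL=C
are assumptions made for the illustration.

```shell
# Assumption: a POSIX-style sort is on the path; LC_ALL=C pins byte order.
export LC_ALL=C
dir=$(mktemp -d)                        # hypothetical scratch directory
cat > "$dir/padded" <<'EOF'
a  z
b y
EOF
# Without -b the blanks before the second field belong to the key,
# so the keys are "  z" and " y" and the line with two blanks sorts first.
withblanks=$(sort -k 2 "$dir/padded")
# With -b leading blanks are ignored, so the keys are "z" and "y".
noblanks=$(sort -b -k 2 "$dir/padded")
echo "$withblanks"
echo "$noblanks"
rm -r "$dir"
```

.PP
The first command leaves the two lines in their input order;
the second puts the line ending in y first.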
.PP
The notation
.B -k
.IR pos1 [, pos2 ]
is how POSIX
.I sort
defines fields:
.I pos1
and
.I pos2
have the same format but different meanings.
The value of
.I m\^
is origin 1 instead of origin 0
and a missing
.BI \&. n\^
in
.I pos2
is the end of the field.
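.PP
A minimal sketch of the
.B -k
notation, assuming a POSIX-style sort (such as GNU sort) on the path.
The scratch directory, the file name users, its contents, and LC_ALL=C
are invented for the illustration.

```shell
# Assumption: a POSIX-style sort is on the path; LC_ALL=C pins byte order.
export LC_ALL=C
dir=$(mktemp -d)                          # hypothetical scratch directory
cat > "$dir/users" <<'EOF'
1:carol:adm
2:alice:staff
3:bob:adm
EOF
# -k 2,2 names the second colon-separated field (origin 1) and stops at
# the end of that field; with origin 0 the same key is written +1 -2.
byname=$(sort -t: -k 2,2 "$dir/users")
# With no pos2 the key runs from the third field to the end of the line.
bygroup=$(sort -t: -k 3 "$dir/users")
echo "$byname"
echo "$bygroup"
rm -r "$dir"
```

.PP
The first command orders the lines alice, bob, carol.
The second groups the two adm lines ahead of the staff line;
the adm lines tie on the key and fall back to whole-line order.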
.PP
When there are multiple sort keys, later keys
are compared only after all earlier keys
compare equal.
Lines that otherwise compare equal are ordered
with all bytes significant.
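.PP
The interaction of several keys and the final all-bytes comparison
can be sketched as below, assuming a POSIX-style sort (such as GNU sort).
The scratch directory, the file name staff, its contents, and LC_ALL=C
are invented for the illustration.

```shell
# Assumption: a POSIX-style sort is on the path; LC_ALL=C pins byte order.
export LC_ALL=C
dir=$(mktemp -d)                  # hypothetical scratch directory
cat > "$dir/staff" <<'EOF'
adm bob
staff alice
adm alice
adm Bob
EOF
# Key 1 is the first field; key 2 (second field, case folded) is consulted
# only for lines whose first fields compare equal.
twokeys=$(sort -k 1,1 -k 2,2f "$dir/staff")
# With the folded second field as the only key, the first field matters
# only when the final all-bytes comparison breaks a tie.
onekey=$(sort -k 2,2f "$dir/staff")
echo "$twokeys"
echo "$onekey"
rm -r "$dir"
```

.PP
In both results the lines adm Bob and adm bob tie on every key,
so they are ordered with all bytes significant and the capitalized
line comes first.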
.PP
These option arguments are also understood:
.TP \w'\fL-z\fIrecsize\fLXX'u
.B -c
Check that the single input file is sorted according to the ordering rules;
give no output unless the file is out of sort.
.TP
.B -m
Merge; assume the input files are already sorted.
.TP
.B -u
Suppress all but one in each
set of equal lines.
Ignored bytes
and bytes outside keys
do not participate in
this comparison.
.TP
.B -o
The next argument is the name of an output file
to use instead of the standard output.
This file may be the same as one of the inputs.
.TP
.BI -T dir
Put temporary files in
.I dir
rather than in
.BR /tmp .
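.PP
A short sketch of
.BR -c ,
.BR -u ,
and
.B -o
used together, assuming a POSIX-style sort (such as GNU sort).
The scratch directory, the file name words, its contents, and LC_ALL=C
are invented for the illustration.

```shell
# Assumption: a POSIX-style sort is on the path; LC_ALL=C pins byte order.
export LC_ALL=C
dir=$(mktemp -d)                  # hypothetical scratch directory
cat > "$dir/words" <<'EOF'
pear
apple
pear
fig
EOF
# -c only checks the order; the exit status reports disorder.
if sort -c "$dir/words" 2>/dev/null; then before=sorted; else before=unsorted; fi
# -u keeps one line from each set of equal lines;
# -o may name one of the input files as the output.
sort -u -o "$dir/words" "$dir/words"
if sort -c "$dir/words"; then after=sorted; else after=unsorted; fi
unique=$(cat "$dir/words")
echo "$before, then $after"
echo "$unique"
rm -r "$dir"
```

.PP
After the in-place sort the file holds apple, fig, and pear, one per line,
and the second check succeeds silently.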
.ne 4
.SH EXAMPLES
.TP
.L sort -u +0f +0 list
Print in alphabetical order all the unique spellings
in a list of words
where capitalized words differ from uncapitalized.
.TP
.L sort -t: +1 /adm/users
Print the users file
sorted by user name
(the second colon-separated field).
.TP
.L sort -umM dates
Print the first instance of each month in an already sorted file.
Options
.B -um
with just one input file make the choice of a
unique representative from a set of equal lines predictable.
.TP
.L
grep -n '^' input | sort -t: +1f +0n | sed 's/[0-9]*://'
A stable sort: input lines that compare equal will
come out in their original order.
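.PP
The stable-sort idiom can be tried with the sketch below.
It assumes a POSIX-style sort (such as GNU sort), so the keys are
written in the
.B -k
spelling: the translation of +1f to -k 2f and of +0n to -k 1n
is an assumption, as are the scratch directory, the file contents,
and LC_ALL=C.

```shell
# Assumption: POSIX-style sort, grep and sed; LC_ALL=C pins byte order.
export LC_ALL=C
dir=$(mktemp -d)                  # hypothetical scratch directory
cat > "$dir/input" <<'EOF'
b
A
a
B
a
EOF
# Number the lines, sort on the folded text with the line number as the
# tie breaker, then strip the numbers again.
stable=$(grep -n '^' "$dir/input" | sort -t: -k 2f -k 1n | sed 's/[0-9]*://')
# For contrast, plain -f breaks ties with all bytes significant.
plain=$(sort -f "$dir/input")
echo $stable                      # A a a b B
echo $plain                       # A a a B b
rm -r "$dir"
```

.PP
In the stable result b stays ahead of B because it came first in the input;
the plain
.B -f
sort puts B first because its byte value is lower.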
.SH FILES
.BI /tmp/sort. <pid>.<ordinal>
.SH SOURCE
.B /sys/src/cmd/sort.c
.SH SEE ALSO
.IR uniq (1),
.IR look (1)
.SH DIAGNOSTICS
.I Sort
comments and exits with non-null status for various trouble
conditions and for disorder discovered under option
.BR -c .
.SH BUGS
An external null character can be confused
with an internally generated end-of-field character.
The result can make a sub-field not sort
less than a longer field.
.PP
Some of the options, e.g.
.B -i
and
.BR -M ,
are hopelessly provincial.