.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(standard input default) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts them into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.IR utf (6).
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors (the
.B -s
option prevents reporting of these errors).
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B 0x80
characters will be substituted for
.SM UTF
encoding errors and
.B 0xFFFD
characters will be substituted for unknown characters.
.PP
The
.B -v
option generates various diagnostic and summary information on standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B koi8
KOI-8 (GOST 19769-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JX: JIS 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(from only) guesses among ISO 2022-JP, EUC, or Shift-JIS
.TP
.B gb
Chinese national standard (GB2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus
.SM ASCII
(TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in one of several JIS encodings
(ISO 2022-JP, EUC, or Shift-JIS) into
.SM UTF
format, without reporting conversion errors.
Unknown Kanji will be converted into
.B 0xFFFD
characters.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
.SH SOURCE
.B /sys/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (2),
.IR utf (6).