- .TH TCS 1
- .SH NAME
- tcs \- translate character sets
- .SH SYNOPSIS
- .B tcs
- [
- .B -slcv
- ]
- [
- .B -f
- .I ics
- ]
- [
- .B -t
- .I ocs
- ]
- [
- .I file ...
- ]
- .SH DESCRIPTION
- .I Tcs
- interprets the named
- .I file(s)
- (standard input by default) as a stream of characters from the
- .I ics
- character set or format, converts them to runes,
- and then converts them into a stream of characters from the
- .I ocs
- character set or format on the standard output.
- The default value for
- .I ics
- and
- .I ocs
- is
- .BR utf ,
- the
- .SM UTF
- encoding described in
- .IR utf (6).
- The
- .B -l
- option lists the character sets known to
- .IR tcs .
- Processing continues in the face of conversion errors (the
- .B -s
- option prevents reporting of these errors).
- The
- .B -c
- option forces the output to contain only correctly converted characters;
- otherwise,
- .B Runeerror
- (0xFFFD)
- characters will be substituted for
- .SM UTF
- encoding errors and unknown characters.
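The pipeline described above decodes input bytes to runes and then encodes runes in the output set. It can be sketched in C for the two simplest cases, 8859-1 input and UTF input. This is a minimal illustration, not the tcs source. The names RUNE_ERROR, decode_utf8, encode_utf8, latin1_to_utf8 and utf8_to_utf8 are hypothetical. The sketch assumes two behaviours: the -c option drops malformed input bytes, and the default substitutes one 0xFFFD (Runeerror) per malformed byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Plan 9's Runeerror (0xFFFD). */
#define RUNE_ERROR 0xFFFDu

/* Encode rune r as UTF-8 into out (room for 4 bytes); return the byte count. */
static size_t encode_utf8(uint32_t r, unsigned char *out)
{
    assert(r <= 0x10FFFF);
    if (r < 0x80) {
        out[0] = (unsigned char)r;
        return 1;
    }
    if (r < 0x800) {
        out[0] = (unsigned char)(0xC0 | (r >> 6));
        out[1] = (unsigned char)(0x80 | (r & 0x3F));
        return 2;
    }
    if (r < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (r >> 12));
        out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (r & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (r >> 18));
    out[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (r & 0x3F));
    return 4;
}

/* Decode one UTF-8 sequence from s[0..n), n >= 1. On success store the rune
   in *r, set *ok = 1 and return its length. On a malformed, truncated,
   overlong or surrogate sequence set *ok = 0 and return 1, so decoding
   resumes at the next byte. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *r, int *ok)
{
    size_t len, i;
    uint32_t c, min;

    *ok = 0;
    if (s[0] < 0x80) { *r = s[0]; *ok = 1; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 1;                  /* stray continuation or invalid lead byte */
    if (len > n) return 1;          /* sequence truncated at end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 1;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 1;
    *r = c;
    *ok = 1;
    return len;
}

/* 8859-1 to UTF-8: each input byte is already the rune with the same value.
   out needs room for 2*n bytes. Returns the output length. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++)
        o += encode_utf8(in[i], out + o);
    return o;
}

/* UTF-8 to UTF-8 by way of runes. If only_valid is nonzero (modelled on -c),
   malformed bytes are dropped; otherwise each one becomes RUNE_ERROR.
   The number of errors is stored in *nerr. out needs room for 3*n bytes. */
static size_t utf8_to_utf8(const unsigned char *in, size_t n, unsigned char *out,
                           int only_valid, size_t *nerr)
{
    size_t i = 0, o = 0;
    *nerr = 0;
    while (i < n) {
        uint32_t r = 0;
        int ok;
        i += decode_utf8(in + i, n - i, &r, &ok);
        if (!ok) {
            ++*nerr;
            if (only_valid)
                continue;
            r = RUNE_ERROR;
        }
        o += encode_utf8(r, out + o);
    }
    return o;
}
```

Decoding 8859-1 needs no table because that set occupies the first 256 Unicode code points. Multibyte sets such as jis-kanji or gb need lookup tables, which this sketch omits.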
- .PP
- The
- .B -v
- option generates various diagnostic and summary information on standard error,
- or makes the
- .B -l
- output more verbose.
- .PP
- .I Tcs
- recognizes an ever-changing list of character sets.
- In particular, it supports a variety of Russian and Japanese encodings.
- Some of the supported encodings are
- .TF jis-kanji
- .TP
- .B utf
- The Plan 9
- .SM UTF
- encoding, known by ISO as UTF-8
- .TP
- .B utf1
- The deprecated original
- .SM UTF
- encoding from ISO 10646
- .TP
- .B ascii
- 7-bit ASCII
- .TP
- .B 8859-1
- Latin-1 (Western European)
- .TP
- .B 8859-2
- Latin-2 (Czech .. Slovak)
- .TP
- .B 8859-3
- Latin-3 (Dutch .. Turkish)
- .TP
- .B 8859-4
- Latin-4 (Scandinavian)
- .TP
- .B 8859-5
- Part 5 (Cyrillic)
- .TP
- .B 8859-6
- Part 6 (Arabic)
- .TP
- .B 8859-7
- Part 7 (Greek)
- .TP
- .B 8859-8
- Part 8 (Hebrew)
- .TP
- .B 8859-9
- Latin-5 (Finnish .. Portuguese)
- .TP
- .B html
- Unicode as encoded by HTML
- .TP
- .B koi8
- KOI-8 (GOST 19769-74)
- .TP
- .B jis-kanji
- ISO 2022-JP
- .TP
- .B ujis
- EUC-JP: JIS X 0208
- .TP
- .B ms-kanji
- Microsoft, or Shift-JIS
- .TP
- .B jis
- (input only) guesses among ISO 2022-JP, EUC, or Shift-JIS
- .TP
- .B gb
- Chinese national standard (GB2312-80)
- .TP
- .B big5
- Big 5 (HKU version)
- .TP
- .B unicode
- Unicode Standard 1.0
- .TP
- .B tis
- Thai character set plus
- .SM ASCII
- (TIS 620-1986)
- .TP
- .B msdos
- IBM PC: CP 437
- .TP
- .B atari
- Atari-ST character set
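The html entry can be made concrete with a C sketch of how runes might be written as character-set-independent HTML. This is a minimal illustration, not the tcs converter. It assumes plain ASCII passes through unchanged, while the markup characters and every non-ASCII rune become decimal numeric character references (&#N;). The real converter may emit named entities such as &eacute; instead. The function names rune_to_html and runes_to_html are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format rune r into buf: ASCII other than & < > passes through; everything
   else becomes a decimal numeric character reference. Returns the length. */
static size_t rune_to_html(uint32_t r, char *buf, size_t size)
{
    int n;
    assert(r <= 0x10FFFF);
    if (r == '&' || r == '<' || r == '>' || r >= 0x80)
        n = snprintf(buf, size, "&#%lu;", (unsigned long)r);
    else
        n = snprintf(buf, size, "%c", (int)r);
    return n < 0 ? 0 : (size_t)n;
}

/* Convert n runes to a NUL-terminated HTML string in out (size bytes).
   Stops early if the next rune would not fit. Returns the string length. */
static size_t runes_to_html(const uint32_t *rs, size_t n, char *out, size_t size)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++) {
        char tmp[16];                 /* longest form is "&#1114111;" */
        size_t k = rune_to_html(rs[i], tmp, sizeof tmp);
        if (o + k >= size)            /* keep room for the terminating NUL */
            break;
        memcpy(out + o, tmp, k);
        o += k;
    }
    if (size > 0)
        out[o] = '\0';
    return o;
}
```

Numeric references keep the output pure ASCII, so it survives any transport character set. Named entities are more readable but need a lookup table.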
- .SH EXAMPLES
- .TP
- .B tcs -f 8859-1
- Convert 8859-1 (Latin-1) characters into
- .SM UTF
- format.
- .TP
- .B tcs -s -f jis
- Convert characters encoded in one of several Japanese encodings (ISO 2022-JP, EUC, or Shift-JIS) into
- .SM UTF
- format.
- Unknown Kanji will be converted into
- .B 0xFFFD
- characters, and
- .B -s
- suppresses the error reports.
- .TP
- .B tcs -t html
- Convert
- .SM UTF
- into character-set-independent HTML.
- .TP
- .B tcs -lv
- Print a verbose, up-to-date list of the supported character sets.
- .SH SOURCE
- .B /sys/src/cmd/tcs
- .SH SEE ALSO
- .IR ascii (1),
- .IR rune (2),
- .IR utf (6).