^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) Unicode support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) ===============
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4) Last update: 2005-01-17, version 1.4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) of the Linux Assigned Names And Numbers Authority (LANANA) project.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8) The current version can be found at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) http://www.lanana.org/docs/unicode/admin-guide/unicode.rst
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) Introduction
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) ------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) The Linux kernel code has been rewritten to use Unicode to map
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) characters to fonts. By downloading a single Unicode-to-font table,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17) both the eight-bit character sets and UTF-8 mode are changed to use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) the font as indicated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20) This changes the semantics of the eight-bit character tables subtly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) The four character tables are now:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) =============== =============================== ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) Map symbol      Map name                        Escape code (G0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) =============== =============================== ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) IBMPC_MAP       IBM code page 437               ESC ( U
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29) USER_MAP        User defined                    ESC ( K
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) =============== =============================== ================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) In particular, ESC ( U is no longer "straight to font", since the font
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) might be completely different from the IBM character set. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) permits, for example, the use of block graphics even with a Latin-1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) font loaded.
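
As a rough illustration of the table above, the following Python sketch
builds the G0 escape sequence for each map. The names ``G0_FINAL_BYTE`` and
``select_g0`` are hypothetical helpers introduced for this example and are
not kernel or library interfaces; only the sequences themselves come from
the table.

```python
# Final byte of the "ESC ( x" sequence for each map, copied from the table.
G0_FINAL_BYTE = {
    "LAT1_MAP": b"B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": b"0",   # DEC VT100 pseudographics
    "IBMPC_MAP": b"U",  # IBM code page 437
    "USER_MAP": b"K",   # User defined
}


def select_g0(map_symbol: str) -> bytes:
    """Return the bytes ESC ( <final> that request the given map as G0."""
    return b"\x1b(" + G0_FINAL_BYTE[map_symbol]
```

Writing the result of ``select_g0("GRAF_MAP")`` to a console requests the
pseudographics map; whether the request takes effect depends on the
terminal and its current mode.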
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) Note that although these codes are similar to ISO 2022, neither the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) G1), whereas ISO 2022 has four 7-bit codes (G0-G3).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) In accordance with the Unicode standard/ISO 10646, the range U+F000 to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43) refers to this as a "Corporate Zone"; since this is inaccurate for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) Linux, we call it the "Linux Zone"). U+F000 was picked as the starting
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45) point since it lets the direct-mapping area start on a large power of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) two (in case 1024- or 2048-character fonts ever become necessary).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47) This leaves U+E000 to U+EFFF as the End User Zone.
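
The split described above can be sketched as a small classifier. This is
only an illustration of the two ranges named in the text; the function name
``private_use_zone`` and the returned labels are assumptions made for this
example.

```python
from typing import Optional

END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def private_use_zone(code_point: int) -> Optional[str]:
    """Name the zone a code point falls in, or None if it is in neither."""
    if code_point in END_USER_ZONE:
        return "End User Zone"
    if code_point in LINUX_ZONE:
        return "Linux Zone"
    return None
```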
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) [v1.2]: The Unicode range from U+F000 up to U+F7FF has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) hard-coded to map directly to the loaded font, bypassing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51) translation table. The user-defined map now defaults to U+F000 to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) U+F0FF, emulating the previous behaviour. In practice, this range
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53) might be shorter; for example, vgacon can only handle 256-character
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) (U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.
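
A minimal sketch of the direct-mapping arithmetic, assuming that glyph
position N of the loaded font corresponds to U+F000 + N. The helper names
``direct_code_point`` and ``direct_utf8`` and the ``font_size`` parameter
are hypothetical and exist only for this example.

```python
DIRECT_BASE = 0xF000   # first code point of the direct-mapping area
DIRECT_LAST = 0xF7FF   # last code point of the direct-mapping area


def direct_code_point(glyph_index: int, font_size: int = 256) -> int:
    """Return the code point that addresses a glyph position directly."""
    if not 0 <= glyph_index < font_size:
        raise ValueError("glyph index outside the loaded font")
    code_point = DIRECT_BASE + glyph_index
    if code_point > DIRECT_LAST:
        raise ValueError("glyph index outside the direct-mapping area")
    return code_point


def direct_utf8(glyph_index: int, font_size: int = 256) -> bytes:
    """Return the UTF-8 bytes for the direct-mapped code point."""
    return chr(direct_code_point(glyph_index, font_size)).encode("utf-8")
```

For a 512-character font, position 511 is U+F1FF, which matches the upper
bound given above for vgacon.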
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) Actual characters assigned in the Linux Zone
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58) --------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60) In addition, the following characters not present in Unicode 1.1.4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) have been defined; these are used by the DEC VT graphics map. [v1.2]
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62) THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) ====== ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) ====== ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) The DEC VT220 uses a 6x10 character matrix, and these characters form
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) a smooth progression in the DEC VT graphics character set. I have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73) omitted the scan 5 line, since it is also used as a block-graphics
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) [v1.3]: These characters have been officially added to Unicode 3.2.0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) they are added at U+23BA, U+23BB, U+23BC, U+23BD. Linux now uses the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) new values.
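
Text that still contains the obsolete Linux Zone values could be migrated
along the following lines. This sketch assumes the old and new code points
pair up in the order in which they are listed above (scan 1, 3, 7, 9); the
names ``OBSOLETE_TO_STANDARD`` and ``migrate_scan_lines`` are made up for
the example.

```python
# Obsolete Linux Zone value -> value added in Unicode 3.2.0 (assumed pairing
# by listing order: scan 1, scan 3, scan 7, scan 9).
OBSOLETE_TO_STANDARD = {
    0xF800: 0x23BA,
    0xF801: 0x23BB,
    0xF803: 0x23BC,
    0xF804: 0x23BD,
}


def migrate_scan_lines(text: str) -> str:
    """Replace obsolete scan-line code points; leave everything else alone."""
    return text.translate(OBSOLETE_TO_STANDARD)
```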
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) [v1.2]: The following characters have been added to represent common
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) keyboard symbols that are unlikely to ever be added to Unicode proper
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) since they are horribly vendor-specific. This, of course, is an
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) excellent example of horrible design.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) ====== ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) U+F810 KEYBOARD SYMBOL FLYING FLAG
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) U+F811 KEYBOARD SYMBOL PULLDOWN MENU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) U+F812 KEYBOARD SYMBOL OPEN APPLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89) U+F813 KEYBOARD SYMBOL SOLID APPLE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) ====== ======================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) Klingon language support
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) ------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) In 1996, Linux was the first operating system in the world to add
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) support for the artificial language Klingon, created by Marc Okrand
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) for the "Star Trek" television series. This encoding was later
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) adopted by the ConScript Unicode Registry and proposed (but ultimately
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99) rejected) for inclusion in Unicode Plane 1. Thus, it remains as a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) Linux/CSUR private assignment in the Linux Zone.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) This encoding has been endorsed by the Klingon Language Institute.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) For more information, contact them at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) http://www.kli.org/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) Since the characters at the beginning of the Linux Zone have been more
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) of the dingbats/symbols/forms type and this is a language, I have
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) located it at the end, on a 16-cell boundary, in keeping with standard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) Unicode practice.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112) .. note::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)   This range is now officially managed by the ConScript Unicode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115)   Registry. The normative reference is at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117)     https://www.evertype.com/standards/csur/klingon.html
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) Klingon has an alphabet of 26 characters, a positional numeric writing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) system with 10 digits, and is written left-to-right, top-to-bottom.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) Several glyph forms for the Klingon alphabet have been proposed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) However, since the set of symbols appears to be consistent throughout,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) with only the actual shapes being different, in keeping with standard
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) Unicode practice, these differences are considered font variants.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) ====== =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) U+F8D0 KLINGON LETTER A
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) U+F8D1 KLINGON LETTER B
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) U+F8D2 KLINGON LETTER CH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) U+F8D3 KLINGON LETTER D
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) U+F8D4 KLINGON LETTER E
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) U+F8D5 KLINGON LETTER GH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) U+F8D6 KLINGON LETTER H
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) U+F8D7 KLINGON LETTER I
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) U+F8D8 KLINGON LETTER J
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137) U+F8D9 KLINGON LETTER L
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) U+F8DA KLINGON LETTER M
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) U+F8DB KLINGON LETTER N
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) U+F8DC KLINGON LETTER NG
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) U+F8DD KLINGON LETTER O
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) U+F8DE KLINGON LETTER P
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) U+F8DF KLINGON LETTER Q
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144)        - Written <q> in standard Okrand Latin transliteration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) U+F8E0 KLINGON LETTER QH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146)        - Written <Q> in standard Okrand Latin transliteration
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) U+F8E1 KLINGON LETTER R
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) U+F8E2 KLINGON LETTER S
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) U+F8E3 KLINGON LETTER T
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) U+F8E4 KLINGON LETTER TLH
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) U+F8E5 KLINGON LETTER U
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152) U+F8E6 KLINGON LETTER V
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) U+F8E7 KLINGON LETTER W
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) U+F8E8 KLINGON LETTER Y
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) U+F8E9 KLINGON LETTER GLOTTAL STOP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) U+F8F0 KLINGON DIGIT ZERO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158) U+F8F1 KLINGON DIGIT ONE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) U+F8F2 KLINGON DIGIT TWO
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160) U+F8F3 KLINGON DIGIT THREE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) U+F8F4 KLINGON DIGIT FOUR
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) U+F8F5 KLINGON DIGIT FIVE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) U+F8F6 KLINGON DIGIT SIX
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) U+F8F7 KLINGON DIGIT SEVEN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) U+F8F8 KLINGON DIGIT EIGHT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) U+F8F9 KLINGON DIGIT NINE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) U+F8FD KLINGON COMMA
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) U+F8FE KLINGON FULL STOP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) U+F8FF KLINGON SYMBOL FOR EMPIRE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) ====== =======================================================
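
To show how the table might be used, here is a hedged Python sketch that
converts Okrand Latin transliteration into the code points listed above.
The helper ``to_klingon`` and the greedy longest-match tokenizer are
assumptions of this example, not part of the encoding. Apart from <q> and
<Q>, which the table specifies, the Latin spellings follow the usual Okrand
orthography. Greedy matching does not resolve every ambiguity (for instance
<n> followed by <gh>), so treat this as a sketch only.

```python
# Okrand Latin spellings in the same order as the letters U+F8D0..U+F8E9.
LETTERS = [
    "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
    "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
]

TO_CODE_POINT = {latin: 0xF8D0 + i for i, latin in enumerate(LETTERS)}
# Digits zero to nine occupy U+F8F0..U+F8F9.
TO_CODE_POINT.update({str(d): 0xF8F0 + d for d in range(10)})

# Try longer spellings first so that "tlh" wins over "t", "ng" over "n".
TOKENS = sorted(TO_CODE_POINT, key=len, reverse=True)


def to_klingon(text: str) -> str:
    """Map transliterated text to Linux Zone code points (greedy match)."""
    out = []
    i = 0
    while i < len(text):
        for token in TOKENS:
            if text.startswith(token, i):
                out.append(chr(TO_CODE_POINT[token]))
                i += len(token)
                break
        else:
            # Not a Klingon letter or digit: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Rendering the result still requires a font that provides glyphs at these
private-use code points.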
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173) Other Fictional and Artificial Scripts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) --------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) Since the assignment of the Klingon Linux Unicode block, a registry of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177) fictional and artificial scripts has been established by John Cowan
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) <jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) The ConScript Unicode Registry is accessible at:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181) https://www.evertype.com/standards/csur/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 183) The ranges used fall at the low end of the End User Zone and hence
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 184) cannot be normatively assigned, but it is recommended that people who
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 185) wish to encode fictional scripts use these codes, in the interest of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 186) interoperability. For Klingon, CSUR has adopted the Linux encoding.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 187) The CSUR people are driving the addition of Tengwar and Cirth to Unicode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 188) Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 189) and so the above encoding remains official.