Traditional Chinese: 倉頡輸入法 | Simplified Chinese: 仓颉输入法 | Wade–Giles: Ts'ang-chieh Shu-ju-fa | Gwoyeu Romatzyh: Tsang Jye Shuruhfaa | IPA: [tsʰáŋtɕjě ʂúɻûfà]
The Cangjie input method (Tsang-chieh input method, sometimes also Changjie, Cang Jie or ChongKit) is a system by which Chinese characters may be entered into a computer using a standard keyboard. Invented in 1976 by Chu Bong-Foo, the method is named after Cangjie (Tsang-chieh), the mythological inventor of the Chinese writing system; the name was suggested by Chiang Wei-kuo, then Defence Minister of Taiwan. Although the input method was initially based upon traditional Chinese characters, it has since been revamped so that Cangjie and the simplified Chinese character set can interact.
Contents
- The keys and radicals
- The basic rules
- Examples
- The short list of exceptions
- Early Cangjie system
- Issues
- Perceived difficulties
- Actual difficulties
- Versions of Cangjie
- Variants of Cangjie
- Applications
- References
In filenames and elsewhere, the name Cangjie is sometimes abbreviated as cj.
Unlike pinyin, Cangjie is based on the graphological aspect of the characters: each basic graphical unit is represented by a basic character component, 24 in all, each mapped to a particular letter key on a standard QWERTY keyboard. An additional "difficult character" function is mapped to the X key. The keystroke-to-character mappings fall into four subsections: the Philosophical Set (corresponding to the letters 'A' to 'G' and representing the sun, the moon and the five elements), the Strokes Set (corresponding to the letters 'H' to 'N' and representing the brief and subtle strokes), the Body-Related Set (corresponding to the letters 'O' to 'R' and representing various parts of human anatomy), and the Shapes Set (corresponding to the letters 'S' to 'Y' and representing complex and encompassing character forms).
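The grouping of keys into sets can be sketched as a small lookup. This is only an illustration of the letter ranges given above: the names `CANGJIE_SETS` and `cangjie_set` are hypothetical, and the Z key, which the description above does not assign to any set, is simply rejected.

```python
# Letter ranges for the four sets, taken from the description above.
# X is excluded from the Shapes Set because it carries the
# "difficult character" function rather than a radical.
CANGJIE_SETS = {
    "Philosophical": "ABCDEFG",
    "Strokes": "HIJKLMN",
    "Body-Related": "OPQR",
    "Shapes": "STUVWY",
}


def cangjie_set(key: str) -> str:
    """Return the name of the set a single letter key belongs to."""
    if len(key) != 1 or not key.isalpha():
        raise ValueError(f"expected a single letter, got {key!r}")
    letter = key.upper()
    if letter == "X":
        return "Difficult character"
    for name, letters in CANGJIE_SETS.items():
        if letter in letters:
            return name
    # Z (or any non-ASCII letter) is not covered by the description above.
    raise ValueError(f"key {key!r} is not assigned to a set")
```

For example, `cangjie_set("a")` returns `"Philosophical"`, and the four sets together contain 24 letters, one per radical.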
The basic character components in Cangjie are usually called "radicals"; nevertheless, Cangjie decomposition is not based on traditional Kangxi radicals, nor is it based on standard stroke order; it is in fact a simple geometric decomposition.
[Translation]
This is no problem; there are also auxiliary forms to complement the deficiencies of the radicals. The auxiliary forms are variations of the shape of the radicals, [and therefore] easy to remember.
[Translation]
The dictionary appended [to this book] is based on the 4800 standard, commonly used characters as proclaimed by the Ministry of Education. Adding to this the characters that are automatically generated, the number of characters is about 15,000 (using the Kangxi dictionary as a basis).
The keys and "radicals"
The basic character components in Cangjie are called "radicals" (字根) or "letters" (字母). There are 24 radicals but 26 keys; the 24 radicals (the basic shapes 基本字形) are associated with roughly 76 auxiliary shapes (輔助字形), which in many cases are either rotated or transposed versions of components of the basic shapes. For instance, the letter A (日) can represent either itself, the slightly wider 曰, or a 90° rotation of itself. (For a more complete account of the 76-odd transpositions and rotations than the one listed below, see the article on Cangjie entry in Chinese Wikibooks.)
The auxiliary shapes of each Cangjie radical have changed slightly between different versions of the Cangjie method; this is one reason that different versions of the Cangjie method are not completely compatible.
The basic rules
The typist must be familiar with several decomposition rules 拆字規則 that define how to analyse a character to arrive at a Cangjie code.
The rules are subject to various principles:
Examples
The short list of exceptions
Some forms are always decomposed in the same way, whether the rules say they should be decomposed this way or not. The number of such exceptions is small:
Some forms cannot be decomposed. They are represented by an X (which appears as the 難 key on a Cangjie keyboard).
Early Cangjie system
In the beginning, the Cangjie input method was not a way to produce a character in any character set. It was, instead, an integrated system consisting of the Cangjie input rules and a Cangjie controller board. The controller board contains character generator firmware, which dynamically generates Chinese characters from Cangjie codes when characters are output, using the hi-res graphics mode of an Apple II computer. In the preface of the Cangjie user's manual, Chu Bong-Foo wrote in 1982:
[in translation] In terms of output: The output and input, in fact, [form] an integrated whole; there is no reason that [they should be] dogmatically separated into two different facilities.… This is in fact necessary.…
In this early system, when the user types "yk " (for example) to get the Chinese character 文, the Cangjie codes do not get converted to any character encoding; the actual string "yk " is stored. In a very real sense, the Cangjie code of each character (string of 1 to 5 lowercase letters plus a space) was the encoding of that particular character.
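To make the "code as encoding" idea concrete, here is a minimal sketch of how stored text in that form could be split back into individual Cangjie codes. It assumes only what is stated above (each code is 1 to 5 lowercase letters followed by a space); the function name `split_codes` is hypothetical and this is not a reconstruction of the original firmware.

```python
import re

# One stored character: 1 to 5 lowercase letters followed by a space.
CODE_PATTERN = re.compile(r"[a-z]{1,5} ")


def split_codes(stored: str) -> list[str]:
    """Split stored text into Cangjie codes, without the trailing spaces."""
    codes = []
    pos = 0
    while pos < len(stored):
        match = CODE_PATTERN.match(stored, pos)
        if match is None:
            raise ValueError(f"invalid Cangjie code at offset {pos}")
        codes.append(match.group()[:-1])  # drop the terminating space
        pos = match.end()
    return codes
```

With this sketch, `split_codes("yk ")` yields `["yk"]`, the code the text gives for 文; a character generator would then draw each code directly instead of looking it up in a character set.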
A particular "feature" of this early system is that if you send random lowercase words to the character generator, it will attempt to construct Chinese characters according to the Cangjie decomposition rules, sometimes causing strange, unknown characters to appear. This unusual feature, "automatic generation of characters", is actually described in the manual and is responsible for producing more than 10,000 of the about 15,000 characters that the system can handle. The name Cangjie, evocative of creation of new characters, was actually very apt for this early version of Cangjie.
The presence of the integrated character generator also explains the historical necessity for the existence of the "X" key used for disambiguation of decomposition collisions: because characters are "chosen" when the codes are output, every character that can be displayed must in fact have a unique Cangjie decomposition. It would not make sense—nor would it be practical—for the system to provide a choice of candidate characters when some random text file is displayed; the user would not know which of the candidates are correct.
Issues
Cangjie was designed to be an easy-to-use system to help promote the use of Chinese computing; nevertheless, many users find Cangjie to be a difficult method. Many of the perceived difficulties arise from poor instruction.
Perceived difficulties
Enough practice, however, can overcome the above problems. A typist with sufficient practice in Cangjie touch-types, much like one typing English. It is entirely possible for a touch-typist to type at 25 Chinese characters per minute (cpm) or better in Cangjie, yet have difficulty remembering the list of auxiliary shapes or even the decomposition rules. Experienced Cangjie typists can reportedly attain a typing speed between 60 cpm and over 200 cpm.
Actual difficulties
Cangjie, however, also has some fundamental problems:
In some situations it cannot be used at all. Cangjie uses all 26 keys of a QWERTY keyboard, so it cannot be used to input Chinese on feature phones. For cell phones, zhuyin, 5-stroke (or 9-stroke by Motorola) and the Q9 input method are the current norm because they are designed specifically for use on numeric keypads. Smartphones, however, can and do support Cangjie input through the touchscreen virtual keyboard.
Versions of Cangjie
The Cangjie input method is commonly said to have gone through five generations (commonly referred to as “versions” in English), each of which is slightly incompatible with the others. Currently, version 3 (第三代倉頡) is the most common; it is the version of Cangjie supported natively by Microsoft Windows. Version 5 (第五代倉頡), supported by the Free Cangjie IME and previously the only Cangjie supported by SCIM, represents a significant minority method and is also supported by iOS.
The early Cangjie system supported by the Zero One card on the Apple II was Version 2; Version 1 was never released.
The Cangjie input method supported on the classic Mac OS is somewhat like Version 3 and somewhat like Version 5.
Version 5, like the original Cangjie input method, was created directly by Chu, the inventor. Chu had hoped that the release of Version 5, originally slated to be Version 6, would bring an end to the “more than ten versions of Cangjie input method” (slightly incompatible versions created by different vendors).
Version 6 was developed by Chu's longtime assistant Shen Honglian (沈紅蓮). It was created as the encoding for a character set of about 100,000 characters extracted from Chinese literature. This character set was developed independently from Unicode, which Chu heavily criticized as inferior in design. Version 6 has not yet been released to the public, but is being used to create a database which can accurately store every historical Chinese text.
Variants of Cangjie
Most modern implementations of Cangjie IMEs provide various convenient features:
Besides the wildcard key, many of these features are very convenient for casual users but unsuitable for touch-typists because they make the Cangjie IME unpredictable.
There have also been various attempts to "simplify" Cangjie one way or another:
Applications
Many researchers have discussed ways to decompose Chinese characters into their major components, and have tried to build applications based on the decomposition system. The idea can be referred to as the study of The Genes of Chinese Characters. Cangjie codes certainly offer a basis for such an endeavour. Academia Sinica in Taiwan and Jiaotong University in Shanghai have similar projects as well.
One direct application of decomposed characters is computing how similar Chinese characters are in their written form; the Cangjie input method offers a good starting point for this kind of application. By relaxing the limit of five codes for each Chinese character and adopting more detailed Cangjie codes for each character, we can compute visually similar characters. Integrating this with pronunciation information enables computer-assisted learning of Chinese characters.
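As a rough illustration of the idea, the sketch below scores two characters by how much their Cangjie codes overlap. It is an assumption-laden toy, not the method used by the projects mentioned above: the similarity measure is Python's generic `difflib.SequenceMatcher` ratio, the names `SAMPLE_CODES`, `code_similarity` and `most_similar` are hypothetical, and the table holds only a handful of ordinary (not extended) codes for demonstration.

```python
from difflib import SequenceMatcher

# A tiny illustrative table of characters and their Cangjie codes.
SAMPLE_CODES = {
    "明": "ab",   # 日 + 月
    "林": "dd",   # 木 + 木
    "森": "ddd",  # 木 + 木 + 木
    "文": "yk",
}


def code_similarity(code_a: str, code_b: str) -> float:
    """Score two codes between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, code_a, code_b).ratio()


def most_similar(char: str, codes: dict[str, str] = SAMPLE_CODES) -> str:
    """Return the other character in the table whose code is closest."""
    target = codes[char]
    scored = [
        (code_similarity(target, code), other)
        for other, code in codes.items()
        if other != char
    ]
    return max(scored)[1]
```

Here `most_similar("林")` picks 森, since the two codes share the repeated 木 component. A string ratio over short codes is a crude proxy; the more detailed codes described above would give such a measure more to work with.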