Properties of the Chinese Orthography

The orthographic structure of Chinese can be considered at three levels: characters, components, and strokes. Characters can be subdivided into simple and compound characters.

Compound characters are composed of two or more components, which are in turn composed of strokes, the smallest orthographic units. Simple characters consist of one or more strokes.

Character

Written Chinese consists of strings of two-dimensional, square-shaped symbols called characters, which are in turn formed by subunits (i.e., strokes and components) that fit within a square region of identical size. Characters are evenly spaced, and no extra spaces mark word boundaries. Characters are generally regarded as the basic perceptual units in Chinese reading, playing a role similar to that of words in English reading. A large proportion of Chinese characters are compound characters.

Component

The term “component” can refer to either bushou (部首) or bujian (部件) in Chinese. Bushou has a relatively narrow meaning, referring to a specific set of stroke patterns used to look up characters in a dictionary, whereas bujian has a broader meaning, referring to any stroke pattern that functions as a constituent component of characters.

Components can be further divided into semantic and phonetic components. Semantic components signify the semantic category of the characters in which they occur, whereas phonetic components provide cues to the pronunciation of the whole character. However, neither the semantic nor the phonetic cuing function of components is reliable in all cases. Around 80% of Chinese characters are phonetic compound characters, each of which is usually composed of a semantic component and a phonetic component. Previous studies have demonstrated that both semantic and phonetic components can facilitate character processing.
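To make the reliability point concrete, the Python sketch below checks how well a single phonetic component predicts the pronunciation of characters that contain it. The five characters sharing the phonetic 青 (qīng) are a small hand-picked illustration, and the two definitions of “regular” used here (same syllable ignoring tone, or identical including tone) are assumptions of the sketch rather than a standard taken from the studies cited above.

```python
# Pinyin is written with tone numbers (e.g. "qing2" = qíng).
PHONETIC_PINYIN = "qing1"  # the phonetic component 青

# Hypothetical hand-picked family of characters containing 青.
FAMILY = {
    "清": "qing1",  # clear
    "请": "qing3",  # please, invite
    "情": "qing2",  # feeling
    "晴": "qing2",  # sunny
    "精": "jing1",  # essence, refined
}


def strip_tone(syllable: str) -> str:
    """Remove the trailing tone digit from a numbered-pinyin syllable."""
    return syllable.rstrip("012345")


def is_regular(char_pinyin: str, phonetic_pinyin: str, ignore_tone: bool = True) -> bool:
    """Assumed definition: a character is regular if it sounds like its phonetic."""
    if ignore_tone:
        return strip_tone(char_pinyin) == strip_tone(phonetic_pinyin)
    return char_pinyin == phonetic_pinyin


def regularity_rate(family: dict, phonetic_pinyin: str, ignore_tone: bool = True) -> float:
    """Share of characters in the family whose pronunciation matches the phonetic."""
    hits = sum(is_regular(p, phonetic_pinyin, ignore_tone) for p in family.values())
    return hits / len(family)


if __name__ == "__main__":
    print(regularity_rate(FAMILY, PHONETIC_PINYIN))                     # 0.8
    print(regularity_rate(FAMILY, PHONETIC_PINYIN, ignore_tone=False))  # 0.2
    print([c for c, p in FAMILY.items() if not is_regular(p, PHONETIC_PINYIN)])  # ['精']
```

Under the looser definition four of the five characters are regular, but once tone is taken into account only 清 matches its phonetic exactly, which illustrates why phonetic cues cannot be relied on in every case.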

Another characteristic of components is that they can be arranged in various ways to form characters. In most alphabetic languages, the constituent letters of a word are arranged linearly from left to right; by contrast, the components of Chinese characters can be arranged horizontally (e.g., 即 ([jí], at once)) or vertically (e.g., 告 ([gào], tell)). Chinese characters can also take many other forms, such as L-shaped (e.g., 边 ([biān], side)), P-shaped (e.g., 庆 ([qìng], congratulate)), and enclosed (e.g., 国 ([guó], country)) structures. A direct consequence is that the orthographic similarity between two Chinese characters cannot be measured simply by counting the number of shared constituent units, as is done in English; other factors, such as the structure of the characters, must also be considered. Moreover, because all the components of a character are packed into a square of constant size, which substantially increases visual complexity, detailed analysis of the spatial information and locations of the components is essential during character reading.
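The following Python sketch illustrates why a simple count of shared units can mislead. Each character is represented, purely for illustration, as a structure type plus a set of (component, slot) pairs; the scoring rule in the hypothetical `structure_aware_similarity` function is an example, not an established similarity metric. 呆 and 杏 contain the same two components, 口 and 木, in swapped vertical positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    glyph: str
    structure: str         # e.g. "top-bottom", "left-right", "enclosed"
    components: frozenset  # set of (component, slot) pairs


def shared_components(a: Character, b: Character) -> int:
    """Alphabetic-style measure: count shared units, ignoring where they sit."""
    return len({c for c, _ in a.components} & {c for c, _ in b.components})


def structure_aware_similarity(a: Character, b: Character) -> int:
    """Hypothetical score: components shared in the same slot, +1 if structure types match."""
    same_slot = len(a.components & b.components)
    return same_slot + (1 if a.structure == b.structure else 0)


DAI = Character("呆", "top-bottom", frozenset({("口", "top"), ("木", "bottom")}))
XING = Character("杏", "top-bottom", frozenset({("木", "top"), ("口", "bottom")}))
LI = Character("李", "top-bottom", frozenset({("木", "top"), ("子", "bottom")}))

if __name__ == "__main__":
    # 呆 and 杏 contain the same two components in swapped positions.
    print(shared_components(DAI, XING), structure_aware_similarity(DAI, XING))  # 2 1
    # 李 and 杏 share only 木, but in the same (top) slot.
    print(shared_components(LI, XING), structure_aware_similarity(LI, XING))    # 1 2
```

The two measures rank the pairs in opposite orders, which is the sense in which counting shared units alone is insufficient; how slots and structure types should actually be weighted is an empirical question this sketch does not settle.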

Furthermore, in alphabetic languages such as English a letter may appear in almost any position within a word, whereas in Chinese most components have a fixed or typical position within a character. For example, “扌” always appears in the left position, whereas “刂” always occurs in the right position. According to the Chinese Component Position Frequency Dictionary (1984), about 60% of components have this positional property. Position is also tied to the semantic and phonetic cuing functions described above: in horizontally structured characters, semantic and phonetic components typically appear on the left and right sides, respectively. Given that components appear at relatively fixed positions, and that component position is usually related to the semantic or phonetic information of a character, establishing the exact position of components may be critical for character identification. It therefore remains unclear whether component position is coded in the same way as letter position at the early stage of orthographic processing.
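A dictionary count such as the 60% figure amounts to tabulating, for each component, the positions it occupies. The Python sketch below does this for a tiny hand-coded sample (an illustrative assumption, not data from the dictionary cited above) and treats a component as positionally fixed if it appears in only one position; real counts would depend on a full character set and on how positions are defined.

```python
from collections import defaultdict

# Hypothetical hand-coded sample of (character, component, position) triples.
SAMPLE = [
    ("打", "扌", "left"), ("拉", "扌", "left"), ("把", "扌", "left"),
    ("到", "刂", "right"), ("别", "刂", "right"), ("刚", "刂", "right"),
    ("吃", "口", "left"), ("和", "口", "right"),
    ("呆", "口", "top"), ("杏", "口", "bottom"),
]


def positions_by_component(sample):
    """Map each component to the set of positions it occupies in the sample."""
    table = defaultdict(set)
    for _char, component, position in sample:
        table[component].add(position)
    return table


def fixed_position_share(sample):
    """Return the share of components seen in only one position, and which ones."""
    table = positions_by_component(sample)
    fixed = sorted(c for c, positions in table.items() if len(positions) == 1)
    return len(fixed) / len(table), fixed


if __name__ == "__main__":
    share, fixed = fixed_position_share(SAMPLE)
    print(round(share, 2), fixed)  # 0.67 ['刂', '扌']
```

口, which occurs on the left of 吃, the right of 和, the top of 呆, and the bottom of 杏, is the kind of component that lacks this positional property.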

Stroke

A stroke is the smallest orthographic unit in written Chinese, roughly corresponding to the feature level in English. Strokes can be grouped into five major categories: dot, horizontal line, vertical line, oblique line, and twisty line. Studies of Chinese usually use the number of strokes to index the orthographic complexity of characters. The effect of stroke number on character recognition has been shown in many studies using a variety of paradigms, including character decision, naming, character-digit coding, and tachistoscopic identification. These results generally demonstrate that, holding other factors constant (e.g., number of components, frequency of the whole character), the more strokes a character contains, the longer it takes to process, indicating that stroke-level information plays a role in Chinese character recognition.
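Because the stroke-count index is simply the length of a character's stroke sequence, it can be computed directly once strokes are coded. In the Python sketch below, the stroke sequences for three characters are hand-coded for illustration and labeled with the five categories named above (a turning stroke such as the second stroke of 国 is coded as "twisty"); it computes only the complexity index, not the processing-time effect reported in the studies.

```python
from collections import Counter

# The five stroke categories listed in the text.
CATEGORIES = {"dot", "horizontal", "vertical", "oblique", "twisty"}

# Hand-coded stroke sequences in standard stroke order (illustrative sample).
STROKES = {
    "十": ["horizontal", "vertical"],
    "王": ["horizontal", "horizontal", "vertical", "horizontal"],
    # 国: frame (vertical, horizontal-turning), inner 玉, then the closing stroke.
    "国": ["vertical", "twisty", "horizontal", "horizontal",
           "vertical", "horizontal", "dot", "horizontal"],
}


def stroke_count(char: str) -> int:
    """Orthographic complexity index: the number of strokes in the character."""
    return len(STROKES[char])


def category_profile(char: str) -> Counter:
    """How many strokes of each category the character contains."""
    strokes = STROKES[char]
    unknown = set(strokes) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown stroke categories: {unknown}")
    return Counter(strokes)


def rank_by_complexity(chars):
    """Order characters from fewest to most strokes."""
    return sorted(chars, key=stroke_count)


if __name__ == "__main__":
    print([(c, stroke_count(c)) for c in rank_by_complexity(STROKES)])
    # [('十', 2), ('王', 4), ('国', 8)]
    print(category_profile("国"))
```

In an actual study this index would be one predictor alongside controlled factors such as the number of components and whole-character frequency.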