
However, this guesswork is not acceptable for some applications, such as poetry or the processing of classical texts, which require the use of diacritics. In some complex-text languages, such as Thai, the use of vowel symbols and tone marks is mandatory.

In Arabic, spacing diacritics are currently used as a compromise: in present-day Arabic systems, some or all of the diacritics are implemented as separate characters, rendered after the character to which they belong.

National Numbers

In both Latin-based languages and Hebrew, numbers are represented using the so-called Arabic digits (1, 2, 3, 4, 5, 6, 7, 8, 9 and 0). However, the cursive languages (Arabic, Farsi and Urdu), as well as many other complex-text languages, have their own national glyphs for digits (note 5). In the cursive languages these numbers are not called “Arabic numbers”, but Hindi or, sometimes, Arabic-Indic numbers. Numbers are always written from left to right. Mathematical formulae, however, are written from right to left in Arabic and from left to right in Farsi.

It is important to understand that, in most cases, the text stored for processing encodes numbers with the ordinary (Western) Arabic digit codes. At presentation time, these numbers may be displayed either with the national digit glyphs or with ordinary Arabic digits, as the user or application developer intends.
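A minimal sketch of that presentation step, assuming the stored text uses the ordinary digits 0 to 9 and the display layer chooses the national digit set. The names `NATIONAL_ZERO` and `shape_digits` are illustrative; the code points are the Unicode Arabic-Indic digits (U+0660 to U+0669) and Extended Arabic-Indic digits (U+06F0 to U+06F9).

```python
# Unicode code point of the zero digit in each national digit set.
NATIONAL_ZERO = {
    "arabic": 0x0660,  # Arabic-Indic digits (U+0660..U+0669)
    "farsi": 0x06F0,   # Extended Arabic-Indic digits (U+06F0..U+06F9), also used for Urdu
}


def shape_digits(text: str, script: str) -> str:
    """Replace stored digits 0-9 with national digit glyphs, for display only."""
    zero = NATIONAL_ZERO[script]
    table = {ord("0") + i: zero + i for i in range(10)}
    return text.translate(table)


stored = "Room 205"                    # what the application keeps and processes
shown = shape_digits(stored, "arabic")  # what the user sees
print(shown)
```

The stored string is never modified, so sorting, arithmetic and validation keep working on ordinary digits; Python's `int()` also accepts the national digits if a displayed value has to be read back.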

Bi-directional Data Entry

To those unfamiliar with bi-directional languages, how segments of text with different directionality can actually be entered from a keyboard is something of a puzzle.

Keying Order

The order in which bi-directional data is typed into a workstation is the order in which the text is meant to be read – the logical order.
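Because text is stored in logical order, something must reorder it for display. The following is a heavily simplified sketch in the spirit of the Unicode Bidirectional Algorithm (UAX #9): it gives each character an embedding level and then applies the level-reversal step (rule L2). It ignores explicit embeddings, mirroring and most neutral-resolution rules, and treats digits as left-to-right; the function names are illustrative.

```python
import unicodedata


def strong_direction(ch: str):
    """Return 'R', 'L', or None (neutral) from the character's Unicode bidi class."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "R"
    if bidi_class in ("L", "EN", "AN"):  # simplification: digits treated as left-to-right
        return "L"
    return None


def embedding_levels(text: str, base: str):
    """One level per character; a neutral between like strong characters takes their direction."""
    dirs = [strong_direction(ch) for ch in text]
    levels = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        if base == "R":
            levels.append(1 if d == "R" else 2)
        else:
            levels.append(1 if d == "R" else 0)
    return levels


def visual_order(text: str, base: str = "R") -> str:
    """Reorder logical (keyed) text into left-to-right display order."""
    levels = embedding_levels(text, base)
    chars = list(text)
    # From the highest level down to 1, reverse every run at that level or higher.
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


logical = "\u05e9\u05dc\u05d5\u05dd abc"  # Hebrew word, space, Latin word, in keying order
print(visual_order(logical, base="R"))
```

In a right-to-left field, the Hebrew word typed first ends up at the right edge and the Latin word keyed afterwards appears to its left, still reading left to right.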

Bi-directional Keyboards

The keyboards used for bi-directional languages are similar to those used for English, except that the key tops carry, alongside the engraved Latin characters and symbols, the character symbols of the other language. For the cursive languages, such as Arabic, only the basic characters are engraved. Special key combinations switch between the English keyboard layer and the national-language layer; for example, on some Hebrew keyboards the Hebrew layer is activated by pressing the *Alt* and *Right Shift* keys together. Such key combinations are also used to select input modes: in some environments, pressing *Shift* and *Num Lock* together enters push mode. Push mode is a keyboard input mode in which characters are pushed in the direction opposite to the base direction of the segment while the cursor stays in place, in the same way that digits behave on the display of a pocket calculator.
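Push mode can be modelled with a small state machine. This sketch assumes a right-to-left field whose contents are held directly in display order; `RtlField` and its methods are hypothetical names, not the API of any real keyboard driver.

```python
class RtlField:
    """Hypothetical model of a right-to-left input field, held in display (visual) order."""

    def __init__(self) -> None:
        self.committed = ""  # text already on screen, leftmost character first
        self.pushed = None   # text of the active push segment, or None outside push mode

    def start_push(self) -> None:
        self.pushed = ""

    def end_push(self) -> None:
        self.committed = (self.pushed or "") + self.committed
        self.pushed = None

    def key_in(self, ch: str) -> None:
        if self.pushed is None:
            # Normal RTL typing: the cursor moves left, so each new character
            # appears to the left of the previous one.
            self.committed = ch + self.committed
        else:
            # Push mode: the cursor stays put and earlier characters are pushed
            # to the left, as on a calculator display.
            self.pushed += ch

    def display(self) -> str:
        return (self.pushed or "") + self.committed


field = RtlField()
for ch in "\u05d0\u05d1":  # two Hebrew letters typed normally
    field.key_in(ch)
field.start_push()
for ch in "123":           # a number typed in push mode
    field.key_in(ch)
field.end_push()
field.key_in("\u05d2")     # back to normal RTL typing
print(field.display())
```

The number reads "123" from left to right even though the surrounding Hebrew text grows leftwards.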

Bi-directional Typing Interfaces

To allow bi-directional text entry from a keyboard, the interfaces must be able to intercept and process each keystroke. These interfaces can be part of the hardware or microcode of the terminal and its controller, or they can be a specific routine added to the operating system. There are two typing interfaces to consider:

Manual typing method

Automatic (logical) typing method

Manual Typing Method

In the manual typing method, the user tells the system in which direction the characters are to be typed. For mixed-direction typing, the user makes extensive use of the Push and End Push keyboard functions.

The manual method also supports an Automatic Push (Auto Push) mode. When the Auto Push setting is active, the Push Mode is started and terminated automatically, according to the actual characters being typed.
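Auto Push can be sketched as a right-to-left field that starts and ends push mode on its own, depending on the characters typed. The rule used here is an assumption for illustration: Latin letters and European digits start or continue a push segment, and any other character ends it. The function names are hypothetical.

```python
import unicodedata


def starts_push(ch: str) -> bool:
    """Assumption: Latin letters and European digits trigger Auto Push."""
    return unicodedata.bidirectional(ch) in ("L", "EN")


def auto_push_display(keystrokes: str) -> str:
    """Display (left to right) of an RTL field after typing with Auto Push active."""
    committed = ""  # text already on screen, leftmost character first
    pushed = None   # active push segment, or None when not in push mode
    for ch in keystrokes:
        if starts_push(ch):
            if pushed is None:
                pushed = ""          # Push Mode started automatically
            pushed += ch             # cursor stays; the segment reads left to right
        else:
            if pushed is not None:   # Push Mode ended automatically
                committed = pushed + committed
                pushed = None
            committed = ch + committed  # normal RTL typing
    return (pushed or "") + committed


print(auto_push_display("\u05d0\u05d1123\u05d2"))
```

Because any other character, including a space, ends the segment in this sketch, a multi-word Latin phrase would be split; real implementations apply more elaborate rules.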

When the manual typing method is active, the keyboard language group and cursor direction are handled separately by the system. This means that the user has separate control for:

The direction of the field – controlled by the Field Reverse keyboard function.

The direction of the typed text – controlled by the Push and End Push keyboard functions.

The keyboard language group – controlled by the keyboard language group switching keys.

Automatic (Logical) Typing Method

This convention provides some automatic handling of directionality. When this method is active, the system determines the directionality of each part of the text (each segment) based on the actual characters being typed, using a set of predefined rules.
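Segment detection of this kind can be sketched with the Unicode bidirectional class of each character. The rule below is an assumption chosen for simplicity: a neutral character (space, punctuation and, in this sketch, digits) inherits the direction of the preceding strong character, or the base direction at the start of the text. `segments` is a hypothetical name.

```python
import unicodedata
from itertools import groupby


def char_direction(ch: str):
    """'RTL', 'LTR', or None for neutral characters (digits treated as neutral here)."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "RTL"
    if bidi_class == "L":
        return "LTR"
    return None


def segments(text: str, base: str = "RTL") -> list:
    """Split logical text into (direction, substring) segments."""
    dirs = []
    current = base
    for ch in text:
        d = char_direction(ch)
        if d is not None:
            current = d
        dirs.append(current)
    return [(d, "".join(ch for _, ch in group))
            for d, group in groupby(zip(dirs, text), key=lambda pair: pair[0])]


print(segments("\u05e9\u05dc\u05d5\u05dd abc"))
```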

The method is called logical because the direction of the text is logically deduced based on the language of the characters.

With this method, the system automatically determines how to display characters in the correct order when the user switches keyboard language groups.

Another feature of this method is that it handles text in typing order; that is, the system remembers the order in which the characters were originally typed. It then uses this knowledge, along with a set of predefined rules, to determine how the application displays, processes and deletes the text.

If the cursor is in the Home position (the first logical position in the field or window) and a character of a language other than the default language of the current orientation is entered, the screen or window orientation is reversed automatically. That is, if the character entered is Hebrew, the window orientation is right-to-left; if the character is English, the window orientation is left-to-right.
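The Home-position rule reduces to a check on the directional class of the character typed. A minimal sketch, assuming that Unicode bidi classes R and AL stand for right-to-left letters (Hebrew, Arabic) and class L for left-to-right letters; the function name is illustrative.

```python
import unicodedata


def orientation_after_home_key(current: str, ch: str) -> str:
    """New window orientation after `ch` is typed with the cursor at Home.

    current is "LTR" or "RTL". Right-to-left letters force "RTL", left-to-right
    letters force "LTR"; digits, spaces and punctuation leave it unchanged.
    """
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "RTL"
    if bidi_class == "L":
        return "LTR"
    return current


print(orientation_after_home_key("LTR", "\u05d0"))  # Hebrew alef typed at Home
```

Applied to the first strongly directional character of a whole data stream instead of a single keystroke, the same test yields the contextual global orientation described in note 2.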

Common User Access and Bi-directional Languages

The basic rule for applications that are to conform to Common User Access guidelines is that “… All pieces of data must be displayed in the orientation that is correct for the application user. Data input must be supported in the orientation that is natural for users”.

Thai Language and its Writing System

The Thai language is representative of a class of complex-text languages whose characters are composed of a number of symbols or elements. Thai has traditionally been grouped with the Sino-Tibetan family of languages, although most linguists now assign it to the separate Tai-Kadai (Kra-Dai) family. Like the Chinese languages, Thai is a largely monosyllabic tone language. While it resembles Chinese structurally and shares part of its basic vocabulary with it, it has also been greatly influenced by both Pali and Sanskrit.

The Thai writing system derives from the Brahmi family of scripts (to which Devanagari also belongs), which originated in India and reached Thailand by way of the Khmer script of Cambodia. A major difference between the Chinese and Thai writing systems is that Chinese uses a large number of pictorial symbols, whereas Thai uses an alphabet of consonants, vowels, tone marks, diacritics and special symbols. With some exceptions, a Thai word can be pronounced correctly on sight, much as in Italian or French.

Writing Thai Characters – Graphic Representation

Thai is written from left to right, without spaces between words. Each word consists of one or more syllables; each syllable consists of a consonant, a vowel, a tone and, optionally, a final consonant or a final diacritic. Spaces in the text mark the ends of phrases or sentences and thus serve as a form of punctuation. As a result, individual words can be recognized only by scanning the text for syllable boundaries. Compared with western writing systems, the composed characters tend to be taller and thinner.
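Software usually has to find word boundaries in unspaced Thai text before it can wrap lines or search. One common technique, swapped in here as an illustration rather than the syllable-scanning method the text describes, is greedy longest matching against a dictionary. The function name and the two-word dictionary are hypothetical.

```python
def longest_match_segment(text: str, dictionary: set) -> list:
    """Greedy longest-match word segmentation against a word list."""
    max_len = max(map(len, dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            length, candidate = 1, text[i]
        words.append(candidate)
        i += length
    return words


# Tiny illustrative dictionary: "sawatdi" (hello) and "khrap" (polite particle).
hello, polite = "สวัสดี", "ครับ"
print(longest_match_segment(hello + polite, {hello, polite}))
```

Greedy matching is easy to fool when a long word swallows the start of the next one; production segmenters use larger dictionaries and smarter search, but the principle of recovering boundaries from the text itself is the same.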

A line of Thai text can be considered to be logically divided into four parallel lines:

The base line, on which consonants, some vowels, some Thai symbols and Thai numbers are written

The line below the base line, used for writing lower vowels and lower diacritics

The line above the base line, used for writing upper vowels and upper diacritics

The line above the upper vowel line, used for writing tone marks and upper diacritics. (If there is no upper vowel, the tone mark or the upper diacritic is written on the upper vowel line.)
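The four-line model can be sketched as a function that distributes the symbols of one composed character over the lines, including the rule that a tone mark or upper diacritic drops to the upper vowel line when no upper vowel is present. The assignment of Unicode Thai code points to each line is an assumption made for this illustration.

```python
# Assumed Unicode mapping of Thai combining symbols to the four lines.
LOWER_LINE = {"\u0e38", "\u0e39", "\u0e3a"}                        # sara u, sara uu, phinthu
UPPER_VOWELS = {"\u0e31", "\u0e34", "\u0e35", "\u0e36", "\u0e37"}  # mai han-akat, sara i, ii, ue, uee
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}              # mai ek, tho, tri, chattawa
UPPER_DIACRITICS = {"\u0e47", "\u0e4c", "\u0e4d", "\u0e4e"}        # maitaikhu, thanthakhat, nikhahit, yamakkan


def place_on_lines(cluster: str) -> dict:
    """Distribute the symbols of one composed character over the four logical lines."""
    lines = {"top": "", "upper": "", "base": "", "lower": ""}
    has_upper_vowel = any(ch in UPPER_VOWELS for ch in cluster)
    for ch in cluster:
        if ch in LOWER_LINE:
            lines["lower"] += ch
        elif ch in UPPER_VOWELS:
            lines["upper"] += ch
        elif ch in TONE_MARKS or ch in UPPER_DIACRITICS:
            # Tone marks and upper diacritics move down to the upper vowel
            # line when no upper vowel occupies it.
            lines["top" if has_upper_vowel else "upper"] += ch
        else:
            lines["base"] += ch
    return lines


print(place_on_lines("\u0e17\u0e35\u0e48"))  # tho thahan + sara ii + mai ek
```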

Thai Written Symbols

Generally speaking, the more than 2,000 characters in the Thai writing system can be categorised into 20 types of written symbols, with 88 basic symbols:

10 base line numerics

44 base line consonants

3 base line ancient signs

2 base line special symbols

1 base line currency sign

1 base line Thai word break character

5 base line leading vowels (vowel in front of consonant)

3 base line type 1 following vowels

1 base line type 2 following vowel

2 base line type 3 following vowels

1 upper vowel line type 1 upper vowel

2 upper vowel line type 2 upper vowels

2 upper vowel line type 3 upper vowels

1 upper vowel line ancient sign (or upper vowel line type 3 upper diacritic)

4 tone mark line tone marks

2 tone mark line type 1 upper diacritic symbols

1 tone mark line type 2 upper diacritic symbol

1 lower vowel line type 1 lower vowel

1 lower vowel line type 2 lower vowel

1 lower vowel line lower diacritic symbol
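A quick tally checks that the 20 symbol types listed above account for the 88 basic symbols; the list is transcribed directly from the text.

```python
# The symbol types listed above, with the number of basic symbols in each.
SYMBOL_TYPES = [
    ("base line numerics", 10),
    ("base line consonants", 44),
    ("base line ancient signs", 3),
    ("base line special symbols", 2),
    ("base line currency sign", 1),
    ("base line Thai word break character", 1),
    ("base line leading vowels", 5),
    ("base line type 1 following vowels", 3),
    ("base line type 2 following vowel", 1),
    ("base line type 3 following vowels", 2),
    ("upper vowel line type 1 upper vowel", 1),
    ("upper vowel line type 2 upper vowels", 2),
    ("upper vowel line type 3 upper vowels", 2),
    ("upper vowel line ancient sign", 1),
    ("tone mark line tone marks", 4),
    ("tone mark line type 1 upper diacritic symbols", 2),
    ("tone mark line type 2 upper diacritic symbol", 1),
    ("lower vowel line type 1 lower vowel", 1),
    ("lower vowel line type 2 lower vowel", 1),
    ("lower vowel line lower diacritic symbol", 1),
]

print(len(SYMBOL_TYPES), sum(count for _, count in SYMBOL_TYPES))
```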

Normally, Thai data is encoded in a single-byte code page in which each symbol has its own code point. Thai data is entered on a Thai keyboard symbol by symbol, and it is stored, for processing purposes, as these symbol elements.

These elements have to be combined into characters for rendering purposes.
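The gap between stored symbols and rendered characters can be shown with Python's built-in `tis_620` codec (TIS-620 is a single-byte Thai code page) and a grouping function. The grouping rule, in which every spacing symbol opens a new character cell and every nonspacing mark (Unicode category Mn) stacks on the current one, is a simplifying assumption; `composed_characters` is a hypothetical name.

```python
import unicodedata

# "thi-ni" (here): two composed characters, each a consonant + sara ii + mai ek.
stored = "\u0e17\u0e35\u0e48\u0e19\u0e35\u0e48"

# Stored form: six symbols, one byte per symbol in the single-byte code page.
encoded = stored.encode("tis_620")
print(len(encoded), encoded.hex())


def composed_characters(text: str) -> list:
    """Group stored symbols into composed characters (display cells) for rendering."""
    cells = []
    for ch in text:
        if cells and unicodedata.category(ch) == "Mn":
            cells[-1] += ch  # upper/lower vowel or tone mark stacks on the cell
        else:
            cells.append(ch)  # spacing symbol starts a new cell
    return cells


print(composed_characters(stored))  # two cells of three symbols each
```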

Writing Order

In the most common writing order, first a base line symbol is written, and then optionally, an upper vowel or lower vowel symbol is written above or below it. A tone mark symbol may then optionally be written either above the base line symbol, or above the upper vowel symbol, if present.

This order of writing is taught in Thai elementary schools. However, writing-order inconsistencies exist between individuals. The valid combinations of symbols for Thai composed characters are:

Base line consonant symbol

Base line consonant symbol and tone mark symbol

Base line consonant symbol and upper diacritic symbol

Base line consonant symbol and upper vowel symbol

Base line consonant symbol, upper vowel symbol and tone mark symbol

Base line consonant symbol, upper vowel symbol and upper diacritic symbol

Base line consonant symbol and lower vowel symbol

Base line consonant symbol and lower diacritic symbol

Base line consonant symbol, lower vowel symbol and tone mark symbol.

Any other combinations would be considered invalid.
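The list of valid combinations translates directly into a lookup table. In this sketch the classification of Unicode Thai code points into consonants, upper and lower vowels, tone marks and diacritics is an assumption; the set of allowed patterns is taken from the list above. Function names are illustrative.

```python
def symbol_class(ch: str) -> str:
    """Assumed classification of a Thai symbol by Unicode code point."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "C"   # base line consonant
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E37:
        return "UV"  # upper vowel
    if cp in (0x0E38, 0x0E39):
        return "LV"  # lower vowel
    if cp == 0x0E3A:
        return "LD"  # lower diacritic (phinthu)
    if 0x0E48 <= cp <= 0x0E4B:
        return "T"   # tone mark
    if cp in (0x0E47, 0x0E4C, 0x0E4D, 0x0E4E):
        return "UD"  # upper diacritic
    return "OTHER"


# The nine valid combinations listed above.
VALID = {
    ("C",), ("C", "T"), ("C", "UD"), ("C", "UV"), ("C", "UV", "T"),
    ("C", "UV", "UD"), ("C", "LV"), ("C", "LD"), ("C", "LV", "T"),
}


def is_valid_composition(symbols: str) -> bool:
    return tuple(symbol_class(ch) for ch in symbols) in VALID


print(is_valid_composition("\u0e17\u0e35\u0e48"))  # consonant, upper vowel, tone mark
```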

What is a Thai Character?

From a linguistic or phonetic point of view, the Thai writing system is actually more complex than that described above.

Consonants are written on the base line. A middle vowel can be written either before, after or straddling the related consonant.

Upper vowels are written above, and lower vowels below, their related consonant. Vowels are always pronounced and collated after the consonant. The tone mark is usually written after the upper or lower vowel, but some people write it directly after the consonant. In some encoding schemes, the left and right pieces of a middle vowel that straddles a consonant are encoded as separate components.
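Because a tone mark may be keyed either before or after the vowel, an application may want to normalize the stored order so that both keying habits produce identical data. A minimal sketch, assuming the canonical stored order is consonant, vowel, tone mark; the function name is hypothetical.

```python
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}
VOWEL_MARKS = {"\u0e31", "\u0e34", "\u0e35", "\u0e36", "\u0e37", "\u0e38", "\u0e39"}


def normalize_tone_order(text: str) -> str:
    """Move a tone mark keyed directly after the consonant so that it follows
    the upper or lower vowel instead."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in TONE_MARKS and chars[i + 1] in VOWEL_MARKS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


print(normalize_tone_order("\u0e17\u0e48\u0e35") == "\u0e17\u0e35\u0e48")
```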

To prevent confusion, the term composed character is used here for the representation of one syllable at one writing position, and the term symbol for the components of a composed character.

Thai Numbers

Although Western numerals (Arabic numbers) are now widely used in Thai writing, there are also ten Thai glyphs for numbers.

In Thai, the equivalents of the Arabic digits 1, 2, 3, 4, 5, 6, 7, 8, 9 and 0 are, respectively, ๑, ๒, ๓, ๔, ๕, ๖, ๗, ๘, ๙ and ๐:

Figure: Shape of the National Numbers in Thai
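As with the Arabic-Indic digits, Thai digits are normally a presentation choice over stored Western digits. A minimal sketch using the Unicode Thai digits (U+0E50 to U+0E59); `to_thai_digits` is an illustrative name.

```python
THAI_ZERO = 0x0E50  # THAI DIGIT ZERO; digits one to nine follow at U+0E51..U+0E59


def to_thai_digits(text: str) -> str:
    """Present stored digits 0-9 with Thai digit glyphs."""
    return text.translate({ord("0") + i: THAI_ZERO + i for i in range(10)})


stored = "2567"
shown = to_thai_digits(stored)
print(shown)
# Python's int() accepts any Unicode decimal digits, so the presented
# form can still be converted back to a number.
print(int(shown))
```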

Character Composition

According to the rules for writing Thai, only certain combinations of symbols are possible. When someone fluent in Thai writes or reads a line, a process of composition takes place. In about 74 percent of cases a character is formed from a single symbol, in about 22 percent from two symbols, and in about 4 percent from three symbols.

A Thai speaker does not think of a composed character in the way a French speaker thinks of an accented character. This difference in thinking is reflected in the difference between European and Thai keyboards. On European keyboards, dead keys are used to place accents on characters: the dead key for the accent is pressed first, and then the character key. The cursor moves only after the character has been entered, and all character manipulation is done at the cursor position.

In Thai, the consonant or middle vowel is entered first. It is displayed, and the cursor then moves one position to the right. The upper and lower (dead-key) vowels and tone marks are then added to the character to the left of the cursor. The rightmost column of the screen is used only to display the cursor; data is not allowed in this column. Vowels and tone marks are usually stacked on the consonants to compose syllables; the exception is the middle vowels, which stand alone at the same level as the consonants.
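The two keyboard models differ in when the cursor moves. In this sketch the European dead key is held as a pending accent and combined with the next letter (using Unicode NFC composition), while a Thai mark is attached to the character already to the left of the cursor. The dead-key mapping and the function names are hypothetical.

```python
import unicodedata

# Hypothetical dead-key mapping: key pressed -> combining accent.
DEAD_KEYS = {"\u00b4": "\u0301", "`": "\u0300", "^": "\u0302"}


def european_entry(keys) -> list:
    out, pending = [], None
    for k in keys:
        if k in DEAD_KEYS:
            pending = DEAD_KEYS[k]  # accent remembered; cursor does not move
        else:
            out.append(unicodedata.normalize("NFC", k + (pending or "")))
            pending = None          # cursor moves after the base letter
    return out


def thai_entry(keys) -> list:
    cells = []
    for k in keys:
        if cells and unicodedata.category(k) == "Mn":
            cells[-1] += k  # mark added to the character left of the cursor
        else:
            cells.append(k)  # consonant or middle vowel: cursor moves right
    return cells


print(european_entry(["\u00b4", "e", "t"]))
print(thai_entry("\u0e17\u0e35\u0e48\u0e32"))
```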

Thai Character Rendering

High-quality font rendering (for example, for desktop publishing) requires additional changes to the form of a Thai composed character, and sometimes to other characters in its vicinity.

Examples

Some of the base line symbols that have a descender in the lower position change shape in the presence of a lower vowel.

Some other base line symbols with a descender do not change their shape. Instead, when these symbols are combined with a lower vowel, the vertical or horizontal position of that lower vowel is changed. Similarly, when some base line symbols with an ascender are combined with an upper vowel, a tone mark or both, the location of the upper vowel, tone mark or both is shifted horizontally.
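The descender and ascender cases above can be expressed as rules over a composed character. The text does not name the individual consonants, so the sets below are an assumption based on the Thai letters commonly cited as having descenders or ascenders, and the sketch returns descriptive labels rather than real glyph operations.

```python
# Assumed symbol sets; the text does not name the individual consonants.
DESCENDER_CHANGES_SHAPE = {"\u0e0d", "\u0e10"}        # yo ying, tho than
DESCENDER_MOVES_VOWEL = {"\u0e0e", "\u0e0f"}          # do chada, to patak
ASCENDERS = {"\u0e1b", "\u0e1d", "\u0e1f", "\u0e2c"}  # po pla, fo fa, fo fan, lo chula
LOWER_VOWELS = {"\u0e38", "\u0e39"}
UPPER_MARKS = {"\u0e31", "\u0e34", "\u0e35", "\u0e36", "\u0e37",
               "\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


def rendering_adjustments(cluster: str) -> list:
    """List the adjustments a renderer would apply to one composed character."""
    base, marks = cluster[0], set(cluster[1:])
    adjustments = []
    if marks & LOWER_VOWELS:
        if base in DESCENDER_CHANGES_SHAPE:
            adjustments.append("use base glyph without descender")
        elif base in DESCENDER_MOVES_VOWEL:
            adjustments.append("move lower vowel below descender")
    if base in ASCENDERS and marks & UPPER_MARKS:
        adjustments.append("shift upper marks left of ascender")
    return adjustments


print(rendering_adjustments("\u0e1b\u0e35\u0e48"))
```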

The vertical position of a tone mark is dependent upon the presence or absence of an upper vowel. If an upper vowel is not present, the tone mark is positioned at the level that an upper vowel would occupy.

A specific base line vowel partially overlaps with the associated previous consonant. If the associated consonant does not have an ascender, the vowel is moved up and to the left, to hang over the right side of the previous base line consonant. If the associated consonant has an ascender, the vowel is split into two pieces, with one piece positioned to the left of the ascender and another to the right.
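The text does not name this vowel; assuming it is SARA AM (U+0E33), Unicode conveniently gives it a compatibility decomposition into its two visible pieces, NIKHAHIT (U+0E4D) and SARA AA (U+0E32). The sketch below picks a placement strategy from the preceding consonant; the ascender set and the function name are assumptions.

```python
import unicodedata

SARA_AM = "\u0e33"
ASCENDERS = {"\u0e1b", "\u0e1d", "\u0e1f", "\u0e2c"}  # assumed set of ascender consonants


def place_sara_am(previous_consonant: str):
    """Return a placement strategy and the two pieces of SARA AM."""
    # NFKD splits SARA AM into NIKHAHIT + SARA AA.
    nikhahit, sara_aa = unicodedata.normalize("NFKD", SARA_AM)
    if previous_consonant in ASCENDERS:
        # Split: the nikhahit goes left of the ascender, sara aa to its right.
        return ("split", nikhahit, sara_aa)
    # No ascender: the vowel hangs over the right side of the consonant.
    return ("overhang", nikhahit, sara_aa)


print(place_sara_am("\u0e1b"))  # po pla has an ascender
```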

There is thus a similarity between character composition in Thai and ligature composition and shaping in the bi-directional languages: the character presented is not identical to the symbols stored, so a shaping or composing algorithm must be applied.

Similarly, there are cases where the shaping transformation must be performed not at rendering time but at an earlier stage. When high-quality printers adapted for a double-byte character set (DBCS) are used, characters of up to three symbols are shaped as part of the conversion of the text to a double-byte encoding scheme. In this case, the text can be considered to be stored in shaped form for more efficient printing. This resembles the case in which Arabic message text is kept in storage in a shaped layout.

Conclusions and Guidelines

Though they look very different, all complex-text languages (the bi-directional ones such as Arabic, Farsi, Urdu, Hebrew and Yiddish, and languages such as Thai, Lao and Korean) share a distinct characteristic: the form of the rendered text differs from that of the stored text. The transformation functions needed to convert between rendered and stored text depend on descriptive information about the attributes of complex-text languages: global orientation, text type, symmetrical swapping, shaping and national numbers.

Application developers should be aware of the fact that in the complex-text languages there is a need for transformations between the different text layouts. They should allow for user or system exits to facilitate invoking these transformations, in those places where a transformation might be expected (at input, before output, before a collating process, and so on). Programs must be able to identify the location and content of the complex-text attributes, and be able to change their content if needed.
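One way to make these attributes identifiable and changeable is to carry them in a small descriptor alongside the text, so that an exit can compare the layout it has with the layout it needs. `TextLayout`, its field values and `transformations_needed` are hypothetical names for illustration, not an existing API.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TextLayout:
    """Hypothetical descriptor of the complex-text attributes of a piece of text."""
    orientation: str = "RTL"    # global orientation: "LTR" or "RTL"
    text_type: str = "logical"  # "logical" (keying order) or "visual" (display order)
    swapping: bool = True       # symmetric swapping of paired characters such as ( )
    shaping: str = "nominal"    # "nominal" (stored) or "shaped" (presentation forms)
    numerals: str = "nominal"   # "nominal" (0-9) or "national" digit glyphs


def transformations_needed(source: TextLayout, target: TextLayout) -> list:
    """Names of the attributes a user or system exit would have to transform."""
    return [f.name for f in fields(TextLayout)
            if getattr(source, f.name) != getattr(target, f.name)]


stored = TextLayout()
for_display = replace(stored, text_type="visual", shaping="shaped", numerals="national")
print(transformations_needed(stored, for_display))
```

Keeping the attributes explicit lets the same stored text be routed through only the transformations a given output (screen, printer, collation) actually requires.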

Just as for any other language, an application meant to be used for complex-text languages should utilize the appropriate language code page and cultural data (date and time layout, collating sequence, monetary layout, and so on).

Application developers should design their products in such a way that they use, as much as possible, the standard functions and controls provided by the operating system services or toolkits for these languages. They might choose to use the APIs offered in the national language versions of the operating system services or toolkits to perform such transformations (when available).

It is good practice to concentrate all national-language functions in one area of the program, for easier maintenance and change support.

Footnotes

1. Arabic is spoken mainly in Algeria, Bahrain, Egypt, Iraq, Israel, Jordan, Kuwait, Lebanon, Libya, Morocco, Oman, Qatar, Saudi Arabia, Sudan, Syria, Tunisia, United Arab Emirates, and Yemen. Urdu is spoken mainly in Pakistan. Farsi is spoken mainly in Iran. Hebrew is spoken mainly in Israel, and Yiddish is spoken mainly in Israel, Europe, and North America.

2. Sometimes it is possible to have a contextual global orientation, where the global orientation is set according to the directional characteristic of the first character in the data stream that has a distinct directionality.