How do you type in Japanese?

Tuesday, April 20th, 2010

The Japanese language is infamous for mixing thousands of Chinese characters, or kanji, with two separate phonetic scripts, hiragana and katakana (collectively called kana), along with plenty of Latin characters, or romaji. So, how do you type in Japanese?

Languages such as Japanese and Chinese use so many characters that there is no practical way to give each one its own key: Japanese education standards define around 2,000 characters for high school students, and JIS standards define more than 6,000. Instead, characters have to be entered through some kind of conversion process. Modern computers and electronic devices therefore let you enter words phonetically and then convert them into kanji through an interactive conversion process. I’m going to describe this process using the Japanese IME (Input Method Editor) included in Microsoft Windows XP.

The first stage is romaji-kana conversion: turning Roman-alphabet key presses into Japanese phonetic characters. This is relatively straightforward. The IME looks up romaji sequences in a simple table and replaces each sequence with the corresponding kana as soon as it is matched, so typing k followed by a produces か. The resulting kana accumulate in the IME’s composition string.
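To make the table lookup concrete, here is a minimal Python sketch of the matching loop. The table `ROMAJI_TO_KANA` and the function names are hypothetical, the table covers only a handful of syllables, and it assumes ん is typed as "nn"; the IME’s real table is much larger and handles many more cases. The one extra rule included is that a doubled consonant, as in "kitte", produces a small っ.

```python
# Hypothetical, heavily trimmed romaji-to-hiragana table (Hepburn-style
# spellings). No entry is a prefix of another, so a match can be replaced
# immediately without waiting for more keys.
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ya": "や", "yu": "ゆ", "yo": "よ",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "wa": "わ", "wo": "を", "nn": "ん",
    "ga": "が", "gi": "ぎ", "gu": "ぐ", "ge": "げ", "go": "ご",
    "ji": "じ", "da": "だ", "de": "で", "do": "ど",
    "ba": "ば", "bi": "び", "bu": "ぶ", "be": "べ", "bo": "ぼ",
}


def is_prefix(buffer: str) -> bool:
    """True if some table entry starts with the pending keys."""
    return any(key.startswith(buffer) for key in ROMAJI_TO_KANA)


def romaji_to_kana(keys: str) -> str:
    """Convert typed romaji to hiragana, one key press at a time."""
    output = []
    buffer = ""  # keys typed but not yet converted
    for key in keys:
        buffer += key
        # A doubled consonant such as the "tt" in "kitte" becomes a small
        # っ; the second consonant starts the next syllable.
        if len(buffer) == 2 and buffer[0] == buffer[1] and buffer[0] not in "aiueon":
            output.append("っ")
            buffer = buffer[1]
        # Pass through leading keys that cannot begin any table entry.
        while buffer and not is_prefix(buffer):
            output.append(buffer[0])
            buffer = buffer[1:]
        # Replace the pending sequence as soon as it matches an entry.
        if buffer in ROMAJI_TO_KANA:
            output.append(ROMAJI_TO_KANA[buffer])
            buffer = ""
    # An incomplete sequence at the end is left as typed.
    return "".join(output) + buffer
```

For example, `romaji_to_kana("sushi")` gives すし and `romaji_to_kana("nihonngo")` gives にほんご, while a lone "k" with nothing after it stays as typed.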

Once the target word is spelled out phonetically in the composition string, the next step is kana-kanji conversion, which you start simply by pressing the space bar. What goes on inside the IME at this point is quite sophisticated and largely beyond the scope of this article. In outline, though, the IME analyzes the grammar of the text, tries to identify the separate words in it (a process known as segmentation, which is necessary because Japanese is written without spaces), and then performs a context-sensitive lookup of each word in its built-in dictionaries. Finally, it picks the best match for each segment and displays the result in the composition string.
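The real analysis is far more elaborate, but the basic idea of splitting unspaced kana into dictionary words can be sketched with a small dynamic program. This is a swapped-in, simplified technique, not what the Windows IME actually does: it assumes a hypothetical `DICTIONARY` mapping kana readings to kanji candidates with made-up costs (lower meaning more likely), picks the segmentation with the lowest total cost, and ignores grammar and context entirely.

```python
import math

# Hypothetical dictionary: reading -> [(surface form, cost)], lower cost = more likely.
DICTIONARY = {
    "わたし": [("私", 1.0), ("渡し", 3.0)],
    "わた": [("綿", 4.0)],
    "し": [("氏", 4.0), ("市", 4.0)],
    "の": [("の", 0.5)],
    "なまえ": [("名前", 1.0)],
    "なま": [("生", 3.0)],
    "え": [("絵", 3.0)],
}
UNKNOWN_COST = 10.0  # penalty for leaving a single kana unconverted


def segment(kana: str) -> list[tuple[str, list[str]]]:
    """Split kana into the cheapest sequence of dictionary readings and
    return each reading with its candidates, best first."""
    n = len(kana)
    best = [math.inf] * (n + 1)  # best[i]: lowest cost to convert kana[:i]
    best[0] = 0.0
    back = [0] * (n + 1)  # back[i]: start index of the last segment ending at i
    for end in range(1, n + 1):
        for start in range(end):
            reading = kana[start:end]
            if reading in DICTIONARY:
                cost = min(c for _, c in DICTIONARY[reading])
            elif end - start == 1:
                cost = UNKNOWN_COST  # fall back to a bare kana
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Follow the back pointers from the end to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        reading = kana[start:end]
        entries = DICTIONARY.get(reading, [(reading, UNKNOWN_COST)])
        ranked = sorted(entries, key=lambda entry: entry[1])
        segments.append((reading, [surface for surface, _ in ranked]))
        end = start
    segments.reverse()
    return segments
```

Here `segment("わたしのなまえ")` splits the input into わたし / の / なまえ, and taking the first candidate for each segment gives 私の名前.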

In this case, we’ve used such a common phrase that the IME has no trouble finding the segmentation and the best candidate for each word. The IME still hasn’t committed our input, though; it is waiting for us to approve the suggested characters. We can press Enter to accept them, press Escape to return to the kana composition string, browse the other candidates the IME has dug out of its dictionary and pick alternatives if necessary, or even move the break between the two words.
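The choices available at this point can be modelled as a small state object. The sketch below is hypothetical (it is not the Windows IME API) and simplifies things to exactly two segments with a tiny made-up dictionary; it shows cycling through candidates, moving the break between the segments, and accepting or cancelling.

```python
# Hypothetical dictionary: reading -> candidates, best first.
DICTIONARY = {
    "くるま": ["車"],
    "くる": ["来る", "繰る"],
    "で": ["で"],
    "まで": ["まで", "迄"],
}


class Conversion:
    """Hypothetical two-segment conversion state (not a real IME API)."""

    def __init__(self, kana: str, boundary: int):
        self.kana = kana
        self.boundary = boundary  # index of the break between the two segments
        self.choices = [0, 0]  # selected candidate index for each segment

    def readings(self) -> list[str]:
        return [self.kana[:self.boundary], self.kana[self.boundary:]]

    def candidates(self, index: int) -> list[str]:
        reading = self.readings()[index]
        # A reading with no dictionary entry can always stay as plain kana.
        return DICTIONARY.get(reading, [reading])

    def next_candidate(self, index: int) -> None:
        """Cycle to the next candidate for one segment."""
        self.choices[index] = (self.choices[index] + 1) % len(self.candidates(index))

    def move_boundary(self, delta: int) -> None:
        """Shift the break; both segments keep at least one kana."""
        new = self.boundary + delta
        if 0 < new < len(self.kana):
            self.boundary = new
            self.choices = [0, 0]  # readings changed, so start from the best candidates

    def accept(self) -> str:
        """Enter: a real IME would commit this text; here we just return it."""
        return "".join(self.candidates(i)[self.choices[i]] for i in range(2))

    def cancel(self) -> str:
        """Escape: back to the unconverted kana."""
        return self.kana
```

Starting from くるまで split as くるま / で, accepting gives 車で; moving the break one kana to the left re-reads it as くる / まで and gives 来るまで, cycling the second segment gives 来る迄, and cancelling returns the original kana.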
