If you’ve ever struggled to display text in more than one language, Jason’s latest article addresses exactly that. Learn how to use the Unicode character set when developing in a J2ME-based environment, where you will more than likely need to deliver MIDlets in multiple languages.
What Is Unicode, and Why Use It?
Unicode assigns a unique number to every character, covering scripts such as Latin, Hebrew, and Japanese along with many symbols. The character set is a universal standard aligned with ISO/IEC 10646, and it works much like ASCII: a number represents each character. ASCII, however, is limited to 7 bits, or in other words 128 characters. That is enough for English and a handful of symbols, but what about other languages and other symbols?
As the computer industry grew, especially in countries where English is not the dominant language, new encoding schemes were invented. This can cause problems when older systems using one encoding scheme interface with newer systems, or when systems from one country interface with systems from another. Data can become corrupted, or the systems can simply fail, because they are unable to communicate with each other. That is why Unicode was invented. The following statement is taken from http://www.unicode.org, and best summarizes what Unicode is and why we can benefit from it:
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
As stated in the summary, Unicode is supported by a wide variety of languages and platforms; this includes Java and, more specifically, J2ME. Fortunately, Unicode support was built into Java from the start. To use this handy feature in source code, simply write \u escape sequences: a backslash, the letter u, and the four hexadecimal digits of the character's code.
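To make the escape notation concrete, here is a minimal sketch in plain Java. The class and method names (UnicodeEscapeDemo, toEscape, escapeAll) are made up for this illustration and are not part of any J2ME API. It shows that the compiler turns each escape into a single char, and how a char can be formatted back into its escaped form.

```java
class UnicodeEscapeDemo {

    // Formats one char as a six-character escape: backslash, 'u', four hex digits.
    static String toEscape(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        StringBuffer sb = new StringBuffer("\\u");
        for (int i = hex.length(); i < 4; i++) {
            sb.append('0'); // left-pad to four digits
        }
        return sb.append(hex).toString();
    }

    // Escapes every char of a string, ASCII characters included.
    static String escapeAll(String text) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            sb.append(toEscape(text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The compiler converts each escape into one char before the program runs.
        String hello = "\u3053\u3093\u306B\u3061\u306F\u4E16\u754C";
        System.out.println(hello.length());   // 7 chars, not 42
        System.out.println((int) 'A');        // 65, the same value as in ASCII
        System.out.println(escapeAll(hello)); // the escaped form of the string
    }
}
```

Because the first 128 Unicode values match ASCII, existing English strings need no changes; only characters outside that range need escapes.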
One more step is required to display Unicode characters properly: the display itself, in this case the emulator screen, must support Unicode. You will also need to obtain the appropriate fonts. For example, if you want to display Japanese characters you will need the MS Mincho font. After installing the font on your system you will need to edit the emulator's properties file, which we will go over in the next section. This can become very tedious if you decide to support dozens of languages, because you would have to install fonts and edit property files for each of them. An easier solution is to obtain a font set that supports multiple languages, one of which is the Arial Unicode MS font. Unfortunately, Microsoft no longer supplies this font free of charge. You will either have to purchase it or find an alternative Unicode font set.
Now, with the new fonts installed, go to the home directory of the Sun Wireless Toolkit and open the wtklib\devices directory. Copy the DefaultColorPhone directory and paste it into the same directory under the name UnicodePhone. Also rename the properties file inside the UnicodePhone directory to UnicodePhone.properties. Next, using any editor, open the UnicodePhone.properties file and find the font section. It should look similar to the following:
font.default=SansSerif-plain-10
font.softButton=SansSerif-plain-11
font.system.plain.small: SansSerif-plain-9
font.system.plain.medium: SansSerif-plain-11
font.system.plain.large: SansSerif-plain-14
font.system.bold.small: SansSerif-bold-9
font.system.bold.medium: SansSerif-bold-11
font.system.bold.large: SansSerif-bold-14
font.system.italic.small: SansSerif-italic-9
font.system.italic.medium: SansSerif-italic-11
font.system.italic.large: SansSerif-italic-14
font.system.bold.italic.small: SansSerif-bolditalic-9
font.system.bold.italic.medium: SansSerif-bolditalic-11
font.system.bold.italic.large: SansSerif-bolditalic-14
font.monospace.plain.small: Monospaced-plain-9
font.monospace.plain.medium: Monospaced-plain-11
font.monospace.plain.large: Monospaced-plain-14
font.monospace.bold.small: Monospaced-bold-9
font.monospace.bold.medium: Monospaced-bold-11
font.monospace.bold.large: Monospaced-bold-14
font.monospace.italic.small: Monospaced-italic-9
font.monospace.italic.medium: Monospaced-italic-11
font.monospace.italic.large: Monospaced-italic-14
.
.
Continued
Replace SansSerif and Monospaced with the following: Arial Unicode MS (assuming you are using the Arial Unicode MS font set). The file should now look similar to the following:
font.default=Arial Unicode MS-plain-10
font.softButton=Arial Unicode MS-plain-11
font.system.plain.small: Arial Unicode MS-plain-9
font.system.plain.medium: Arial Unicode MS-plain-11
font.system.plain.large: Arial Unicode MS-plain-14
font.system.bold.small: Arial Unicode MS-bold-9
font.system.bold.medium: Arial Unicode MS-bold-11
font.system.bold.large: Arial Unicode MS-bold-14
font.system.italic.small: Arial Unicode MS-italic-9
font.system.italic.medium: Arial Unicode MS-italic-11
font.system.italic.large: Arial Unicode MS-italic-14
font.system.bold.italic.small: Arial Unicode MS-bolditalic-9
font.system.bold.italic.medium: Arial Unicode MS-bolditalic-11
font.system.bold.italic.large: Arial Unicode MS-bolditalic-14
font.monospace.plain.small: Arial Unicode MS-plain-9
font.monospace.plain.medium: Arial Unicode MS-plain-11
font.monospace.plain.large: Arial Unicode MS-plain-14
font.monospace.bold.small: Arial Unicode MS-bold-9
font.monospace.bold.medium: Arial Unicode MS-bold-11
font.monospace.bold.large: Arial Unicode MS-bold-14
font.monospace.italic.small: Arial Unicode MS-italic-9
font.monospace.italic.medium: Arial Unicode MS-italic-11
font.monospace.italic.large: Arial Unicode MS-italic-14
font.monospace.bold.italic.small: Arial Unicode MS-bolditalic-9
font.monospace.bold.italic.medium: Arial Unicode MS-bolditalic-11
font.monospace.bold.italic.large: Arial Unicode MS-bolditalic-14
.
.
Continued
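If editing every entry by hand feels error-prone, the replacement can also be scripted. The following is only a sketch in desktop Java, assuming every font value has the family-style-size shape shown in the listings above; the class and method names (FontFamilyRewriter, replaceFamily) are invented for this example and are not part of the toolkit.

```java
class FontFamilyRewriter {

    // Rewrites one properties line so its font family becomes the given name.
    // Lines that are not font entries, or do not match family-style-size,
    // are returned unchanged.
    static String replaceFamily(String line, String family) {
        if (!line.startsWith("font.")) {
            return line;
        }
        // The key/value separator is whichever of '=' or ':' comes first.
        int sep = line.indexOf('=');
        int colon = line.indexOf(':');
        if (sep < 0 || (colon >= 0 && colon < sep)) {
            sep = colon;
        }
        if (sep < 0) {
            return line;
        }
        String value = line.substring(sep + 1).trim();
        // The last two hyphens delimit the style and the size.
        int sizeDash = value.lastIndexOf('-');
        int styleDash = sizeDash > 0 ? value.lastIndexOf('-', sizeDash - 1) : -1;
        if (styleDash < 0) {
            return line;
        }
        String gap = line.charAt(sep) == ':' ? " " : "";
        return line.substring(0, sep + 1) + gap + family + value.substring(styleDash);
    }

    public static void main(String[] args) {
        System.out.println(replaceFamily(
            "font.system.bold.small: SansSerif-bold-9", "Arial Unicode MS"));
    }
}
```

Splitting on the last two hyphens rather than the first keeps the helper working for family names that contain spaces, and running it twice on the same line leaves the result unchanged.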
For more detailed installation instructions refer to Qusay H. Mahmoud’s excellent tutorial, which is indicated in the reference section of this article.
Now that everything is “Unicode ready”, let’s test everything out using the famous Hello World example:
Source Code:
import javax.microedition.lcdui.*;
import javax.microedition.midlet.*;

public class SimpleUnicodeTest extends MIDlet {
  Display display;
  Form form = null;
  StringItem msg = null;

  public SimpleUnicodeTest() {
  }

  public void startApp() {
    display = Display.getDisplay(this);
    // Each Unicode escape in the second argument is one Japanese character
    msg = new StringItem("'Hello World' in Japanese",
        "\u3053\u3093\u306B\u3061\u306F\u4E16\u754C");
    form = new Form("Unicode Test");
    form.append(msg);
    display.setCurrent(form);
  }

  public void pauseApp() {}

  public void destroyApp(boolean unconditional) {}
}
The output of our Hello World MIDlet should show the label 'Hello World' in Japanese followed by the Japanese text こんにちは世界.
See the “References and Credits” section for the online tools I used to convert Hello World to Japanese, and from Japanese to \u3053\u3093\u306B\u3061\u306F\u4E16\u754C.
The next step in separating code from language content is to load the language definitions from an external source. One option is to read them from a text file, or more appropriately a Unicode file.
To create a test file you can use the free application Simredo, from http://www4.vc-net.ne.jp/~klivo/sim/simeng.htm. Here is a method you can use to read a Unicode text file:
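// Requires: import java.io.InputStream; import java.io.InputStreamReader;
public String readUnicodeFile(String filename) {
    // Created up front so a failed read returns "" instead of throwing a NullPointerException
    StringBuffer buffer = new StringBuffer();
    InputStreamReader isr = null;
    try {
      // The file is loaded as a resource relative to this class
      InputStream is = getClass().getResourceAsStream(filename);
      if (is == null)
        throw new Exception("File Does Not Exist");
      isr = new InputStreamReader(is,"UTF8");
      int ch;
      // read() returns one decoded character at a time, or -1 at end of file
      while ((ch = isr.read()) > -1) {
        buffer.append((char)ch);
      }
    } catch (Exception ex) {
      System.out.println(ex);
    } finally {
      // Close the reader even when reading fails
      if (isr != null) {
        try { isr.close(); } catch (Exception ignored) {}
      }
    }
    return buffer.toString();
}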
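To see what the reader does with the bytes, here is a small desktop Java sketch of the same read loop. It is an illustration only: the names Utf8ReadDemo, readUtf8, and decode are hypothetical, and a ByteArrayInputStream stands in for the resource stream a MIDlet would get from getResourceAsStream. The six bytes are the UTF-8 encoding of the first two characters of the Japanese greeting, U+3053 and U+3093.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class Utf8ReadDemo {

    // Reads the whole stream as UTF-8 text, one decoded char at a time.
    static String readUtf8(InputStream in) throws IOException {
        InputStreamReader reader = new InputStreamReader(in, "UTF-8");
        StringBuffer buffer = new StringBuffer();
        try {
            int ch;
            while ((ch = reader.read()) != -1) {
                buffer.append((char) ch);
            }
        } finally {
            reader.close();
        }
        return buffer.toString();
    }

    // Convenience wrapper for in-memory data.
    static String decode(byte[] utf8Bytes) {
        try {
            return readUtf8(new ByteArrayInputStream(utf8Bytes));
        } catch (IOException e) {
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        // E3 81 93 encodes U+3053 and E3 82 93 encodes U+3093.
        byte[] data = {
            (byte) 0xE3, (byte) 0x81, (byte) 0x93,
            (byte) 0xE3, (byte) 0x82, (byte) 0x93
        };
        String text = decode(data);
        System.out.println(data.length + " bytes -> " + text.length() + " chars");
    }
}
```

The point of the demo is that the file size in bytes and the string length in characters differ, which is why the file must be read through a reader with the right encoding rather than byte by byte.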
The last section covered reading a Unicode file. An alternative approach is to store the Unicode escape codes themselves as plain text in the file. Referring back to the Hello World example, the text file would contain the following:
\u3053\u3093\u306B\u3061\u306F\u4E16\u754C
One would think this is very straightforward and that a simple load and read of the text would do. However, escape sequences are only translated by the Java compiler inside source code. When the file is read at run time, each escape arrives as six separate characters. So in other words:
\u3053
reads in as
'\', 'u', '3', '0', '5', '3'
You will have to detect and strip the \u prefix, which indicates that the next four characters are the hexadecimal code of one Unicode character. The following method converts that four-character string to the corresponding Unicode character. It assumes you are only passing in a valid four-character hexadecimal code (e.g., 3053).
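private String convertStrToUnicode(String value) {
   // Integer.parseInt handles the full range 0000-FFFF;
   // Short.parseShort would throw for any code above 7FFF
   int codeValue = Integer.parseInt(value.trim(),16);
   return String.valueOf((char)codeValue);
}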
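The detection step described above can be sketched as follows. This is one possible approach rather than the only one, and the names UnicodeUnescape, isHex4, and unescape are hypothetical. It walks the string once, converts every backslash-u sequence that is followed by four hexadecimal digits, and copies everything else through untouched.

```java
class UnicodeUnescape {

    // True when four hexadecimal digits start at the given index.
    static boolean isHex4(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Replaces each backslash-u-plus-four-hex-digits sequence with the char it names.
    static String unescape(String text) {
        StringBuffer out = new StringBuffer(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            boolean isEscape = c == '\\'
                && i + 1 < text.length()
                && text.charAt(i + 1) == 'u'
                && isHex4(text, i + 2);
            if (isEscape) {
                int code = Integer.parseInt(text.substring(i + 2, i + 6), 16);
                out.append((char) code);
                i += 6; // skip the backslash, the 'u', and the four digits
            } else {
                out.append(c); // ordinary character or malformed escape
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the compiler from translating the escape,
        // so this string holds the same six characters a text file would.
        String fromFile = "\\u3053\\u3093";
        System.out.println(fromFile.length() + " -> " + unescape(fromFile).length());
    }
}
```

Leaving malformed sequences in place instead of throwing is a design choice for this sketch; a stricter loader might prefer to report them.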
Hopefully this article has given you more insight into using Unicode in your J2ME game or application. It is not uncommon for carriers and mobile distributors in other parts of the world to request native-language support for their country.
References and Credits
A big “Thank You” to Eric Giguere, Shiuh-Lin Lee, and BigS for providing me with feedback and help.
Developing Multilingual Wireless Java Applications, by Qusay H. Mahmoud – http://wireless.java.sun.com/midp/ttips/customize/
Unicode Information – http://www.unicode.org
Hello World Conversion to Japanese – http://www.worldlingo.com
Conversion from Japanese Text to Unicode Codes – http://code.cside.com/3rdpage/us/javaUnicode/converter.html
Unicode Text Editor (Simredo) – http://www4.vc-net.ne.jp/~klivo/sim/simeng.htm
Supported Encodings for Java – http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
*First published at DevArticles.com
Jason is a wireless and open source developer enthusiast who enjoys creating synergy and sharing knowledge in the software development world. To learn more about him visit his personal site at http://www.jasonlam604.com/