
Java Unicode

What Is Unicode?

Unicode is a universal character standard that assigns a unique number, called a code point, to every character in the world's writing systems, along with many symbols. It provides a single alternative to older region-specific encodings such as ASCII, ISO 8859-1, KOI-8, GB18030, and Big5, each of which was designed around a particular language or region.

Why Java Uses Unicode

Java was designed to be platform-independent and multilingual. To support international applications, Java adopted Unicode from the beginning. This allows developers to write programs that handle characters from any language — Hindi, Chinese, Arabic, Russian, and more.

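  • A Java char is 16 bits (2 bytes) wide and holds one UTF-16 code unit

  • The char type covers \u0000 to \uFFFF; Unicode itself extends to U+10FFFF, and characters above \uFFFF (such as many emoji) are stored as a pair of char values called a surrogate pair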
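The difference between a char and a full Unicode character can be checked with a short program. This is a minimal sketch; the class name CharRangeDemo and the choice of U+1F600 (a grinning-face emoji) as the sample character are illustrative. It prints the size and limits of char, then shows that one character outside the char range occupies two char values in a String.

```java
public class CharRangeDemo {
    public static void main(String[] args) {
        // A char is a 16-bit unsigned value: one UTF-16 code unit
        System.out.println("char size in bits: " + Character.SIZE);
        System.out.println("Min char: " + (int) Character.MIN_VALUE);
        System.out.println("Max char: " + (int) Character.MAX_VALUE);
        // Unicode itself extends far beyond the char range
        System.out.println("Max code point: " + Character.MAX_CODE_POINT);

        // U+1F600 (grinning face emoji) does not fit in one char,
        // so it is written as a surrogate pair of two escapes
        String emoji = "\uD83D\uDE00";
        System.out.println("length(): " + emoji.length());
        System.out.println("codePointCount: " + emoji.codePointCount(0, emoji.length()));
        System.out.printf("Code point: U+%X%n", emoji.codePointAt(0));
    }
}
```

String.length() counts char values, so it reports 2 for the emoji, while codePointCount reports the single character a reader actually sees.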

Character Encoding Before Unicode

Encoding Standard    Primary Language / Region
ASCII                English (United States)
ISO 8859-1           Western European languages
KOI-8                Russian
GB18030              Chinese (mainland China)
Big5                 Traditional Chinese (Taiwan, Hong Kong)

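These encodings were incompatible with one another: the same byte value could stand for different characters depending on the encoding, so text written under one was often garbled when read under another. Unicode solves this by giving every character a single code point, independent of platform or language.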
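The kind of garbling described above can still be reproduced in Java by decoding bytes with the wrong charset. The sketch below uses UTF-8 and ISO-8859-1 only because StandardCharsets guarantees both are available; the class name and the helper misread are hypothetical, chosen for illustration. The word "café" is encoded as UTF-8 and then decoded as ISO-8859-1, which turns the two bytes of é into two unrelated characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Hypothetical helper: encode as UTF-8, then decode with the wrong charset
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "cafe" ending in an accented e (U+00E9), written with an escape
        String original = "caf\u00E9";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 byte count: " + utf8Bytes.length);
        System.out.println("Decoded as ISO-8859-1: " + misread(original));
        System.out.println("Same as original? " + misread(original).equals(original));
    }
}
```

Plain ASCII text survives this round trip unchanged, because both charsets map bytes 0x00 to 0x7F to the same characters; only the accented letter is corrupted.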

Working with Unicode in Java

Java supports two approaches to work with Unicode characters:

  1. Using Unicode Escape Sequences

  2. Storing Unicode Characters Directly

1. Unicode Escape Sequences

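Escape sequences are useful when a character cannot be typed directly. A Unicode escape is written as \u followed by exactly four hexadecimal digits giving the character's code point (for example, \u0041 is 'A'); a character above \uFFFF needs two escapes, one for each half of its surrogate pair. The compiler translates these escapes before it parses the rest of the source, so they can appear anywhere in the code. One consequence is that '\u000A' does not compile, because it becomes a real line break inside the literal; use '\n' instead.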

Example: Storing 'A' Using Escape Sequence

Java
public class UnicodeDemo {
    public static void main(String[] args) {
        char unicodeChar = '\u0041'; // Unicode for 'A'
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
Output:
Stored Unicode Character: A

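To find the escape for a character you can already type, you can format its numeric value as four hex digits. This is a small sketch; the helper name toEscape is hypothetical, and it only handles single char values, so it does not cover characters above \uFFFF.

```java
public class EscapeLookupDemo {
    // Hypothetical helper: a backslash, the letter u, then four uppercase hex digits
    static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(toEscape('A'));      // letter A
        System.out.println(toEscape('$'));      // dollar sign
        System.out.println(toEscape('\u03A3')); // Greek capital Sigma
    }
}
```

The program prints \u0041, \u0024, and \u03A3, which can be pasted back into a char literal.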
2. Direct Character Storage

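If the character can be typed and displayed in your editor, you can store it directly between single quotes. For non-ASCII characters, the source file's encoding must match what the compiler expects: UTF-8 is the default from JDK 18 onward, and older versions accept the javac option -encoding UTF-8.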

Example: Storing 'A' Directly

Java
public class UnicodeDemo {
    public static void main(String[] args) {
        char unicodeChar = 'A';
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
Output:
Stored Unicode Character: A

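Both approaches produce exactly the same char value; the escape is just another way of spelling the character in source code. A minimal sketch (the class name EquivalenceDemo is illustrative) compares the two forms and shows that escapes also work inside string literals.

```java
public class EquivalenceDemo {
    public static void main(String[] args) {
        char viaEscape = '\u0041'; // escape sequence
        char direct = 'A';         // typed directly
        System.out.println(viaEscape == direct);   // true: identical char values
        System.out.println((int) direct);          // 65, which is 41 in hex
        String greeting = "\u0048\u0069";          // escapes inside a string literal
        System.out.println(greeting.equals("Hi")); // true
    }
}
```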
Mixed Example: Escape + Direct Storage

Java
public class UnicodeDemo {
    public static void main(String[] args) {
        char letterA = '\u0041';       // Escape sequence
        char letterSigma = '\u03A3';   // Greek capital Sigma
        char copyright = '\u00A9';     // © symbol

        char letterZ = 'Z';            // Direct character
        char dollar = '$';             // Direct character

        System.out.println("Escape Sequence Characters:");
        System.out.println("A: " + letterA);
        System.out.println("Sigma: " + letterSigma);
        System.out.println("Copyright: " + copyright);

        System.out.println("\nDirect Characters:");
        System.out.println("Z: " + letterZ);
        System.out.println("Dollar: " + dollar);
    }
}
Manipulating Unicode Values

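Because char is an unsigned 16-bit integer type, you can perform arithmetic on char values. An expression such as 'B' + 1 evaluates to an int, so cast the result back to char when you want a character.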

Example: Calculating Characters

Java
public class UnicodeManipulation {
    public static void main(String[] args) {
        char upperA = '\u0041'; // 'A'
        char lowerA = '\u0061'; // 'a'
        char upperB = 'B';

        int diff = upperA - lowerA;           // 65 - 97 = -32
        char upperC = (char) (upperB + 1);    // the letter after 'B' is 'C' (67)
        char lowerC = (char) (upperC - diff); // 67 + 32 = 99, which is 'c'

        System.out.println("Difference between A and a: " + diff);
        System.out.println("Calculated uppercase C: " + upperC);
        System.out.println("Calculated lowercase c: " + lowerC);
    }
}
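Arithmetic on char values is carried out in int, which is why the example above casts with (char). A short sketch (class name illustrative) of where a cast is and is not needed:

```java
public class CharPromotionDemo {
    public static void main(String[] args) {
        char letter = 'A';
        System.out.println(letter + 1);          // char + int is an int: prints 66
        System.out.println((char) (letter + 1)); // cast back to char: prints B
        letter++;                                // ++ keeps the char type: letter is now 'B'
        letter += 2;                             // compound assignment casts implicitly: 'D'
        System.out.println(letter);              // prints D
    }
}
```

Writing letter = letter + 2 without a cast would not compile, since the right-hand side is an int; the compound form letter += 2 includes the narrowing cast automatically.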
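Note
Adding or subtracting 32 to switch case is reliable only for the basic Latin letters A to Z and a to z. Outside that range the offset is not guaranteed, so use Character.toUpperCase and Character.toLowerCase instead of hand-written arithmetic.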
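To make the note concrete, the sketch below compares the "subtract 32" shortcut with Character.toUpperCase. The helper naiveUpper is hypothetical, written only to show the shortcut; the samples are 'a', the letter y with diaeresis (U+00FF), and the digit 5.

```java
public class CaseConversionDemo {
    // Hypothetical helper: the "subtract 32" trick, valid only for a to z
    static char naiveUpper(char c) {
        return (char) (c - 32);
    }

    public static void main(String[] args) {
        // 'a', y with diaeresis (U+00FF), and the digit 5
        char[] samples = {'a', '\u00FF', '5'};
        for (char c : samples) {
            System.out.printf("U+%04X  naive: U+%04X  Character.toUpperCase: U+%04X%n",
                    (int) c, (int) naiveUpper(c), (int) Character.toUpperCase(c));
        }
    }
}
```

For 'a' both approaches agree on U+0041. For U+00FF the shortcut lands on U+00DF (the German sharp s) while Character.toUpperCase returns U+0178, the correct capital, and for the digit 5 the shortcut produces the control character U+0015 instead of leaving the digit alone.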
Best Practices
Tip
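Use escape sequences for characters that are hard to type or that might not survive a mismatched source-file encoding. Store readable characters directly. Prefer Character methods such as Character.toUpperCase and Character.isLetter over manual arithmetic on char values. When text may contain characters above \uFFFF, such as many emoji, process it by code point (for example, with String.codePoints()) rather than one char at a time. Used this way, Unicode lets a single Java program handle text in any language.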