Java Unicode
What Is Unicode?
Unicode is a universal character encoding standard that assigns a unique code point to each character and symbol it covers, across the scripts of the world's languages. It unifies older encoding systems like ASCII, ISO 8859-1, KOI-8, and GB18030, which were each designed for specific regions or languages.
Why Java Uses Unicode
Java was designed to be platform-independent and multilingual. To support international applications, Java adopted Unicode from the beginning. This allows developers to write programs that handle characters from any language, including Hindi, Chinese, Arabic, and Russian.
Each Java char is a 16-bit (2-byte) UTF-16 code unit
Range of a char in Java: \u0000 to \uFFFF. Unicode itself extends to U+10FFFF, and characters above U+FFFF are stored as a pair of char values (a surrogate pair)
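To make the 16-bit limit of char concrete, here is a minimal sketch that compares the number of char values in a string with the number of Unicode code points. The class name CodeUnitDemo and its helper methods are illustrative only; String.length, String.codePointCount, and String.codePointAt are standard library methods.

```java
public class CodeUnitDemo {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (whole characters) in the string.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String letterA = "\u0041";        // U+0041 fits in a single char
        String emoji = "\uD83D\uDE00";    // U+1F600 needs a surrogate pair

        System.out.println(codeUnits(letterA) + " code unit(s), "
                + codePoints(letterA) + " code point(s)");
        System.out.println(codeUnits(emoji) + " code unit(s), "
                + codePoints(emoji) + " code point(s)");
        System.out.println("Code point: U+"
                + Integer.toHexString(emoji.codePointAt(0)).toUpperCase());
    }
}
```

The second string reports 2 code units but only 1 code point, which is why code that must handle every Unicode character usually works with int code points rather than individual char values.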
Character Encoding Before Unicode
Encoding Standard | Region |
ASCII | United States |
ISO 8859-1 | Western Europe |
KOI-8 | Russia |
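GB18030 / BIG-5 | China (GB18030); Taiwan and Hong Kong (BIG-5) |
These encodings were mutually incompatible: the same byte value could stand for different characters depending on the encoding, so text was easily garbled when it moved between systems. Unicode solved this by defining a single character set that covers all of them.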
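The incompatibility between encodings can be reproduced in a few lines. The sketch below is only an illustration: it encodes one character with UTF-8 (a Unicode encoding not discussed above) and decodes the bytes with ISO 8859-1, and the class name EncodingMismatchDemo and the helper reinterpret are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Encode text with one charset, then decode the resulting bytes with another.
    public static String reinterpret(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u00E9"; // Latin small letter e with acute, U+00E9

        // ISO 8859-1 stores this character as one byte (0xE9); UTF-8 uses two (0xC3 0xA9).
        System.out.println(original.getBytes(StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);      // 2

        // Decoding the UTF-8 bytes as ISO 8859-1 produces two unrelated characters.
        String garbled = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2

        // Using the same charset on both sides round-trips cleanly.
        String roundTrip = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(original)); // true
    }
}
```

The garbled result is the familiar "Ã©" that appears when the sender and the receiver disagree about the encoding.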
Working with Unicode in Java
Java supports two ways of writing Unicode characters in source code:
Using Unicode Escape Sequences
Storing Unicode Characters Directly
1. Unicode Escape Sequences
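Escape sequences are useful when a character cannot be typed directly. Each one starts with \u followed by exactly four hexadecimal digits, and the compiler translates it into the corresponding character before it parses the rest of the source.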
Example: Storing 'A' Using Escape Sequence
public class UnicodeDemo {
    public static void main(String[] args) {
        char unicodeChar = '\u0041'; // Unicode for 'A'
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
Output:
Stored Unicode Character: A
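Going the other way, from a character to its escape form, can be handy when you need to look up or log a code. The following is a rough sketch under that assumption; EscapeFormatDemo, toEscape, and fromHex are hypothetical names, while String.format and Integer.parseInt are standard library methods.

```java
public class EscapeFormatDemo {
    // Format a char as a backslash-u escape with four uppercase hex digits.
    public static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // Parse four hex digits (for example "0041") back into a char.
    public static char fromHex(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        System.out.println(toEscape('A'));            // escape form of 'A'
        System.out.println(toEscape('$'));            // escape form of '$'
        System.out.println(fromHex("0041"));          // A
        System.out.println(fromHex("0041") == 'A');   // true
    }
}
```

Note that toEscape builds the text of an escape at run time; it is not the same mechanism as the compile-time translation of escapes in source code.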
2. Direct Character Storage
If the character can be typed in your editor, you can write it directly between single quotes. For non-ASCII characters, the source file must be saved in the encoding the compiler expects.
Example: Storing 'A' Directly
public class UnicodeDemo {
    public static void main(String[] args) {
        char unicodeChar = 'A';
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
Output:
Stored Unicode Character: A
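Both notations produce exactly the same value at run time. A small sketch to confirm this, using an illustrative class name SameValueDemo and a hypothetical helper codeOf:

```java
public class SameValueDemo {
    // Return the numeric code of a char, however its literal was written.
    public static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char direct = 'A';
        char escaped = '\u0041';

        System.out.println(direct == escaped);  // true
        System.out.println(codeOf(direct));     // 65
        System.out.println(codeOf(escaped));    // 65
    }
}
```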
Mixed Example: Escape + Direct Storage
public class UnicodeDemo {
    public static void main(String[] args) {
        char letterA = '\u0041'; // Escape sequence
        char letterSigma = '\u03A3'; // Greek capital Sigma
        char copyright = '\u00A9'; // © symbol
        char letterZ = 'Z'; // Direct character
        char dollar = '$'; // Direct character
        System.out.println("Escape Sequence Characters:");
        System.out.println("A: " + letterA);
        System.out.println("Sigma: " + letterSigma);
        System.out.println("Copyright: " + copyright);
        System.out.println("\nDirect Characters:");
        System.out.println("Z: " + letterZ);
        System.out.println("Dollar: " + dollar);
    }
}
Manipulating Unicode Values
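You can perform arithmetic on Unicode values because char is an unsigned 16-bit integral type. In an arithmetic expression a char is promoted to int, so the result must be cast back to char.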
Example: Calculating Characters
public class UnicodeManipulation {
    public static void main(String[] args) {
        char upperA = '\u0041'; // 'A'
        char lowerA = '\u0061'; // 'a'
        char upperB = 'B';
        int diff = upperA - lowerA; // 65 - 97 = -32
        char upperC = (char) (upperB + 1); // the code after 'B' is 'C'
        char lowerC = (char) (upperC - diff); // 67 + 32 = 99, which is 'c'
        System.out.println("Difference between A and a: " + diff);
        System.out.println("Calculated uppercase C: " + upperC);
        System.out.println("Calculated lowercase c: " + lowerC);
    }
}
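The fixed offset of 32 between upper and lower case holds for the ASCII letters, but not for every script. As a hedged comparison, the sketch below contrasts an arithmetic conversion with the standard Character.toUpperCase method; CaseConversionDemo and asciiToUpper are hypothetical names used only for this illustration.

```java
public class CaseConversionDemo {
    // Arithmetic conversion: only valid for the ASCII letters 'a' to 'z'.
    public static char asciiToUpper(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - ('a' - 'A')); // subtract 32
        }
        return c; // anything else is returned unchanged
    }

    public static void main(String[] args) {
        char lowerC = 'c';
        char lowerSigma = '\u03C3'; // Greek small letter sigma

        System.out.println(asciiToUpper(lowerC));                          // C
        System.out.println(asciiToUpper(lowerSigma) == lowerSigma);        // true
        System.out.println(Character.toUpperCase(lowerSigma) == '\u03A3'); // true
    }
}
```

For general text, the library methods Character.toUpperCase and Character.toLowerCase are usually the safer choice, since they consult Unicode case mappings instead of assuming a fixed offset.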