Glossary > Unicode

Unicode

Unicode is a character encoding standard developed by the Unicode Consortium. It can represent the characters of almost all written languages of the world.

However, with so many characters, a single byte per character is no longer sufficient, as it was for the ASCII/ANSI character sets. There are different methods of representing the characters by means of several bytes.

•	Every character is represented by two bytes, or by four bytes for less common characters. This method, known as UTF-16, is used in the Windows operating system.

•	Characters are encoded with a different number of bytes, depending on how commonly they are used. A very common standard that uses this method is UTF-8. In UTF-8, the first 128 characters, which are the characters of the ASCII code, use only one byte each. ASCII and UTF-8 are identical here. Most of the 128 characters that follow in the ANSI code, such as the accented Latin letters, are represented in UTF-8 by two bytes each. A few of them (for example the euro sign in the Windows ANSI code) and all further characters need three or four bytes.
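The difference between the two methods can be checked with a short Python sketch. It is only an illustration and not part of TextTransformer: the helper name byte_lengths is hypothetical, and the sketch assumes that the two-or-four-byte method described above corresponds to UTF-16.

```python
# Illustrative sketch: count the bytes one character needs in UTF-8 and UTF-16.
# byte_lengths is a hypothetical helper, not a TextTransformer function.

def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for a single character."""
    utf8 = len(ch.encode("utf-8"))
    # "utf-16-le" is used so that no byte order mark is counted.
    utf16 = len(ch.encode("utf-16-le"))
    return utf8, utf16

for ch in ["f", "ü", "€", "😀"]:
    print(repr(ch), byte_lengths(ch))
```

An ASCII letter such as "f" needs one byte in UTF-8 but two in UTF-16, while a character outside the basic range, such as an emoji, needs four bytes in both encodings.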

Example:

If a UTF-8 encoded file is opened in ANSI mode, the German word "für" appears as:

fÃ¼r

The German umlaut 'ü' needs two bytes in the UTF-8 encoding, and in ANSI mode each of these bytes is shown as a separate character. If the file is opened in UTF-8 mode, the word is shown correctly.
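The effect can be reproduced in a few lines of Python. This is a minimal sketch that assumes "ANSI mode" means the Windows code page 1252 (cp1252); with another ANSI code page the two wrong characters may differ.

```python
# Illustrative sketch: decode the UTF-8 bytes of "für" with the wrong
# and with the right encoding. cp1252 stands in for "ANSI mode" (assumption).

word = "für"
utf8_bytes = word.encode("utf-8")          # four bytes: 'ü' takes two of them

shown_in_ansi = utf8_bytes.decode("cp1252")  # each byte becomes one character
shown_in_utf8 = utf8_bytes.decode("utf-8")   # the two bytes are read as one 'ü'

print(len(utf8_bytes))
print(shown_in_ansi)
print(shown_in_utf8)
```

The bytes in the file are the same in both cases. Only the interpretation chosen when opening the file changes what is displayed.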

This page belongs to the TextTransformer Documentation