Message Character Sets

Value Name Code Page Description

mimeCharsetUSASCII us-ascii A character set which defines 7-bit printable characters with values ranging from 20h to 7Eh. An application that uses this character set has the broadest compatibility with most mail servers (MTAs) because it does not require the server to handle 8-bit characters correctly when the message is delivered.

mimeCharsetISO8859_1 iso-8859-1 A character set for most western European languages such as English, French, Spanish and German. This character set is also commonly referred to as Latin-1. It is similar to Windows code page 1252 (Windows-1252), but the two are not identical; for example, Windows-1252 includes the Euro symbol and ISO 8859-1 does not.

mimeCharsetISO8859_2 iso-8859-2 A character set for most central and eastern European languages such as Czech, Hungarian, Polish and Romanian. This character set is also commonly referred to as Latin-2. This character set is similar to Windows code page 1250, however the characters are arranged differently.

mimeCharsetISO8859_3 iso-8859-3 A character set for southern European languages such as Maltese and Esperanto. This character set was also used with the Turkish language, but it was superseded by ISO 8859-9 which is the preferred character set for Turkish. This character set is not widely used in mail messages and it is recommended that you use UTF-8 instead.

mimeCharsetISO8859_4 iso-8859-4 A character set for northern European languages such as Latvian, Lithuanian and Greenlandic. This character set is not widely used in mail messages and it is recommended that you use UTF-8 instead.

mimeCharsetISO8859_5 iso-8859-5 A character set for Cyrillic languages such as Russian, Bulgarian and Serbian. This character set was never widely adopted and most mail messages use either KOI8 or UTF-8 encoding.

mimeCharsetISO8859_6 iso-8859-6 A character set for Arabic languages. Note that the application is responsible for displaying text that uses this character set. In particular, any display engine needs to be able to handle the reverse writing direction and analyze the context of the message to correctly combine the glyphs.

mimeCharsetISO8859_7 iso-8859-7 A character set for the Greek language. This character set is also commonly referred to as Latin/Greek. This character set is no longer widely used and has largely been replaced with UTF-8 which provides more complete coverage of the Greek alphabet.

mimeCharsetISO8859_8 iso-8859-8 A character set for the Hebrew language. Note that similar to Arabic, Hebrew uses a reverse writing direction. An application which displays text in this character set should be capable of processing bi-directional text where a single message may include both right-to-left and left-to-right languages, such as Hebrew and English. In most cases it is recommended that you use UTF-8 instead of this character set.

mimeCharsetISO8859_9 iso-8859-9 A character set for the Turkish language. This character set is also commonly referred to as Latin-5. This character set is nearly identical to ISO 8859-1, except that it replaces certain Icelandic characters with Turkish characters.

mimeCharsetISO8859_10 iso-8859-10 A character set for the Danish, Icelandic, Norwegian and Swedish languages. This character set is also commonly referred to as Latin-6 and is similar to ISO 8859-4.

mimeCharsetISO8859_13 iso-8859-13 A character set for Baltic languages. This character set is also commonly referred to as Latin-7. This character set is similar to ISO 8859-4, except it adds certain Polish characters and does not support Nordic languages.

mimeCharsetISO8859_14 iso-8859-14 A character set for Gaelic languages such as Irish, Manx and Scottish Gaelic. This character set is also commonly referred to as Latin-8. This character set replaced ISO 8859-12 which was never fully implemented.

mimeCharsetISO8859_15 iso-8859-15 A character set for western European languages. This character set is also commonly referred to as Latin-9 and is nearly identical to ISO 8859-1 except that it replaces lesser-used symbols with the Euro sign and some letters.

mimeCharsetISO2022_JP iso-2022-jp A multi-byte character encoding for Japanese that is widely used with mail messages. This is a 7-bit encoding in which text starts in ASCII and escape sequences are used to switch to the double-byte character sets.

mimeCharsetISO2022_KR iso-2022-kr A multi-byte character encoding for Korean which encodes both ASCII and Korean double-byte characters. This is a 7-bit encoding which uses the shift in and shift out control characters to switch to the double-byte character set.

mimeCharsetISO2022_CN x-cp50227 A multi-byte character encoding for Simplified Chinese which encodes both ASCII and Chinese double-byte characters. This is a 7-bit encoding which uses the shift in and shift out control characters to switch to the double-byte character set.

mimeCharsetKOI8R koi8-r A character set for Russian using the Cyrillic alphabet. This character set also covers the Bulgarian language. Most mail messages in the Russian language use this character set or UTF-8 instead of ISO 8859-5, which was never widely adopted.

mimeCharsetKOI8U koi8-u A character set for Ukrainian using the Cyrillic alphabet. This character set is similar to the KOI8-R character set, but replaces certain symbols with Ukrainian letters. Most mail messages in the Ukrainian language use this character set or UTF-8 instead of ISO 8859-5, which was never widely adopted.

mimeCharsetGB2312 x-cp20936 A multi-byte character encoding which can represent ASCII and simplified Chinese characters. It has been superseded by GB18030, however it remains widely used in China.

mimeCharsetGB18030 gb18030 A Unicode transformation format which can represent all Unicode code points and supports both simplified and traditional Chinese characters. It is backwards compatible with GB2312 and supersedes that character set.

mimeCharsetBIG5 big5 A multi-byte character set that supports both ASCII characters and traditional Chinese characters. It is widely used in Taiwan, Hong Kong and Macau. It is no longer commonly used in China, which has developed GB18030 as a standard encoding. Microsoft's implementation of Big5 on Windows does not support all of the extensions and is missing certain code points.

mimeCharsetUTF7 utf-7 A Unicode transformation format that uses variable-length character encoding to represent Unicode text as a stream of ASCII characters that are safe to transport between mail servers that only support 7-bit printable characters. It is primarily used as an alternative to UTF-8 when quoted-printable or base64 encoding is not desired.

mimeCharsetUTF8 utf-8 A Unicode transformation format that uses multi-byte character sequences to represent Unicode text. It is backwards compatible with the ASCII character set, however because it uses 8-bit text, it is recommended that you use either quoted-printable or base64 encoding to ensure compatibility with mail servers that do not support 8-bit characters.

mimeCharsetUTF16 utf-16le N/A A 16-bit Unicode format that represents each character as a 16-bit value in little endian byte order. This character set is not widely used in mail messages and it is recommended that you use UTF-8 instead. UTF-16 characters in big endian byte order are not supported.

Remarks

When composing a new message, it is recommended that you always use UTF-8 as the character set encoding which ensures broad compatibility with most applications. The other character sets are primarily used when parsing messages generated by other applications. Internally, all message headers and text are processed as UTF-8 and returned as Unicode strings.
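The recommendation above can be illustrated outside the control. The following is a minimal sketch in Python using the standard library `email` package, not the control's own API; the helper name `compose_utf8` is hypothetical. It builds a message whose body is labelled UTF-8 and uses quoted-printable or base64 so that only 7-bit characters appear on the wire.

```python
from email.message import EmailMessage


def compose_utf8(body: str, cte: str = "quoted-printable") -> EmailMessage:
    """Build a text/plain message with a UTF-8 body and a 7-bit-safe transfer encoding.

    compose_utf8 is a hypothetical helper used only for this illustration.
    """
    if cte not in ("quoted-printable", "base64"):
        raise ValueError("cte must be 'quoted-printable' or 'base64'")
    msg = EmailMessage()
    # charset labels the body as UTF-8; cte selects how the 8-bit bytes are
    # represented so that servers limited to 7-bit text can relay the message.
    msg.set_content(body, charset="utf-8", cte=cte)
    return msg


msg = compose_utf8("Grüße aus Köln\n")
wire = msg.as_bytes()  # serialized form contains only 7-bit bytes
```

Quoted-printable keeps mostly-ASCII text readable in its encoded form, while base64 is usually more compact for text that is largely non-ASCII.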

In addition to the character sets listed above, the control will recognize additional character sets which correspond to specific Windows code pages, as well as several variants. These additional character sets are included for compatibility with other applications; they are not defined because they should not be used when composing new messages.
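As an illustration of what recognizing variant names involves, the sketch below uses Python's standard `codecs` registry to check whether two labels refer to the same character set. The list of names the control itself accepts is not documented here, so this shows only the general idea; `same_charset` is a hypothetical helper.

```python
import codecs


def same_charset(a: str, b: str) -> bool:
    """Return True if two charset labels resolve to the same Python codec.

    Raises LookupError if either label is unknown to Python's codec registry.
    """
    return codecs.lookup(a).name == codecs.lookup(b).name


# Variant spellings of one character set compare equal...
latin1_variants_match = same_charset("latin1", "ISO-8859-1")
# ...while a similar but distinct Windows code page does not.
latin1_is_cp1252 = same_charset("iso-8859-1", "windows-1252")
```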

It is important to note that while certain Windows character sets are similar to standard ISO character sets, they are not identical. For example, although the Windows-1252 character set is nearly identical to ISO 8859-1, they are not interchangeable. Some legacy applications make the error of representing Windows ANSI character sets as 8-bit ISO character sets, which can result in errors when converting them to Unicode. This is something to be aware of when encoding and decoding text generated by older applications. Before the widespread adoption of UTF-8, it was particularly common for legacy Windows mail clients to default to using Windows-1252 for text and label it as using ISO 8859-1.
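The mislabeling described above can be seen with a few bytes. The sketch below is in Python and is only one possible heuristic, not the control's behavior: it treats bytes in the range 80h to 9Fh, which are control characters in ISO 8859-1 but mostly printable characters in Windows-1252, as a sign that text labelled ISO 8859-1 was really produced as Windows-1252. The function name `decode_labelled_latin1` is hypothetical.

```python
def decode_labelled_latin1(raw: bytes) -> str:
    """Decode bytes labelled iso-8859-1, allowing for mislabelled Windows-1252.

    Heuristic (an assumption of this sketch): any byte in 0x80-0x9F suggests
    the text was really written in Windows-1252.
    """
    if any(0x80 <= b <= 0x9F for b in raw):
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few byte values are unassigned in Windows-1252; fall back to
            # the declared character set rather than failing.
            pass
    return raw.decode("iso-8859-1")


# The same byte yields different characters depending on the charset used.
as_cp1252 = b"\x80".decode("cp1252")       # Euro sign, U+20AC
as_latin1 = b"\x80".decode("iso-8859-1")   # C1 control character, U+0080
```

Text that contains no bytes in that range decodes identically under both character sets, so the heuristic only changes the result where the two actually differ.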

Although the control supports UTF-16, it is recommended you use UTF-8 instead. Text which uses UTF-16 will always be base64 encoded, and some mail clients may not recognize it as a valid character set. If the message does not specify whether big endian or little endian byte order is used, the control will default to little endian. When UTF-16 is used to compose a new message, the text will always use little endian byte order.
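The byte order rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the control's implementation: both function names are hypothetical, a leading byte order mark is used as the only indication of byte order, and big endian text is rejected to mirror the note that it is not supported.

```python
import base64
import codecs


def encode_utf16_body(text: str) -> bytes:
    """Encode text as little endian UTF-16, then base64 for transport."""
    return base64.b64encode(text.encode("utf-16-le"))


def decode_utf16_body(encoded: bytes) -> str:
    """Decode a base64 UTF-16 body, defaulting to little endian byte order."""
    raw = base64.b64decode(encoded)
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Assumption of this sketch: reject big endian text outright.
        raise ValueError("big endian UTF-16 is not supported")
    if raw.startswith(codecs.BOM_UTF16_LE):
        # Drop the byte order mark so it does not appear in the result.
        raw = raw[len(codecs.BOM_UTF16_LE):]
    # No byte order mark: assume little endian.
    return raw.decode("utf-16-le")


round_trip = decode_utf16_body(encode_utf16_body("Ünïcödé"))
```

Decoding with an explicit `utf-16-le` codec matters here because Python's generic `utf-16` codec falls back to the platform's native byte order when no byte order mark is present.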

If you are using this control with Visual Basic 6.0, be aware that the IDE does not provide complete support for Unicode text. Although the control uses Unicode internally, if a header or message body contains characters which cannot be displayed using the current system ANSI code page, the text can appear to be corrupted when examining the string using the debugger. If a message contains text which uses a character set other than the system default, you must use controls which are Unicode aware to display the text, such as the Microsoft InkEdit control. The standard TextBox and other common controls in Visual Basic do not support Unicode.
