CodePage Property, File Encoding Control

CodePage Property

Gets and sets the code page used when encoding and decoding text.

Syntax

object.CodePage [= value ]

Remarks

The CodePage property is an integer value that specifies how strings are converted to and from Unicode. Any valid code page identifier may be specified. Some common values are:

Value Description

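0 Strings should be converted using the ANSI code page for the current locale. Because the ANSI code page varies with the system locale, the same text may be encoded differently on different systems. Do not use this value unless you know the text contains only ASCII characters.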

1 Strings should be converted using the system default OEM code page. The OEM code page is the character set used by console applications and is derived from the character sets used by MS-DOS (for example, code page 437 on U.S. English systems). Use this value only if you know the text was encoded with the OEM code page.

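1252 Strings should be converted using the Windows ANSI code page for western European languages. This code page is commonly used by legacy Windows applications for English and some other western languages. Although it is similar to the ISO 8859-1 character encoding, it is not identical: code page 1252 assigns printable characters such as the euro sign (€) and curly quotation marks to most byte values from 0x80 through 0x9F, which ISO 8859-1 reserves for control characters.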

28591 Strings should be converted using the ISO 8859-1 code page for western European languages. This code page is commonly referred to as Latin-1 and is similar, but not identical, to the Windows 1252 code page.

65000 Strings should be converted using the UTF-7 code page. If this code page is specified, text will be encoded as UTF-7, a form of Unicode that uses only 7-bit ASCII characters. Use this value only if you know the text is UTF-7 encoded.

65001 Strings should be converted using the UTF-8 code page. If this code page is specified, text will be processed as UTF-8 encoded Unicode, and decoded text is converted from UTF-8 to UTF-16. Because UTF-8 is backward compatible with ASCII, it is safe to use this value when encoding and decoding ASCII text. This is the default code page used when encoding text.
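To illustrate how the fixed identifiers above differ, the following sketch uses Python's built-in codecs as stand-ins for the Windows code pages. The mapping from code page numbers to Python codec names, and the helper names `encode_text` and `decode_text`, are assumptions made for illustration and are not part of the control's API. Code pages 0 and 1 are left out because their meaning depends on the system locale.

```python
# Illustrative sketch: Python's built-in codecs standing in for the fixed
# code page identifiers listed above. The mapping is an assumption, not the
# control's API. Code pages 0 (ANSI) and 1 (OEM) are omitted because they
# depend on the system locale.
CODEC_FOR_CODE_PAGE = {
    1252: "cp1252",    # Windows Western European
    28591: "latin-1",  # ISO 8859-1
    65000: "utf-7",
    65001: "utf-8",
}


def encode_text(text: str, code_page: int) -> bytes:
    """Convert a Unicode string to bytes using the given code page."""
    return text.encode(CODEC_FOR_CODE_PAGE[code_page])


def decode_text(data: bytes, code_page: int) -> str:
    """Convert bytes in the given code page back to a Unicode string."""
    return data.decode(CODEC_FOR_CODE_PAGE[code_page])


if __name__ == "__main__":
    sample = "café"
    for cp in CODEC_FOR_CODE_PAGE:
        # Every code page above can represent "café", so it round-trips.
        encoded = encode_text(sample, cp)
        assert decode_text(encoded, cp) == sample
        print(cp, encoded)

    # Byte 0x80 is the euro sign in 1252 but a control character in 8859-1.
    print(repr(decode_text(b"\x80", 1252)), repr(decode_text(b"\x80", 28591)))

    # ASCII text produces identical bytes in UTF-8 and plain ASCII.
    assert encode_text("hello", 65001) == "hello".encode("ascii")
```

Running the sketch shows that code pages 1252 and 28591 both encode "é" as the single byte 0xE9, UTF-8 uses two bytes (0xC3 0xA9), and UTF-7 represents it using only ASCII characters.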

A complete list of code page identifiers is available in Microsoft's documentation for the Win32 API. The property value corresponds directly to a Windows code page identifier, so any valid code page is accepted in addition to the values listed above. Setting this property to an invalid code page results in an error.
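The rule that an invalid identifier causes an error can be approximated with Python's codec registry. The helper below, `resolve_code_page`, is hypothetical: it maps a numeric identifier to a Python codec and raises `ValueError` when no codec matches. Python supports only a subset of Windows code pages, so this approximates the validation described above rather than reproducing what the control actually checks.

```python
import codecs

# Identifiers whose Python codec name is not simply "cp" + number.
# This table is an assumption made for the sketch.
SPECIAL_NAMES = {28591: "latin-1", 65000: "utf-7", 65001: "utf-8"}


def resolve_code_page(code_page: int) -> codecs.CodecInfo:
    """Look up a codec for a Windows code page identifier (hypothetical helper).

    Raises ValueError for identifiers that have no matching codec,
    mirroring the error raised when an invalid code page is assigned.
    """
    name = SPECIAL_NAMES.get(code_page, f"cp{code_page}")
    try:
        return codecs.lookup(name)
    except LookupError:
        raise ValueError(f"invalid code page: {code_page}") from None


if __name__ == "__main__":
    for cp in (437, 850, 1252, 28591, 65001):
        print(cp, resolve_code_page(cp).name)
    try:
        resolve_code_page(12345)
    except ValueError as exc:
        print(exc)
```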

By default, strings are converted to an array of bytes using the UTF-8 code page. When text is decoded, the characters are converted to Unicode before they are returned to your application. If the decoded text appears corrupted, or characters are replaced with question marks or other symbols, the encoded string probably uses a different character set. Most services use UTF-8 because it can represent every Unicode character.
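The two symptoms described above can be reproduced with Python's codecs, used here only as a stand-in for the control's conversion. The sketch decodes UTF-8 bytes with code page 1252, and 1252 bytes as UTF-8, and includes a hypothetical `is_valid_utf8` helper as a quick first check when text looks corrupted.

```python
# Text encoded as UTF-8 but decoded with the wrong code page (1252) comes
# back garbled: the two-byte sequence for "é" turns into two unrelated
# characters.
utf8_bytes = "café".encode("utf-8")          # b'caf\xc3\xa9'
garbled = utf8_bytes.decode("cp1252")        # displays as "cafÃ©"

# The reverse mistake: 1252 bytes decoded as UTF-8. Byte 0xE9 does not form
# a valid UTF-8 sequence here, so errors="replace" substitutes U+FFFD, which
# is often displayed as a question mark or a box.
cp1252_bytes = "café".encode("cp1252")       # b'caf\xe9'
replaced = cp1252_bytes.decode("utf-8", errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (hypothetical helper).

    Passing this check does not prove the data is UTF-8, but failing it is
    a strong hint that another code page was used.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    assert garbled == "caf\u00c3\u00a9"
    assert replaced == "caf\ufffd"
    assert is_valid_utf8(utf8_bytes)
    assert not is_valid_utf8(cp1252_bytes)
```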

Setting this property to the wrong code page can corrupt the encoded text. For example, if code page 1252 (Western European) is specified and you set the DecodedText property to a string that contains Greek characters, the EncodedText property will return an invalid value, because code page 1252 cannot represent those characters. If the text you want to encode includes non-ASCII characters, use the default UTF-8 code page.
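The Greek example can be reproduced with Python's `cp1252` codec as a stand-in for code page 1252. This sketch only demonstrates why the data is lost; it does not show the exact value the control returns in EncodedText, and Python's codec does not apply any Windows "best fit" character substitution.

```python
greek = "αβγ"  # Greek letters alpha, beta, gamma

# Strict encoding with code page 1252 fails outright...
try:
    greek.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# ...and a lenient encoder substitutes "?" for each character, so the
# original text cannot be recovered from the result.
lossy = greek.encode("cp1252", errors="replace")
assert lossy == b"???"

# UTF-8 can represent the same text and round-trips without loss.
assert greek.encode("utf-8").decode("utf-8") == greek
```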

Data Type

Integer (Int32)
