Unicode
Unicode is a multi-language character set designed to encompass virtually all of the characters used with computers today. In COM and Visual Basic, Unicode strings are stored as sequences of 16-bit values, and they differ from other character sets in two important ways. First, unlike the traditional single-byte (ANSI) character sets, Unicode can represent significantly more characters across a variety of languages. Second, unlike multi-byte character sets (where some characters are one byte long and others are two), nearly all commonly used characters occupy a single 16-bit value, which makes them easier to work with. Only rarely used supplementary characters require a pair of 16-bit values.

Whenever a string is assigned to a property value or passed to a method, that string is in Unicode. If necessary, the control automatically converts the string to ANSI, with no additional programming required of the developer. This is largely transparent when using the components in high-level languages like Visual Basic. However, in Visual C++ and other languages that work with COM objects at a lower level, it is important to understand that string values must be passed as BSTRs, which are Unicode strings.

The string handling issue that developers most commonly encounter with the SocketTools components involves the Read and Write methods. These methods send and receive data over the network and accept several different types of data. Typically, the data is exchanged either as a string of text characters or as an array of bytes. Consider the following code:

    Dim strMessage As String
    Dim strBuffer As String
    Dim cbBuffer As Long

    Do
        cbBuffer = SocketWrench1.Read(strBuffer, 1024)
        If cbBuffer > 0 Then strMessage = strMessage & strBuffer
    Loop Until cbBuffer < 1

In this case, the program expects to receive textual data from the server, which is accumulated in the string strMessage.
Internally, the control automatically converts the data received from the server from an array of bytes into a string. It does this because the strBuffer argument is typed as a String, which means it is Unicode. However, what if the data returned by the server is binary, or is already Unicode text? In that case, the conversion performed by the control may corrupt the data. The solution is to read the data into an array of bytes rather than a string. For example:

    Dim byteMessage() As Byte
    Dim byteBuffer(1024) As Byte
    Dim cbMessage As Long
    Dim cbBuffer As Long
    Dim nIndex As Long

    Do
        cbBuffer = SocketWrench1.Read(byteBuffer, 1024)
        If cbBuffer > 0 Then
            ' Grow the array to hold exactly the bytes received so far
            ReDim Preserve byteMessage(cbMessage + cbBuffer - 1) As Byte
            ' Append the newly received bytes to the end of the message
            For nIndex = 0 To cbBuffer - 1
                byteMessage(cbMessage + nIndex) = byteBuffer(nIndex)
            Next
            cbMessage = cbMessage + cbBuffer
        End If
    Loop Until cbBuffer < 1

Because the data is read into a byte array rather than a string, no Unicode conversion is performed and the data is returned exactly as it was sent.

Note that Visual Basic can also convert explicitly between Unicode strings and byte arrays using the StrConv function. For more information, refer to the language reference and online help in Visual Basic.
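
The explicit conversion mentioned above can be sketched as follows. This is a minimal illustration rather than code from the SocketTools documentation: the procedure name and variable names are hypothetical, and it assumes classic Visual Basic, where StrConv accepts the vbFromUnicode and vbUnicode constants and uses the system's default code page.

```vb
' Illustrative sketch only; the procedure and variable names are hypothetical.
Sub ShowStringByteConversion()
    Dim strText As String
    Dim byteAnsi() As Byte
    Dim strRoundTrip As String

    strText = "Hello"

    ' A Visual Basic String holds 16-bit characters, so its length
    ' in bytes is twice its length in characters.
    Debug.Print Len(strText)            ' prints 5 (characters)
    Debug.Print LenB(strText)           ' prints 10 (bytes)

    ' Convert the Unicode string to single-byte (ANSI) data using the
    ' system's default code page.
    byteAnsi = StrConv(strText, vbFromUnicode)
    Debug.Print UBound(byteAnsi) + 1    ' prints 5 (ANSI bytes)

    ' Convert the ANSI bytes back into a Unicode string.
    strRoundTrip = StrConv(byteAnsi, vbUnicode)
    Debug.Print (strRoundTrip = strText)    ' prints True for plain ASCII text
End Sub
```

The round trip is lossless here only because the text is plain ASCII. Arbitrary binary data, or characters that the default code page cannot represent, may not survive the same conversion, which is why the byte array form of Read is the safer choice for such data.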
Copyright © 2024 Catalyst Development Corporation. All rights reserved.