UTF-8

mw · April 12, 2007, 03:34:13 PM

Hi

I wrote a little procedure:

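#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void test(HWND hWnd)
{
    WCHAR wc[2];
    char c[6], s[20];
    int n;

    wc[0] = L'§';
    wc[1] = L'\0';

    /* convert the wide string to a multibyte string, at most 5 bytes */
    n = wcstombs(c, wc, 5);

    /* bytes converted, length of the result, value of the first byte */
    sprintf(s, "%d %d %d", n, (int)strlen(c), c[0]);
    MessageBox(hWnd, s, "Test", MB_OK);
}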

The message box displays these values: 1 1 -89

Why is the string length 1? The UTF-8 code of the section sign § is 0xc2 0xa7.
So I expected wcstombs() to convert the wide char § into the 2-byte sequence 0xc2 0xa7.
But wcstombs() converts § to the single byte 0xa7 (which prints as -89 because char is signed here).

Did I make a mistake, or do I misunderstand how character conversion works?
Or does the behaviour of wcstombs() depend on the locale settings?
What is the right way to convert UTF-8 to wide chars (or vice versa)?

Best regards
Martin
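The two-byte result Martin expected follows from the UTF-8 bit layout: code points U+0080 to U+07FF are stored as 110xxxxx 10xxxxxx, so U+00A7 (11 bits: 000 1010 0111) splits into 00010 and 100111, giving 0xC2 0xA7. The single-code-point encoder below is a minimal sketch of that rule; the function name utf8_encode_cp is hypothetical and not part of any library mentioned in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes to out and returns the count, or returns 0 if cp
   is not a valid scalar value (a UTF-16 surrogate or above U+10FFFF). */
static int utf8_encode_cp(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

For example, utf8_encode_cp(0xA7, b) returns 2 and leaves 0xC2 0xA7 in b, which is exactly the sequence Martin was looking for.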

Pelle · April 12, 2007, 09:28:59 PM

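The exact wc <-> mb conversion is left to the implementation by the C standard (it depends on the LC_CTYPE category of the current locale), so you will most likely get different results with different implementations and locale settings. Pelles C currently implements only the "C" locale, which gives a basic 8-bit conversion: a character such as § (U+00A7) becomes the single byte 0xa7 rather than a UTF-8 sequence.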

I don't need or use locale settings myself, nobody has asked for them, and supporting them would seriously bloat parts of the C runtime, so (so far) I have settled for the "C" locale only.
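In standard C the encoding wcstombs() produces is selected through the locale, so a program that wants the user's encoding normally calls setlocale(LC_CTYPE, "") first. The sketch below (the name wide_to_locale_mb is made up for illustration) also shows the usual two-pass pattern: one call with a null destination to count the bytes, then the real conversion into a buffer of that size. Whether § comes out as 0xC2 0xA7 depends on the runtime offering a UTF-8 locale; with a "C"-only runtime like the Pelles C version described above it stays a single byte.

```c
#include <locale.h>   /* setlocale(): callers use it to pick the conversion locale */
#include <stdlib.h>   /* wcstombs(), malloc(), free() */

/* Hypothetical helper: convert a wide string to a newly allocated
   multibyte string in the encoding of the current LC_CTYPE locale.
   Returns NULL if a character cannot be represented in that encoding
   or if memory runs out. The caller frees the result. */
static char *wide_to_locale_mb(const wchar_t *ws)
{
    char *mb;
    /* Pass 1: with a null destination, wcstombs() only counts the bytes
       needed (excluding the terminator), or returns (size_t)-1 on failure. */
    size_t len = wcstombs(NULL, ws, 0);

    if (len == (size_t)-1)
        return NULL;
    mb = malloc(len + 1);
    if (mb == NULL)
        return NULL;
    /* Pass 2: room for len bytes plus the terminating '\0'. */
    wcstombs(mb, ws, len + 1);
    return mb;
}
```

A caller would typically select the user's locale once at startup with setlocale(LC_CTYPE, ""), then call wide_to_locale_mb(L"\u00A7") and free() the result. Because the outcome depends on the runtime and the chosen locale, an explicit UTF-8 converter is the more predictable choice when UTF-8 output is required.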

JohnF · April 13, 2007, 06:37:46 AM

Martin, you could try the Windows API function WideCharToMultiByte().

I don't know if it will work, but it's worth a try. You can pass CP_UTF8 as the code page.

EDIT:

#include <windows.h>
#include <stdio.h>
#include <string.h>

static void test(void)
{
    WCHAR wc[2];
    char c[6] = {0}, s[20];
    int n;

    wc[0] = L'§';
    wc[1] = L'\0';

    n = WideCharToMultiByte(CP_UTF8,
        0,      // performance and mapping flags
        wc,     // wide-character string
        1,      // number of chars in string (no terminator; c is zero-filled)
        c,      // buffer for new string
        6,      // size of buffer
        NULL,   // default for unmappable chars
        NULL);  // set when default char used

    sprintf(s, "%d %d %02x %02x", n, (int)strlen(c),
            (unsigned char)c[0], (unsigned char)c[1]);
    MessageBox(0, s, "Test", MB_OK);
}

c[0] and c[1] are displayed as c2 and a7

John

mw · April 13, 2007, 09:32:16 AM

Thank you for the answers.

After I posted my question here, I searched the Internet and found some information about WideCharToMultiByte() and MultiByteToWideChar(). I think these functions are right for my purposes. I tested them on Windows Mobile 2003 and they worked well.

Does anybody know whether WideCharToMultiByte() and MultiByteToWideChar() are also available on older versions of Windows Mobile? (I ask because I had some bad experiences with MoveToEx() and LineTo(): both functions worked on Windows Mobile 2003 and later versions, but not on PDAs running older versions of the operating system. Since then I have used Polyline() instead.)

Martin

Stefan Pendl · April 13, 2007, 10:17:42 AM

These functions have been available since Windows CE 1.01, but an OEM can remove support for them.

See http://msdn2.microsoft.com/en-us/library/ms961248.aspx
and http://msdn2.microsoft.com/en-us/library/ms886760.aspx
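If an OEM build does lack these functions, a small portable decoder can stand in for the UTF-8 to wide-char direction. The following is a sketch under stated assumptions, not part of any Windows API: the name utf8_to_utf16 is hypothetical, WCHAR is treated as a 16-bit UTF-16 code unit (uint16_t), and malformed input is simply rejected, whereas MultiByteToWideChar has its own flag-dependent handling of invalid bytes. Passing a null destination returns the number of code units needed, similar in spirit to calling MultiByteToWideChar with a zero output size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fallback helper: decode n bytes of UTF-8 into UTF-16 code
   units. If dst is NULL, only the number of units needed is returned.
   Returns (size_t)-1 on malformed input (bad lead or continuation byte,
   overlong form, surrogate, value above U+10FFFF, truncated sequence)
   or when cap is too small. No terminator is written. */
static size_t utf8_to_utf16(const unsigned char *src, size_t n,
                            uint16_t *dst, size_t cap)
{
    size_t i = 0, len = 0;

    while (i < n) {
        uint32_t cp;
        unsigned char b = src[i];
        size_t need, k;

        if (b < 0x80)                    { cp = b;        need = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; need = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; need = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; need = 3; }
        else return (size_t)-1;          /* continuation, C0/C1 or F5..FF */

        if (n - i - 1 < need)
            return (size_t)-1;           /* sequence cut off */
        for (k = 1; k <= need; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;       /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;           /* overlong, surrogate or out of range */
        i += need + 1;

        if (cp < 0x10000) {              /* fits in one UTF-16 unit */
            if (dst != NULL) {
                if (len == cap)
                    return (size_t)-1;
                dst[len] = (uint16_t)cp;
            }
            len += 1;
        } else {                         /* needs a surrogate pair */
            if (dst != NULL) {
                if (cap - len < 2)
                    return (size_t)-1;
                cp -= 0x10000;
                dst[len]     = (uint16_t)(0xD800 | (cp >> 10));
                dst[len + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            }
            len += 2;
        }
    }
    return len;
}
```

For the section sign, the bytes 0xC2 0xA7 decode to the single unit 0x00A7; a code point above U+FFFF such as U+1F600 (bytes 0xF0 0x9F 0x98 0x80) produces the surrogate pair 0xD83D 0xDE00.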
