
Author Topic: Problem with C11 unicode support  (Read 14155 times)

migf1

  • Guest
Problem with C11 unicode support
« on: June 20, 2012, 09:32:02 PM »

Did someone actually test any of the new C11 features? This was the main point of the original release candidate, you know...

/Pelle

I just started playing around with some of the C11 unicode additions. The following code fails to compile on Win XP Pro SP3, 32-bit, with the fatal error #1065: "Failed converting input using codepage 65001"...

Code: [Select]
#include <stdlib.h>
#include <stdio.h>
//#include <uchar.h>

#define pressENTER()                                \
    do{                                    \
        int mYcHAr;  /* int, so EOF stays distinguishable */    \
        printf( u8"πατήστε ENTER..." );                    \
        while ( (mYcHAr=getchar()) != '\n' && mYcHAr != EOF )        \
            ;                            \
    }while(0)


/*****************************************/
int main( void )
{
#if defined(__STDC_UTF_16__)
    puts( "utf16 enabled" );
#endif

#if defined(__STDC_UTF_32__)
    puts( "utf32 enabled" );
#endif

    char u8str[] = u8"αβγδ";   // this is "abcd" in Greek

    printf( "%s\n", u8str );

    pressENTER();
    exit(0);
}


Project options:

CCFLAGS: -std:C11 -Tx86-coff -Ot -Ob1 -fp:precise -W1 -Gd
ASFLAGS: -AIA32 -Gd
LINKFLAGS: -subsystem:console -machine:x86 kernel32.lib advapi32.lib delayimp.lib

The source file encoding doesn't seem to make any difference (I tried saving it in all available encodings).

FYI, it did compile with mingw32-gcc-4.7.0 (which btw seems to provide only partial C11 support, missing lots of header files, uchar.h included). On the utf-8 aware mintty console it produces the following output...





However, on the native cmd.exe switched to cp 65001 (this is a handicapped UTF-8 codepage) it doesn't output the Greek characters...





With cmd.exe switched to cp 1253 (the "good" Greek ANSI codepage) it outputs the Greek characters, but scrambled, as expected.




PS. Tomorrow I will also test it on Win7 Home 64-bit and I'll let you guys know.

 
 
« Last Edit: June 20, 2012, 09:52:01 PM by Stefan Pendl »

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #1 on: June 21, 2012, 12:36:00 AM »
Thanks for moving it to the correct section (sorry for the inconvenience).

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #2 on: June 21, 2012, 11:28:26 AM »
It compiles fine on Win7 Home 64-bit. I tried it both as a 32bit and as a 64bit project.

Here is the output on the UTF-8 mintty console...




And here is the same output on the native cmd.exe of Win 7, switched to codepage 65001...




Here is the cmd.exe cp 65001 output from the executable produced by MinGW32 ...




The first observation is that Pelles C doesn't seem to define __STDC_UTF_32__ (it doesn't output the "utf32 enabled" string on the screen). MinGW32 outputs it on both XP 32bit and 7 64bit.

The second observation is that, contrary to XP 32bit, on Win7 64bit MinGW32 prints the ?-symbol char for Greek characters (on XP it was not printing anything... but that could be due to different implementations of the Lucida Console font on the 2 platforms). For Pelles C I don't have a point of reference, since it doesn't compile the code on XP, but here on 7 it seems to at least print correct Greek chars instead of marking them as unknown (although it seems to output some extra ones too... maybe some of them occupy more than 1 byte, but I haven't checked with the Unicode table).

I don't know if it is important, but the XP used yesterday is an English version with the regional settings set to Greek, while Win7 is a Greek version.
« Last Edit: June 21, 2012, 12:31:48 PM by migf1 »

CommonTater

  • Guest
Re: Problem with C11 unicode support
« Reply #3 on: June 21, 2012, 12:53:44 PM »
The problem you're having here may not be Pelles C...  Windows is internally utf16le unicode.  It doesn't know utf8 natively and it doesn't know utf32 at all.  The console has two modes depending upon the first string output to it... either oem or utf16le...  Once you give it a unicode string it won't display anything else.
 
You appear to have discovered the problem of lag time... right now C11 does this stuff... but Windows does not.  Thus it's useful for file storage, networking and communications but has to be converted for the display...

Try your same program in utf16le (WCHAR or wchar_t) screen output and see what happens...
 
Try sending utf8 and utf32 outputs to a disk file and examine them in a hex editor... there you will see what works and what doesn't.


 
« Last Edit: June 21, 2012, 01:00:26 PM by CommonTater »

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #4 on: June 21, 2012, 01:24:33 PM »
Thanks for the answer tater, but the problem is not the console (I'm aware of its limitations, that's why I have also demonstrated the output on a proper utf-8 enabled console, namely mintty).

The problem is that  Pelles C cannot compile the code on XP SP3 (I only have it in 32bit version).

As for __STDC_UTF_32__, the fact that Pelles C does not define it means -according to the ISO C11 standard- that the values of type char32_t (e.g. U"string-literal") are not guaranteed to be encoded as UTF-32 by the compiler. This must have something to do (I guess) with the fact that on Windows platforms the wchar_t type occupies 2 bytes (instead of 4 on most other platforms).

I mentioned it because I found it a bit odd that MinGW32 defines it (so it treats U"blabla" as a string-literal with 4-byte chars encoded in UTF-32, although it is supposed to use the MS runtime libs... btw, there's no uchar.h header file in mingw-gcc 4.7.0, so char32_t is not defined... but it does understand __CHAR16_TYPE__ and __CHAR32_TYPE__, since C99 I think... complete chaos!).
« Last Edit: June 21, 2012, 01:31:11 PM by migf1 »

CommonTater

  • Guest
Re: Problem with C11 unicode support
« Reply #5 on: June 21, 2012, 01:35:10 PM »
So do you have access to Windows 7 Professional... even through a friend or computer store?  Win7 Home lacks a lot of the Pro version's "language" support

What you can do is upload the smallest Pelles C project that demonstrates the problem and I'll give it a try for you on my systems... XP x64 and Win7 Pro x64... Just use the Project->ZipFiles option from the menu and upload the zip.  I'm sure Pelle will ask you for the same thing so you might as well get it uploaded...
« Last Edit: June 21, 2012, 01:47:47 PM by CommonTater »

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #6 on: June 21, 2012, 01:40:07 PM »
Quote
So you have access to Windows 7 Professional... even through a friend or computer store?  Win7 Home lacks a lot of the Pro version's "language" support

What you can do is upload the smallest Pelles C project that demonstrates the problem and I'll give it a try for you on my systems... XP x64 and Win7 Pro x64... Just use the Project->ZipFiles option from the menu and upload the zip.  I'm sure Pelle will ask you for the same thing so you might as well get it uploaded...

Thanks, zips attached  (c11 is the 64bit).

PS. They compile and work fine on Win7 Home 64bit... the problem is with XP SP3 32bit (it does not even compile the code... mingw compiles it fine).
« Last Edit: June 21, 2012, 01:43:51 PM by migf1 »

CommonTater

  • Guest
Re: Problem with C11 unicode support
« Reply #7 on: June 21, 2012, 01:59:07 PM »
You're quick... so I'll be quick too   :D
 
I ran into a build error in the x86 version (screen snip below).  When I changed __CHAR32_TYPE__ to the more standard char32_t it compiled...
 
I also got a "malicious code" warning from someplace in the bowels of Win7 which would not allow me to unpack the EXEs in the project.  This does not surprise me, though since Windows does not support 32 bit characters...
 
The results are in the attachments below...
« Last Edit: June 21, 2012, 02:02:02 PM by CommonTater »

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #8 on: June 21, 2012, 02:04:07 PM »
Ooops, the line: __CHAR32_TYPE__ c32; was not supposed to be there at all (a leftover from my experiments with MinGW32).
The produced output looks just fine, I get the same on Win7 Home x64.

Does the code compile on your XP x64?

PS. I'm sorry about the malicious code, I have no idea why you got that warning.


migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #9 on: June 21, 2012, 02:06:11 PM »
I have to go now, please let us know what happens on Win XP x64 when you get some time to play with it, thanks.

CommonTater

  • Guest
Re: Problem with C11 unicode support
« Reply #10 on: June 21, 2012, 02:11:31 PM »
Quote
Does the code compile on your XP x64?

Yes... with the same result, but different garbage characters. 

Quote
PS. I'm sorry about the malicious code, I have no idea why you got that warning.
Not to worry... Like I said it's probably because of the 32bit characters...
 

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #11 on: June 21, 2012, 06:39:07 PM »
Thank you, tater!

Unfortunately, it still does not compile on XP 32bit. I just downloaded the c11x86.zip from the previous post (it was made on Win7 64bit) and tried it on XP 32bit... same error.

I think Pelle should test it too.

CommonTater

  • Guest
Re: Problem with C11 unicode support
« Reply #12 on: June 21, 2012, 08:56:35 PM »
Quote
Thank you, tater!

Unfortunately, it still does not compile on XP 32bit. I just downloaded the c11x86.zip from the previous post (it was made on Win7 64bit) and tried it on XP 32bit... same error.

I think Pelle should test it too.


Try this...
Code: [Select]
#include <stdlib.h>
#include <stdio.h>
#include <uchar.h>
 
#define pressENTER()       \
    do{         \
        int mYcHAr;  /* int, so EOF stays distinguishable */ \
        printf( u8"πατήστε ENTER..." );     \
        while ( (mYcHAr=getchar()) != '\n' && mYcHAr != EOF )  \
            ;        \
    }while(0)

/*****************************************/
int main( void )
{
#if defined(__STDC_UTF_16__)
 puts( "utf16 enabled" );
#endif
#if defined(__STDC_UTF_32__)
 puts( "utf32 enabled" );
#endif
 char u8str[] = u8"αβγδ";   // this is "abcd" in Greek
 char32_t  c32;
 printf( "%s\n", u8str );
 pressENTER();
 exit(0);
}

And yes, I agree, Pelle should take a look....
 
 

migf1

  • Guest
Re: Problem with C11 unicode support
« Reply #13 on: June 21, 2012, 09:22:15 PM »
Same error.

Pelle

  • Administrator
  • Member
    • http://www.smorgasbordet.com
Re: Problem with C11 unicode support
« Reply #14 on: July 08, 2012, 04:20:25 PM »
I use as much Unicode support from Windows as I can - almost everything. It was UCS-2 in the early NT days, and UTF-16 later (I don't remember the Windows version, but something was improved in this area at some point - to or after XP). I currently don't define __STDC_UTF_32__ since I'm not yet convinced the UTF-16 <-> UTF-32 conversions are 100% correct. I think the Unicode support works well enough for English and Swedish, which is really my main priority.
/Pelle