Pelles C forum

Pelles C => Bug reports => Topic started by: Vortex on August 30, 2023, 11:46:47 AM

Title: UTF8 issue on Windows 7
Post by: Vortex on August 30, 2023, 11:46:47 AM
Hello,

pocc is displaying an error message if the option /utf-8 is specified in the command line. The problem appears on Windows 7 systems.

Code: [Select]
pocc /utf-8 -std:C11  /Tx86-coff -Ot -Ob1 -fp:precise -W1 -Ze -Zx test.c

E:\PellesC\Include\ctype.h(40): fatal error #1065: Failed converting input from 'U'.

It doesn't matter if the source code is ANSI, UTF8 or UTF8 with BOM.
Title: Re: UTF8 issue on Windows 7
Post by: John Z on August 30, 2023, 12:15:40 PM
Hi Vortex,

I still have a win 7 system running.  Post your test.c file and I'll verify on
my system.  Pelles C version 12 I assume....

John Z
Title: Re: UTF8 issue on Windows 7
Post by: Vortex on August 30, 2023, 12:39:55 PM
Hi John,

Pelles C version : 12.00.2

The code does not matter, you will receive the same error message with the option /utf-8 on Windows 7 :

Code: [Select]
#include <stdio.h>

int main(int argc, char *argv[])
{
  printf("test");
  return 0;
}
Title: Re: UTF8 issue on Windows 7
Post by: John Z on August 30, 2023, 01:48:50 PM
Thanks Vortex!

Confirmed - Win 7 Pro with V12 and Win 7 Home with V12 both show the issue -
Win 7 Home with V9 does not recognize the /UTF-8 option as it was not implemented yet . . . .

John Z

Based on the help file, it looks like this option was added in V11. Just FYI, maybe test with V11.
Title: Re: UTF8 issue on Windows 7
Post by: Vortex on August 30, 2023, 09:53:27 PM
Hi John,

Thanks for your tests.
Title: Re: UTF8 issue on Windows 7
Post by: MrBcx on August 30, 2023, 11:00:26 PM
Quote from: Vortex on August 30, 2023, 09:53:27 PM
Hi John,

Thanks for your tests.

Erol - I'm paying attention  ;)
Title: Re: UTF8 issue on Windows 7
Post by: TimoVJL on August 31, 2023, 10:34:34 AM
Code: [Select]
//#include <stdio.h>
int __cdecl printf(char*,...);
int main(int argc, char *argv[])
{
  printf("test");
  return 0;
}
Code: [Select]
10: pocc.exe -utf-8 test.c
fatal error: Unknown option: /utf-8.

11: pocc.exe -utf-8 test.c
test.c(5): fatal error #1065: Failed converting input using codepage 65001.

12: pocc.exe -utf-8 test.c
test.c(5): fatal error #1065: Failed converting input from 'U'.
Title: Re: UTF8 issue on Windows 7
Post by: Pelle on September 10, 2023, 07:50:03 PM
The /utf8 option means the execution character set and source character set is UTF-8, i.e. a source file without a BOM must be UTF-8 (7-bit ASCII is a subset of UTF-8 so will work, "exotic" ANSI characters will not work, etc.)

I don't have a Win7 machine for a quick test right now, I will see if it's possible to set up. ..
Title: Re: UTF8 issue on Windows 7
Post by: John Z on September 13, 2023, 03:01:55 PM
OK interesting .....

Quote from: Pelle on September 10, 2023, 07:50:03 PM
The /utf8 option means the execution character set and source character set is UTF-8, i.e. a source file without a BOM must be UTF-8 (7-bit ASCII is a subset of UTF-8 so will work, "exotic" ANSI characters will not work, etc.)

I don't have a Win7 machine for a quick test right now, I will see if it's possible to set up. ..

I note that the above mentions /utf8, not /utf-8, so I tried that on WIN7 Home with PellesC v12.002: the object file was created and there was NO error message. Next I made up a random fake command switch and got the "unknown option" error, as I should, and no obj file was created.

Then I tried /utf: no error. /ut: no error. /u shows the switches list (and is a valid switch to undefine a pp symbol, although it should be uppercase). As long as /ut is there, anything can come after it, like /utf99999, with no error ....

So with the test.c file one can't really tell whether UTF-8 is working, because all its characters are 7-bit ASCII, which is always valid. We should try with characters that are unique to UTF-8 and/or with a BOM. However, it appears that /utf8, or just /ut, is accepted as switch input without error on Windows 7 with Pelles C version 12.00.02 and pocc version 12.0.1.0.

I'm not saying it makes sense  :P  Going to test more ....

John Z
Title: Re: UTF8 issue on Windows 7
Post by: Pelle on September 26, 2023, 06:39:16 PM
Quote from: John Z on September 13, 2023, 03:01:55 PM
Then I tried /utf no error, /ut no error, /u shows switches list (and is valid switch to undefine pp symbol, although should be uppercase).  as long as /ut is there anything can be after it like /utf99999 with no error....
Good catch! It's /utf-8, but a missing break makes invalid cases like /utf8 fall into the following option case, where they will silently be "handled" (as in not causing a diagnostic) ...

About the original problem: I managed to boot up (literally) an old inherited laptop with Windows 7. The problem boiled down to the horror show of an API function called WideCharToMultiByte(); the flags and other parameters must match the Windows version (and the online documentation isn't exactly helpful) ...