When the poide.exe IDE is shut down and restarted, the UTF-8 in this code
#include <stdio.h>

/* entry point */
int main(void)
{
    printf("Hello, world!\n");
    printf("Салом Ҷаҳон!\n");
    return 0;
}
is reloaded into the poide.exe IDE with question marks replacing the non-ASCII characters, as
#include <stdio.h>

/* entry point */
int main(void)
{
    printf("Hello, world!\n");
    printf("????? ?????!!\n");
    return 0;
}
Tofu
https://fonts.google.com/knowledge/glossary/tofu
Eroteme
https://en.wiktionary.org/wiki/eroteme
https://en.wikipedia.org/wiki/Question_mark
Hi Robert,
This is not really a bug. It may be a minor inconvenience, but here is the situation as I understand it.
Pelles C originally used ASCII/ANSI for all source files.
Pelles C then switched to UTF-8 as the default for all source files.
It also supports UTF-16 for source files. When you create a new source file within the IDE, it is automatically UTF-8. You will see that the source file tab also shows UTF-8 (or UTF-16). If the tab shows nothing but the name, the source file is at best ASCII/ANSI. 'OLD' source code, or a source file created outside Pelles C with a plain text editor, will be ASCII/ANSI.
Now the critical part is that the editor always works in UTF-8 by default. This allows UTF-8 characters to be entered in the source code page, but since that page is not identified as UTF-8, it will fail to display as expected when reloaded.
So the Export64 program, for example, does not show UTF-8 in the tab, so it is still ASCII/ANSI, even though the editor can make the 'display' show the character.
Using any editor that supports UTF-8, a source file can be created, or just resaved, with the encoding set to UTF-8.
I use TextPad, for example, to resave Export.c as Export_UTF8.c, and if you add it to the Export64 program you will see that the source tab shows the encoding. If you run your test on this file, it should 'pass' reloading.
Hope this was at least a little bit clear.
John Z
The other method is to create a blank source file in the IDE, then paste in the old source code. When saved, it will be UTF-8.
Thanks John Z.
The file, created in EditPad, is initially a No-BOM UTF-8 file with only ASCII characters.
The file is then modified in the poide.exe IDE by adding UTF-8 glyphs beyond U+00FF.
The file is saved.
When opened in EditPad, the file is reported as Windows-1252.
When re-opened in poide.exe, the UTF-8 glyphs beyond U+00FF have been replaced with erotemes.
If a No-BOM UTF-8 file with at least one glyph beyond U+00FF is initially loaded into poide.exe, then the file will be saved as UTF-8 No-BOM.
I will have to remember that.
C compilers:
Pelles C
HelloBug.c(8): warning #2223: Unable to convert character '\u0421' to 'ANSI'; using default character.
pFile 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F Value
0000010F 3F 3F 3F 3F 3F 20 3F 3F 3F 3F 3F 21 0A 00 48 65 ????? ?????!..He
0000011F 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0A 00 llo, world!..
msvc
HelloBug.c(8): warning C4566: character represented by universal-character-name '\u0421' cannot be represented in the current code page (1252)
pFile 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F Value
000001CC 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0A 00 00 Hello, world!...
000001DC 3F 3F 3F 3F 3F 20 3F 3F 3F 3F 3F 21 0A 00 ????? ?????!..
Clang-cl
pFile 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F Value
00000121 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 00 D0 A1 Hello, world!.С
00000131 D0 B0 D0 BB D0 BE D0 BC 20 D2 B6 D0 B0 D2 B3 D0 алом ҶаҳÐ
00000141 BE D0 BD 21 00 ¾Ð½!.
Quote from: TimoVJL on Yesterday at 10:31:59 AM
C compilers: [...]
Hi Timo:
Is the HelloBug.c file referenced above the main.c file that is enclosed in the Skat.zip attached to the post at https://forum.pellesc.de/index.php?topic=471.msg41856#msg41856?
If it is, that version of main.c is encoded in UTF-16LE.
If it isn't, then what is HelloBug.c?
Here is main.c encoded in UTF-8 No BOM.
#include <windows.h>
#include <stdio.h>
#include <conio.h>   /* _getch() */

static UINT OrigCodePage;
static const char* σκατ;
static const char* δυσκατανοήτων;

int main(int argc, char* argv[])
{
    /* Remember the console code page, then switch the output to UTF-8. */
    OrigCodePage = GetConsoleOutputCP();
    SetConsoleOutputCP(65001);

    σκατ = "σκατ doo, be, shoo, bop, ooh, dee, doo, sha-bam";
    δυσκατανοήτων = "δυσκατανοήτων difficult to understand";
    printf("%s%s%s\n", σκατ, " ", δυσκατανοήτων);

    _getch();                          /* wait for a key press */
    SetConsoleOutputCP(OrigCodePage);  /* restore the original code page */
    return 0;
}
I used UTF-8 with BOM
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

void __cdecl WinMainCRTStartup(void)
{
    char s[100] = u8"σκατ";   /* 4 Greek letters = 8 bytes of UTF-8 */
    wchar_t w[100];           /* room for the 30-character text plus NUL */

    /* Append the eight UTF-8 bytes as hex after the text itself. */
    wsprintf(s+8, " = %02X %02X %02X %02X %02X %02X %02X %02X",
        (BYTE)s[0], (BYTE)s[1], (BYTE)s[2], (BYTE)s[3],
        (BYTE)s[4], (BYTE)s[5], (BYTE)s[6], (BYTE)s[7]);
    /* ANSI message box: the UTF-8 bytes are interpreted in the ANSI code page. */
    MessageBox(0, s, "test", MB_OK);
    //if (MultiByteToWideChar(CP_UTF8, MB_COMPOSITE, s, -1, w, 20))
    //if (MultiByteToWideChar(CP_UTF8, MB_PRECOMPOSED, s, -1, w, 20))
    //if (MultiByteToWideChar(CP_UTF8, MB_USEGLYPHCHARS, s, -1, w, 20))
    /* Convert to UTF-16 and show the text with the wide-character API. */
    if (MultiByteToWideChar(CP_UTF8, 0, s, -1, w, sizeof(w) / sizeof(w[0])))
        MessageBoxW(0, w, L"test", MB_OK);
    else
        MessageBox(0, "error", "test", MB_OK|MB_ICONERROR);
    ExitProcess(0);
}