Pelles C forum

Pelles C => Bug reports => Topic started by: Robert on January 16, 2026, 06:48:39 PM

Title: IDE Reload. Eroteme Replaces Utf-8 !
Post by: Robert on January 16, 2026, 06:48:39 PM

When the poide.exe IDE is shut down and restarted, the UTF-8 text in this code


#include <stdio.h>

/* entry point */
int main(void)
{
  printf("Hello, world!\n");
  printf("Салом Ҷаҳон!\n");
 
  return 0;
}


is reloaded into the poide.exe IDE with question marks replacing the UTF-8 characters:


#include <stdio.h>

/* entry point */
int main(void)
{
  printf("Hello, world!\n");
  printf("????? ?????!\n");
 
  return 0;
}


Tofu
https://fonts.google.com/knowledge/glossary/tofu

Eroteme
https://en.wiktionary.org/wiki/eroteme
https://en.wikipedia.org/wiki/Question_mark

Title: Re: IDE Reload. Eroteme Replaces Utf-8 !
Post by: John Z on January 16, 2026, 11:06:59 PM
Hi Robert,

This is not really a bug. It may be a minor inconvenience, but here is the situation as I understand it.

Pelles C was originally ASCII/ANSI for all source files.
Pelles C later made UTF-8 the default for all source files.

It also supports UTF-16 for source files. When you create a new source file within the IDE, it is automatically UTF-8, and the source file tab shows UTF-8 (or UTF-16). If the tab shows nothing but the name, the source file is at best ASCII/ANSI. 'OLD' source code, or a source file created outside of Pelles C with a plain text editor, will be ASCII/ANSI.

The critical part is that the editor now always works in UTF-8 by default. This lets you enter UTF-8 characters on the source code page, but because the file is not identified as UTF-8, it will not display as expected when reloaded.

For example, the Export64 program does not show UTF-8 in its source tab, so the file is still ASCII/ANSI, even though the editor can make the 'display' show the character.

With any editor that supports UTF-8, a source file can be created, or just resaved, with the encoding set to UTF-8.

For example, I use TextPad to resave Export.c as Export_UTF8.c. If you add that file to the Export64 program, you will see that the source tab shows the encoding. If you run your test on this file, it should 'pass' reloading.

Hope this was at least a little bit clear -

John Z

The other method is to create a blank source file in the IDE, then paste in the old source code. When saved, it will be UTF-8.



Title: Re: IDE Reload. Eroteme Replaces Utf-8 !
Post by: Robert on January 17, 2026, 12:14:48 AM
Thanks John Z.

The file, created in EditPad, is initially a No-BOM UTF-8 file with only ASCII characters.
The file is then modified in the poide.exe IDE by adding UTF-8 glyphs beyond U+00FF.
The file is saved.
When opened in EditPad, the file is reported as Windows-1252.
When re-opened in poide.exe, the UTF-8 glyphs beyond U+00FF have been replaced with erotemes.

If a No-BOM UTF-8 file with at least one glyph beyond U+00FF is loaded into poide.exe in the first place, the file will be saved as UTF-8 No-BOM.

I will have to remember that.

Title: Re: IDE Reload. Eroteme Replaces Utf-8 !
Post by: TimoVJL on January 17, 2026, 10:31:59 AM
C compilers:

Pelles C
HelloBug.c(8): warning #2223: Unable to convert character '\u0421' to 'ANSI'; using default character.
pFile      00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F    Value
0000010F    3F 3F 3F 3F 3F 20 3F 3F  3F 3F 3F 21 0A 00 48 65    ????? ?????!..He
0000011F    6C 6C 6F 2C 20 77 6F 72  6C 64 21 0A 00             llo, world!..

msvc
HelloBug.c(8): warning C4566: character represented by universal-character-name '\u0421' cannot be represented in the current code page (1252)
pFile      00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F    Value
000001CC    48 65 6C 6C 6F 2C 20 77  6F 72 6C 64 21 0A 00 00    Hello, world!...
000001DC    3F 3F 3F 3F 3F 20 3F 3F  3F 3F 3F 21 0A 00          ????? ?????!..

Clang-cl
pFile      00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F    Value
00000121    48 65 6C 6C 6F 2C 20 77  6F 72 6C 64 21 00 D0 A1    Hello, world!.Ð¡
00000131    D0 B0 D0 BB D0 BE D0 BC  20 D2 B6 D0 B0 D2 B3 D0    Ð°Ð»Ð¾Ð¼ Ò¶Ð°Ò³Ð
00000141    BE D0 BD 21 00                                      ¾Ð½!.
Title: Re: IDE Reload. Eroteme Replaces Utf-8 !
Post by: Robert on January 18, 2026, 09:36:59 AM
Quote from: TimoVJL on Yesterday at 10:31:59 AM


Hi Timo:

Is the HelloBug.c file referenced above the main.c file enclosed in the Skat.zip appended to this post?
https://forum.pellesc.de/index.php?topic=471.msg41856#msg41856

If it is, that version of main.c is encoded in UTF-16LE.

If it isn't, then what is HelloBug.c?

Here is main.c encoded in UTF-8 No BOM.


#include <windows.h>
#include <stdio.h>
#include <conio.h>      /* _getch() */

static UINT OrigCodePage;
static const char* σκατ;
static const char* δυσκατανοήτων;

int main(int argc, char* argv[])
{
    OrigCodePage = GetConsoleOutputCP();   /* remember the console's code page */
    SetConsoleOutputCP(65001);             /* 65001 = CP_UTF8: print UTF-8 literals as-is */
    σκατ = "σκατ doo, be, shoo, bop, ooh, dee, doo, sha-bam";
    δυσκατανοήτων = "δυσκατανοήτων difficult to understand";
    printf("%s%s%s\n", σκατ, " ", δυσκατανοήτων);
    _getch();                              /* wait for a key press */
    SetConsoleOutputCP(OrigCodePage);      /* restore the original code page */
    return 0;
}

Title: Re: IDE Reload. Eroteme Replaces Utf-8 !
Post by: TimoVJL on January 18, 2026, 10:09:32 AM
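I used UTF-8 with BOM.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

/* Custom entry point (no CRT startup code), so it ends with ExitProcess(). */
void __cdecl WinMainCRTStartup(void)
{
    char s[100] = u8"σκατ";   /* 4 Greek letters = 8 UTF-8 bytes */
    wchar_t w[100];           /* final s needs 31 wchar_t (30 chars + NUL), so 20 was too small */
    /* append the 8 raw UTF-8 bytes in hex after the text */
    wsprintfA(s+8, " = %02X %02X %02X %02X %02X %02X %02X %02X",
        (BYTE)s[0], (BYTE)s[1], (BYTE)s[2], (BYTE)s[3],
        (BYTE)s[4], (BYTE)s[5], (BYTE)s[6], (BYTE)s[7]);
    MessageBoxA(0, s, "test", MB_OK);   /* UTF-8 bytes shown in the ANSI code page */
    /* With CP_UTF8, dwFlags must be 0 or MB_ERR_INVALID_CHARS;
       the flags below make the call fail with ERROR_INVALID_FLAGS. */
    //if (MultiByteToWideChar(CP_UTF8, MB_COMPOSITE, s, -1, w, 100))
    //if (MultiByteToWideChar(CP_UTF8, MB_PRECOMPOSED, s, -1, w, 100))
    //if (MultiByteToWideChar(CP_UTF8, MB_USEGLYPHCHARS, s, -1, w, 100))
    if (MultiByteToWideChar(CP_UTF8, 0, s, -1, w, (int)(sizeof w / sizeof w[0])))
        MessageBoxW(0, w, L"test", MB_OK);
    else MessageBoxA(0, "error", "test", MB_OK|MB_ICONERROR);
    ExitProcess(0);
}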