UTF-8 identifiers, now supported by Clang and GCC 10, would be nice. The code below compiles and executes as expected with Martin Storsjö's LLVM-MinGW:
char * Όνομα_μήνα (int μετρητής)
{
static const PCHAR DATA[]=
{
"ΦΟΙΝΙΚΑΙΟΣ","ΚΡΑΝΕΙΟΣ","ΛΑΝΟΤΡΟΠΙΟΣ","ΜΑΧΑΝΕΥΣ",
"ΔΩΔΕΚΑΤΕΥΣ","ΕΥΚΛΕΙΟΣ","ΑΡΤΕΜΙΣΙΟΣ","ΨΥΔΡΕΥΣ",
"ΓΑΜΕΙΛΙΟΣ","ΑΓΡΙΑΝΙΟΣ","ΠΑΝΑΜΟΣ","ΑΠΕΛΛΑΙΟΣ"
};
Well, using the u8 string prefix should work (at least with the source file in an encoding like UTF-16, on my machine) ...
char *Όνομα_μήνα(int μετρητής)
{
/*static */ const char *DATA[] =
{
u8"ΦΟΙΝΙΚΑΙΟΣ", u8"ΚΡΑΝΕΙΟΣ", u8"ΛΑΝΟΤΡΟΠΙΟΣ", u8"ΜΑΧΑΝΕΥΣ",
u8"ΔΩΔΕΚΑΤΕΥΣ", u8"ΕΥΚΛΕΙΟΣ", u8"ΑΡΤΕΜΙΣΙΟΣ", u8"ΨΥΔΡΕΥΣ",
u8"ΓΑΜΕΙΛΙΟΣ", u8"ΑΓΡΙΑΝΙΟΣ", u8"ΠΑΝΑΜΟΣ", u8"ΑΠΕΛΛΑΙΟΣ"
};
}
Otherwise I think there will be problems with Microsoft/Windows compatibility. I'm not sure it's 100%, but the current behavior seems to match MSVC.
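For anyone who wants to see what the u8 prefix actually stores, here is a minimal sketch. It assumes a C11 or later compiler; the names omega_utf8, omega_byte_count and omega_bytes are made up for this example. The character is written as a universal character name so the result does not depend on how the source file itself is encoded.

```c
#include <stddef.h>

/* U+03A9 GREEK CAPITAL LETTER OMEGA as a UTF-8 string literal.
   The u8 prefix asks for UTF-8 bytes regardless of the execution
   character set; an unsigned char array is used so the bytes can be
   compared against hex values without sign surprises. */
static const unsigned char omega_utf8[] = u8"\u03A9";

/* Number of bytes in the encoded character, excluding the terminating NUL. */
size_t omega_byte_count(void)
{
    return sizeof omega_utf8 - 1;
}

/* Access to the encoded bytes (hypothetical helper for inspection). */
const unsigned char *omega_bytes(void)
{
    return omega_utf8;
}
```

A single Greek capital letter takes two bytes in UTF-8, so a 10-letter month name from the table above occupies 20 bytes plus the terminator.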
Hi Pelle:
I was referring to the identifiers: the names of variables, types, functions, labels, etc. In the example I posted, this part:
char *Όνομα_μήνα(int μετρητής)
From ISO/IEC 9899:202x, Annex D (normative) Universal character names for identifiers
Quote
Annex D
(normative)
Universal character names for identifiers
1 This clause lists the hexadecimal code values that are valid in universal character names in identifiers.
D.1 Ranges of characters allowed
1 00A8, 00AA, 00AD, 00AF, 00B2–00B5, 00B7–00BA, 00BC–00BE, 00C0–00D6, 00D8–00F6, 00F8–00FF
2 0100–167F, 1681–180D, 180F–1FFF
3 200B–200D, 202A–202E, 203F–2040, 2054, 2060–206F
4 2070–218F, 2460–24FF, 2776–2793, 2C00–2DFF, 2E80–2FFF
5 3004–3007, 3021–302F, 3031–303F
6 3040–D7FF
7 F900–FD3D, FD40–FDCF, FDF0–FE44, FE47–FFFD
8 10000–1FFFD, 20000–2FFFD, 30000–3FFFD, 40000–4FFFD, 50000–5FFFD, 60000–6FFFD,
70000–7FFFD, 80000–8FFFD, 90000–9FFFD, A0000–AFFFD, B0000–BFFFD, C0000–CFFFD,
D0000–DFFFD, E0000–EFFFD
D.2 Ranges of characters disallowed initially
1 0300–036F, 1DC0–1DFF, 20D0–20FF, FE20–FE2F
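The ranges quoted above translate fairly directly into a lookup. This is only a sketch of the quoted draft text, not how any particular compiler does it; the function names ucn_allowed_in_identifier and ucn_allowed_as_first are made up for this post. It covers only the Annex D ranges, so plain ASCII letters, digits and the underscore are deliberately not handled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* One inclusive range of code points. */
struct cp_range { unsigned long lo, hi; };

/* D.1 items 1 to 7: ranges allowed in identifiers (Basic Multilingual Plane). */
static const struct cp_range allowed_bmp[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD}
};

/* D.2: ranges that may not start an identifier (combining marks). */
static const struct cp_range disallowed_initially[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F}
};

/* Linear search; the tables are small enough that this is adequate. */
static bool in_ranges(const struct cp_range *r, size_t n, unsigned long cp)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].lo && cp <= r[i].hi)
            return true;
    return false;
}

/* True if cp falls in one of the D.1 ranges. */
bool ucn_allowed_in_identifier(unsigned long cp)
{
    /* D.1 item 8: planes 1 through E, each from xxxx0000 up to xxxxFFFD. */
    if (cp >= 0x10000 && cp <= 0xEFFFD)
        return (cp & 0xFFFF) <= 0xFFFD;
    return in_ranges(allowed_bmp, sizeof allowed_bmp / sizeof allowed_bmp[0], cp);
}

/* True if cp is in D.1 and not excluded as a first character by D.2. */
bool ucn_allowed_as_first(unsigned long cp)
{
    return ucn_allowed_in_identifier(cp) &&
           !in_ranges(disallowed_initially,
                      sizeof disallowed_initially / sizeof disallowed_initially[0],
                      cp);
}
```

For example, U+038C (the Ό that starts Όνομα_μήνα) passes both checks, while U+0301 (a combining acute accent) is allowed inside an identifier but not as its first character.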
Martin Storsjö's LLVM-MinGW has implemented this and I have used it on Windows. I believe that Martin has also done this on the latest MinGW64.
Microsoft C/C++ identifiers are still ASCII
Quote
nondigit: one of
_ a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
digit: one of
0 1 2 3 4 5 6 7 8 9
Quoted from
https://docs.microsoft.com/en-us/cpp/c-language/c-identifiers?view=msvc-160
but in general Microsoft is moving toward UTF-8 and away from UTF-16.
Quote
-A vs. -W APIs
Win32 APIs often support both -A and -W variants.
-A variants recognize the ANSI code page configured on the system and support char*, while -W variants operate in UTF-16 and support WCHAR.
Until recently, Windows has emphasized "Unicode" -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.
Quoted from
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
Quote from: Robert on February 22, 2021, 09:48:41 PM
I was referring to the identifiers, the names of variables, types, functions, labels etc. In the example I posted, this part
...
Ah! OK ...
Quote from: Robert on February 22, 2021, 09:48:41 PM
...
but in general moving toward UTF-8 and away from UTF-16.
I wasn't aware of this. I will look at it... (but can't promise anything right now).
I guess the standard C way of using "universal-character-names" should work...
\uxxxx (xxxx = four hex digits)
\Uxxxxxxxx (xxxxxxxx = eight hex digits)
... but it gets tedious rather quickly...
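To make the universal-character-name route concrete, here is a minimal sketch of what the parameter name μετρητής from the earlier example looks like when spelled with \u escapes. It assumes a compiler that accepts universal character names in identifiers (C99 and later); the function name is_valid_month is made up for this example.

```c
/* \u03bc\u03b5\u03c4\u03c1\u03b7\u03c4\u03ae\u03c2 spells the Greek
   identifier used as the counter parameter earlier in the thread:
   mu, epsilon, tau, rho, eta, tau, eta with tonos, final sigma. */
int is_valid_month(int \u03bc\u03b5\u03c4\u03c1\u03b7\u03c4\u03ae\u03c2)
{
    /* Same range check as in the month-name function: 1 through 12. */
    return \u03bc\u03b5\u03c4\u03c1\u03b7\u03c4\u03ae\u03c2 >= 1 &&
           \u03bc\u03b5\u03c4\u03c1\u03b7\u03c4\u03ae\u03c2 <= 12;
}
```

One eight-letter name already costs 48 characters of source each time it appears, which is the tedium in question.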
My head hurts just thinking about the standard C way of using "universal-character-names" !
Your IDE already defaults to UTF-8, so it would be nice to add a level of sophistication and accessibility for non-ASCII coders.
Thank you Pelle.
Good news: it wasn't too hard adding a new compiler option (/utf-8) that switches from the default ANSI code page (both for runtime, and source files without a BOM). Will be in the next version.
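As an illustration of the "source files without a BOM" part, the decision could look roughly like the sketch below. This is not taken from the compiler's source; the enum, the function name guess_source_encoding and the utf8_flag parameter (standing in for /utf-8 being present on the command line) are all assumptions made for this post.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical set of encodings a front end might distinguish. */
enum source_encoding { ENC_ANSI, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE };

/* Pick an encoding from the first bytes of a source file.
   A byte order mark always wins; without one, the result depends on
   whether the (assumed) UTF-8 option was given. */
enum source_encoding guess_source_encoding(const unsigned char *buf,
                                           size_t len, bool utf8_flag)
{
    /* UTF-8 BOM: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    /* UTF-16 little-endian BOM: FF FE */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    /* UTF-16 big-endian BOM: FE FF */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    /* No BOM: fall back to the ANSI code page unless UTF-8 was requested. */
    return utf8_flag ? ENC_UTF8 : ENC_ANSI;
}
```

With this shape, a file saved as UTF-8 without a BOM is read as ANSI by default and as UTF-8 only when the option is given, which matches the behavior described above.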
شكرا لك
આભાર
Баярлалаа
Cảm ơn bạn
謝謝
Thank you
+1
Use the /utf-8 flag on the pocc 11.0 compiler command line.
#include <windows.h>
#include <stdio.h> // ISO StdLib
#include <stdlib.h> // ISO StdLib
#include <conio.h> // Πρωτόγονη είσοδος / έξοδος
// *************************************************
// Καθολικές μεταβλητές χρηστών
// *************************************************
static int αρχικός_Κώδικας_σελίδα;
// *************************************************
// Πρωτότυπα χρήστη
// *************************************************
char* Όνομα_μήνα (int);
char* εργάσιμες (int);
// *************************************************
// Διαδικασίες χρήστη
// *************************************************
char * Όνομα_μήνα (int μετρητής)
{
static char* στοιχεία[]=
{
// The Antikythera mechanism, the oldest example of an analogue computer, has the
// following 12 month names of the Corinthian calendar inscribed on the Metonic dial.
// https://en.wikipedia.org/wiki/Antikythera_mechanism
"ΦΟΙΝΙΚΑΙΟΣ","ΚΡΑΝΕΙΟΣ","ΛΑΝΟΤΡΟΠΙΟΣ","ΜΑΧΑΝΕΥΣ",
"ΔΩΔΕΚΑΤΕΥΣ","ΕΥΚΛΕΙΟΣ","ΑΡΤΕΜΙΣΙΟΣ","ΨΥΔΡΕΥΣ",
"ΓΑΜΕΙΛΙΟΣ","ΑΓΡΙΑΝΙΟΣ","ΠΑΝΑΜΟΣ","ΑΠΕΛΛΑΙΟΣ"
};
if(μετρητής<1||μετρητής>12 )
{
return 0;
}
return στοιχεία[μετρητής-1];
}
char* εργάσιμες (int μετρητής)
{
static char* στοιχεία[]=
{
"Κυριακή","Δευτέρα","Τρίτη","Τετάρτη",
"Πέμπτη","Παρασκευή","Σάββατο"
};
if(μετρητής<1||μετρητής>7 )
{
return 0;
}
return στοιχεία[μετρητής-1];
}
// *************************************************
// Κύριο πρόγραμμα
// *************************************************
int main(int argc, char *argv[])
{
αρχικός_Κώδικας_σελίδα=GetConsoleOutputCP();
SetConsoleOutputCP(65001);
printf("%s\n","Εδώ είναι τα ονόματα του μήνα:");
printf("\n");
{int ιώτα;
for(ιώτα=1; ιώτα<=12; ιώτα+=1)
{
printf("%s\n",Όνομα_μήνα(ιώτα));
}
}
printf("\n");
printf("%s\n","Εδώ είναι τα ονόματα των ημερών της εβδομάδας:");
printf("\n");
{int ιώτα;
for(ιώτα=1; ιώτα<=7; ιώτα+=1)
{
printf("%s\n",εργάσιμες(ιώτα));
}
}
SetConsoleOutputCP(αρχικός_Κώδικας_σελίδα);
printf("\n%s\n","Πατήστε οποιοδήποτε κουμπί για να συνεχίσετε . . .");
_getch();
return EXIT_SUCCESS; // Τέλος του κύριου προγράμματος
}
Result:
Εδώ είναι τα ονόματα του μήνα:
ΦΟΙΝΙΚΑΙΟΣ
ΚΡΑΝΕΙΟΣ
ΛΑΝΟΤΡΟΠΙΟΣ
ΜΑΧΑΝΕΥΣ
ΔΩΔΕΚΑΤΕΥΣ
ΕΥΚΛΕΙΟΣ
ΑΡΤΕΜΙΣΙΟΣ
ΨΥΔΡΕΥΣ
ΓΑΜΕΙΛΙΟΣ
ΑΓΡΙΑΝΙΟΣ
ΠΑΝΑΜΟΣ
ΑΠΕΛΛΑΙΟΣ
Εδώ είναι τα ονόματα των ημερών της εβδομάδας:
Κυριακή
Δευτέρα
Τρίτη
Τετάρτη
Πέμπτη
Παρασκευή
Σάββατο
Πατήστε οποιοδήποτε κουμπί για να συνεχίσετε . . .