The attached Add-In can be used to export C source code as an HTML or PDF file - with syntax color highlighting. Maybe useful to someone...
Tested with 3.00 Beta, but will probably work with 2.90 too...
New Mar 18: Added export to PDF files...
New Mar 22: Added compression to PDF files... - thanks Timppa!
- Use Export.ppj for no compression.
- Use ExportZ.ppj for compression - in this case you also need ZLIB 1.2.2 (http://www.zlib.net) in the subdirectory zlib-1.2.2.
Pelle
Very nice work
Hello Pelles,
Very useful program :)
BTW, in the same taste, a C to XML would be nice :)
Quote from: "Vortex"Very nice work
Thanks, Vortex!
Pelle
Hello,
Quote from: "Gerome"Very useful program :)
BTW, in the same taste, a C to XML would be nice :)
Thanks!
I'm not sure about the best representation/format for XML. Do you have an example, or know where I can find it...?
Export to PDF files would be nice too, but I'm not sure how to do that either.
Pelle
Hi,
XML ?
Yes, I have ideas :)
<PROJECT>
  <FILENAME>MySample.c</FILENAME>
  <BEFOREFOOS>#include + declares + ...</BEFOREFOOS>
  <FOOS>
    <NAME>specialfoo</NAME>
    <BODY>void specialfoo(void)...</BODY>
    ...
  </FOOS>
  <FOOS>
    <NAME>otherfoo</NAME>
    <BODY>int *otherfoo(void)...</BODY>
    ...
  </FOOS>
</PROJECT>
would be a good start? :)
After that, one can imagine easily generating documentation and/or a snippet database; combined with an XSD layer, it could be terribly useful :)
OK, but this requires a different parser that better understands various C elements, like functions. Maybe some day...
Pelle
Quote from: "Pelle"
Export to PDF files would be nice too, but I'm not sure how to do that either.
If you check out SciTE, it has an export to PDF function (in SciTE itself, not Scintilla), although I'm not sure if the export maintains highlighting. But it might be a start to get you going.
http://scintilla.sourceforge.net/
Good luck!
Quote from: "Justin Thyme"If you check out SciTE, it has an export to PDF function (in SciTE itself, not Scintilla), although I'm not sure if the export maintains highlighting. But it might be a start to get you going.
Thanks - I will look at it. I found a reference manual at Adobe. I got it working in 'hack-ish' way - need to make it more reusable...
Pelle
Hi Pelle:
I have modified your C to HTML converter addin so that the size of the exported HTML is significantly decreased. I enclosed the exported <body> .. </body> with <pre> ... </pre>, allowing <br> to be discarded, replaced "class" with "id", and replaced "span" with "code", allowing "&nbsp;" to be replaced with a space.
I have only used the output with I.E. 6.0 so programmers using other browsers may have issues with the HTML export.
Merry Christmas to all !
Robert Wishlaw
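Once the listing sits inside <pre> ... </pre>, spaces and line breaks can be written as they are, and only the characters that are special in HTML still need escaping. A minimal sketch of that escaping step follows. The helper name html_escape is hypothetical and is not taken from the add-in source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the add-in: copy src into dst, replacing the
   three characters that are special in HTML. Spaces and newlines are copied
   unchanged, because the output is meant to sit inside <pre> ... </pre>.
   Returns the length written, or (size_t)-1 if dst is too small. */
size_t html_escape(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    for (; *src != '\0'; ++src) {
        char single[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = single;  break;
        }
        len = strlen(rep);
        if (used + len + 1 > dstsize)   /* keep room for the final NUL */
            return (size_t)-1;
        memcpy(dst + used, rep, len);
        used += len;
    }
    if (used + 1 > dstsize)
        return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

Calling it on the line `#include <stdio.h>` yields `#include &lt;stdio.h&gt;`, which can then be wrapped in whatever highlighting tags the exporter uses.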
Hello Robert,
Cool - thanks!
...and Merry Christmas...!
Pelle
Great thing, could be very useful - thx
Merry christmas!
Attached is a revised version of Pelle's C to HTML converter in which the "id" selector has been reverted to "class".
Although "id" seems to work, according to the CSS 2.0 standard, it is not meant to be used in more than one element instance and so multiple instances are tagged as non-compliant in strict type checking editors.
This non-compliance will probably cause problems in the future as browsers become more compliant with the CSS 2.0 standard.
Robert Wishlaw
Is it ok to include exportToHTML2.zip on my web site ?
John
Hi John:
Yes, go ahead, I have no objections.
Robert Wishlaw
Thanks.
John
Hi Pelle!
When I try to compile this ADDIN using Pelles C 5.0 BETA I get the following error:
Quote
Building Export.obj.
C:\Arquivos de programas\PellesC\Include\addin.h(987): warning #2099: Missing type specifier.
C:\Arquivos de programas\PellesC\Include\addin.h(987): error #2001: Syntax error: expected ';' but found 'ADDIN_FIND_IN_FILES'.
C:\Arquivos de programas\PellesC\Include\addin.h(987): warning #2099: Missing type specifier.
*** Error code: 1 ***
Done.
I guess there is a little bug in addin.h?
Thanks
Updated to support form feed / page break, and RTF too.
EDIT 2013-10-09: fix for CRLF (for Win7 WordPad?)
EDIT 2014-02-01: fix RTF font name.
Hi Timovjl:
Thank you for the update.
Robert Wishlaw
Looking for some information on creating PDF files, I found an old add-in project created by multiple authors (Pelle, Timo, and Robert) way back in 2005 for Pelles C version 3 (I think), then updated with RTF support by Timo in 2013. https://forum.pellesc.de/index.php?topic=471.15
Of course it no longer worked with the newer Pelles C version 13.00.9 - Soooo I've minimally hacked it to get it functional for the current version. While it will now work with plain text, UTF-8, and UTF-16 source pages, it will only produce accurate output if every code point is within the ANSI range. This is OK for source code, but some comments won't be displayed correctly when non-ANSI characters are used. That could be fixed too, but I'm not sure it would be worth the effort.
The project ZIP includes everything for a 64-bit version.
John Z
Hi John Z:
Wow! This takes me back to the "Realm of Long Long Ago".
About the HTML Export for UTF-8:
1. The HTML header requires more info for the page to render properly.
2. I think _setmode has to be used to get proper UTF-8 output. For details see
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170
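For point 1, one piece of header information a browser can use is an explicit encoding declaration. The sketch below builds a small HTML preamble that declares UTF-8. The function name build_html_header is hypothetical, the title is assumed to be HTML-escaped already, and the result is assumed to be written to a file opened in binary mode so that no bytes are translated on the way out.

```c
#include <stdio.h>

/* Hypothetical helper: format a minimal HTML5 preamble that declares UTF-8.
   Returns what snprintf returns: the number of characters needed, so a
   value >= size means the buffer was too small. */
int build_html_header(char *buf, size_t size, const char *title)
{
    return snprintf(buf, size,
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        "<meta charset=\"utf-8\">\n"
        "<title>%s</title>\n"
        "</head>\n<body>\n<pre>\n",
        title);
}
```

The exporter would write this buffer first, then the escaped listing, then the closing </pre>, </body> and </html> tags.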
Where can I get a zlib64.lib ?
Building Export64.dll.
POLINK: fatal error: File not found: 'zlib64.lib'.
*** Error code: 1 ***
Interesting.
Thanks.
Hi Robert,
My apologies, I didn't check that the project zip was complete. I updated the post with a project zip that includes the libs.
I'll look further into it (UTF-8) with the link you provided.
Hopefully good memories for you :)
John Z
Hi John Z:
Thanks, the zlib64 is good. Export is as expected with the limitations you have mentioned regarding UTF-8.
There is a separate, but maybe connected, problem with the Pelles C IDE (poide.exe).
This code
#include <stdio.h>
/* entry point */
int main(void)
{
    printf("Hello, world!\n");
    return 0;
}
amended by adding a line that prints Tajik text in UTF-8
#include <stdio.h>
/* entry point */
int main(void)
{
    printf("Hello, world!\n");
    printf("Салом Ҷаҳон!\n");
    return 0;
}
is reloaded into the poide.exe IDE as
#include <stdio.h>
/* entry point */
int main(void)
{
    printf("Hello, world!\n");
    printf("????? ?????!!\n");
    return 0;
}
when the IDE is shut down and restarted.
A zlib 1.3.1 project, including the missing header file that the dependencies left out.
zlib 1.3.1 Release Notes (https://github.com/madler/zlib/releases)
Thanks Timo 8)
Hi John Z:
Very complex, very interesting.
I have had a look at Pelles and Timo's code comparing it with yours and have some ideas.
I will continue studying this and if I can get it to spit out anything intelligible, I'll let you know.
Hi John Z:
Yeah, well, uh ... we've got a bit of work in front of us if we hope to deal properly with non-ASCII code.
This code
#include <windows.h>
#include <stdio.h>
static int OrigCodePage;
static const char* σκατ;
static const char* δυσκατανοήτων;
int main(int argc, char* argv[])
{
    OrigCodePage = GetConsoleOutputCP();
    SetConsoleOutputCP(65001);
    σκατ = "σκατ doo, be, shoo, bop, ooh, dee, doo, sha-bam";
    δυσκατανοήτων = "δυσκατανοήτων difficult to understand";
    printf("%s%s%s\n", σκατ, " ", δυσκατανοήτων);
    _getch();
    SetConsoleOutputCP(OrigCodePage);
    return 1;
}
Exports from poide.exe IDE as HTML file:
#include <windows.h>
#include <stdio.h>
static int OrigCodePage;
static const char* УКБФ;
static const char* ДХУКБФБНПЎФЩН;
int main(int argc, char* argv[])
{
    OrigCodePage = GetConsoleOutputCP();
    SetConsoleOutputCP(65001);
    УКБФ = "УКБФ doo, be, shoo, bop, ooh, dee, doo, sha-bam";
    ДХУКБФБНПЎФЩН = "ДХУКБФБНПЎФЩН difficult to understand";
    printf("%s%s%s\n", УКБФ, " ", ДХУКБФБНПЎФЩН);
    _getch();
    SetConsoleOutputCP(OrigCodePage);
    return 1;
}
Pelles C project code in attached file Skat.zip
Hi Robert,
Enjoy :) -
Both HTML and PDF support UTF-8, and PDF is amenable to UTF-16 and other encodings.
HTML can use UTF-16, but the HTML5 spec 'strongly' discourages it, so UTF-8 is the mainstream choice for HTML.
It seems the biggest challenge is UTF-8/16 in RTF. Every non-ASCII character must be written in the \uN? format, where N is the decimal value of the 16-bit code unit and ? is a fallback character.
So it can work, I just did the minimum to get it 'working' again.
While typing I saw that an update was posted. I'll look at it more. It seems your focus is a console program and not a Windows program? If so, somewhere I saw a full-fledged console program for console Unicode display - maybe I can find it again ...
John Z
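The \uN? form mentioned above can be sketched in a few lines of C. This is an illustration only, not code from the add-in: the function name rtf_escape_unit is hypothetical, the input is assumed to be one UTF-16 code unit at a time, and the single '?' fallback assumes the RTF default of one fallback character per \u (that is, \uc1).

```c
#include <stdio.h>

/* Hypothetical helper: format one UTF-16 code unit as RTF text.
   ASCII is copied (with \, { and } escaped); anything else becomes \uN?,
   where N is the code unit as a signed 16-bit decimal and '?' is the
   fallback character shown by readers that do not understand \u.
   Returns what snprintf returns. */
int rtf_escape_unit(unsigned short unit, char *dst, size_t dstsize)
{
    int n;

    if (unit == '\\' || unit == '{' || unit == '}')
        return snprintf(dst, dstsize, "\\%c", (char)unit);
    if (unit < 0x80)
        return snprintf(dst, dstsize, "%c", (char)unit);

    /* Values above 32767 are written as negative numbers. */
    n = unit > 0x7FFF ? (int)unit - 0x10000 : (int)unit;
    return snprintf(dst, dstsize, "\\u%d?", n);
}
```

For the Greek letter σ (U+03C3) this produces \u963?. Characters outside the Basic Multilingual Plane would, under this scheme, be passed in as their two surrogate code units, which is an assumption worth checking against the RTF specification before relying on it.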
A small stupid C to RTF project to RichEdit.
It can help to debug some code.
It was created on the English version of Windows 7 SP1 x64.
As the source code is available, the crash point can be determined.
It was also put together from working code examples.
Quote from: TimoVJL on January 17, 2026, 01:21:16 PMA small stupid C to RTF project to RichEdit.
It can help to debug some code.
Hi Timo:
The RTF output from this code is different from the RTF output of John Z's IDE addin.
I'm going to have a look at the output in an ImHex editor and see what's going on.
Do you know ImHex?
https://imhex.werwolv.net/
https://github.com/WerWolv/ImHex
Quote from: Robert on January 18, 2026, 12:17:30 AMThe RTF output from this code is different from the RTF output of John Z's IDE addin.
It might just have been created for a different purpose, like writing to a RichEdit control.
Also, it was only for ANSI source.
#include <stdio.h>

// https://stackoverflow.com/questions/5603559/one-file-lib-to-conv-utf8-char-to-wchar-t
/* Decode the 1- to 3-byte UTF-8 sequence at *utf8 and advance *utf8 past it.
   Returns the code point, or 0 for a 4-byte or malformed sequence. */
unsigned short utf8_to_wchar(char **utf8)
{
    unsigned char *p = (unsigned char *)*utf8;
    unsigned char v = *p++;
    unsigned short c = 0;
    int shiftCount = 0;

    if (v < 0x80) {                     /* plain ASCII */
        c = v;
    }
    else if ((v & 0xE0) == 0xC0) {      /* lead byte of a 2-byte sequence */
        shiftCount = 1;
        c = v & 0x1F;
    }
    else if ((v & 0xF0) == 0xE0) {      /* lead byte of a 3-byte sequence */
        shiftCount = 2;
        c = v & 0x0F;
    }
    else {                              /* not handled: skip this byte */
        *utf8 = (char *)p;
        return 0;
    }
    while (shiftCount) {
        v = *p;
        if ((v & 0xC0) != 0x80) {       /* not a continuation byte */
            *utf8 = (char *)p;
            return 0;
        }
        ++p;
        c = (unsigned short)((c << 6) | (v & 0x3F));
        --shiftCount;
    }
    *utf8 = (char *)p;
    return c;
}

/* Write the decimal digits of n (n >= 0) to s; returns the number of digits. */
int ShortToStrPos(int n, char *s)
{
    int idx = 0, nl = n, len;

    while ((nl /= 10) > 0)          /* count digits beyond the first */
        idx++;
    len = idx + 1;
    s[len] = '\0';
    do {                            /* generate digits in reverse order */
        s[idx--] = n % 10 + '0';    /* get next digit */
    } while ((n /= 10) > 0);        /* delete it */
    return len;
}

int __cdecl main(void)
{
    char utf8[] = u8"σκατ";
    char *p = utf8;

    while (*p) {
        if (*(unsigned char *)p > 127) {    // UTF8 ?
            unsigned short uc = utf8_to_wchar(&p);
            printf("%Xh\t", uc);
        }
        else
            ++p;                            // skip ASCII so the loop always advances
    }
    printf("\n%p\n%p\n", (void *)utf8, (void *)p);
    return 0;
}
EDIT 2025-01-19: UNICODE version in RE_Test3, and Esc closes the window, but there are still bugs.
Hi Timo,
My apologies, it was my mistake. Your application works fine and I removed my previous message #41859
Quote from: TimoVJL on January 18, 2026, 05:00:36 AMQuote from: Robert on January 18, 2026, 12:17:30 AMThe RTF output from this code is different from the RTF output of John Z's IDE addin.
It might just have been created for a different purpose, like writing to a RichEdit control.
Also, it was only for ANSI source.
Hei TimoVJL:
The code snippet above is interesting. Thanks.
What bugs ? I don't see bugs in RE_Test3 output.
Quote from: TimoVJL on January 18, 2026, 05:00:36 AM
....
if (*(unsigned char*)p > 127) { // UTF8 ?
....
Hei TimoVJL:
"Nearly all invalid UTF-8 cases can be detected by looking at the first two bytes of a character (in fact, the first 12 bits)."
Quoted from:
'Validating UTF-8 In Less Than One Instruction Per Byte'
available at
https://arxiv.org/pdf/2010.03090.pdf
See also:
Ridiculously fast unicode (UTF-8) validation (https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/)
Thanks again for the code.
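A test such as `> 127` only says that a byte is not ASCII, not that the sequence it starts is well formed. As a rough illustration, here is a plain byte-by-byte validity check. It is a simple scalar sketch and not the SIMD method from the paper above, and the name utf8_is_valid is hypothetical.

```c
#include <stddef.h>

/* Scalar UTF-8 check (not the SIMD algorithm from the paper).
   Returns 1 if the len bytes at s are well-formed UTF-8, otherwise 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t k;

        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;      /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < extra)    /* sequence is cut off at the end */
            return 0;
        for (k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return 0;                         /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;     /* UTF-16 surrogate */
        if (cp > 0x10FFFF) return 0;                    /* beyond Unicode */
        i += extra + 1;
    }
    return 1;
}
```

An exporter could run a check like this once on the whole buffer and fall back to treating the source as ANSI when it fails.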
Mikään ei ole mahdotonta. (Nothing is impossible.)
Quote
RTF SYNTAX
An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.
RTF uses ASCII characters 32 - 127 and some Latin-1 (ISO/IEC 8859-1) characters without encoding.
So I was just lazy about checking characters, like many others.
Source that is UTF-8 with a BOM can be detected and processed conditionally.
Hi TimoVJL and John Z:
RTF SYNTAX.
Oh that !
Yeah, well, I think I'm beginning to remember why I'm here.
Export C source etc.
Anyway, you solved what I had considered the hard part, that is, dealing with the UTF-16LE text which is what the export addin function AddIn_GetSourceTextW has to process.
However, as John Z mentioned and you yelled "RTF SYNTAX", I had to look and see what was expected from
static const char* σκατ;
and saw that it was an RTF encoding of
\par }{\rtlch\fcs1 \af67 \ltrch\fcs0 \f67\insrsid15157589\charrsid15157589 static const char* \'f3\'ea\'e1\'f4;
Hmmmm :-\ :o
Better to show important things too:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}{\f1\fnil\fcharset161{\*\fname Courier New;}Courier New Greek;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\lang1035\f0\fs22 #include <windows.h>\par
#include <stdio.h>\par
\par
static int OrigCodePage;\par
static const char* \f1\'f3\'ea\'e1\'f4;\par
static const char* \'e4\'f5\'f3\'ea\'e1\'f4\'e1\'ed\'ef\'de\'f4\'f9\'ed;\par
Streamed parsing doesn't work, because the RTF header has to be handled separately while processing.
The streamed parsing is a problem because the AddIn_GetSourceText function extracts UTF-16LE with embedded nulls. The code extracted by AddIn_GetSourceText should be converted to UTF-8, removing the embedded nulls, so that it can be processed with standard, non-wide, C functions.
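The conversion described here can be sketched in portable C as below. On Windows the API function WideCharToMultiByte with CP_UTF8 does the same job. The function name utf16_to_utf8 is hypothetical, the input is assumed to be an array of 16-bit code units already in native byte order, and unsigned short is assumed to be 16 bits wide.

```c
#include <stddef.h>

/* Hypothetical helper: convert 'count' UTF-16 code units to UTF-8.
   Surrogate pairs are combined; an unpaired surrogate becomes U+FFFD.
   Returns the number of bytes written (excluding the terminating NUL),
   or (size_t)-1 if dst is too small. */
size_t utf16_to_utf8(const unsigned short *src, size_t count,
                     char *dst, size_t dstsize)
{
    size_t used = 0, i, k;

    for (i = 0; i < count; i++) {
        unsigned long cp = src[i];
        unsigned char tmp[4];
        size_t n;

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* high + low surrogate: combine into one code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00UL);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;    /* unpaired surrogate */
        }

        if (cp < 0x80) {
            tmp[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            tmp[0] = (unsigned char)(0xC0 | (cp >> 6));
            tmp[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            tmp[0] = (unsigned char)(0xE0 | (cp >> 12));
            tmp[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            tmp[0] = (unsigned char)(0xF0 | (cp >> 18));
            tmp[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            tmp[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        }
        if (used + n + 1 > dstsize)     /* keep room for the final NUL */
            return (size_t)-1;
        for (k = 0; k < n; k++)
            dst[used++] = (char)tmp[k];
    }
    if (used + 1 > dstsize)
        return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

The four code units of σκατ come out as eight UTF-8 bytes, which ordinary char-based string functions can then handle.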
The RTF encoding of UTF-8 is beyond my understanding, for example, the encoding of the eight UTF-8 bytes of
σκατ;
into the expected RTF representation
\'f3\'ea\'e1\'f4;
Those are connected.
{\f1\fnil\fcharset161{\*\fname Courier New;}Courier New Greek;}
\f1\'f3\'ea\'e1\'f4
With UTF-16LE there is a bit less conversion, but you have to find the right font charset for the characters.
https://www.oreilly.com/library/view/rtf-pocket-guide/9781449302047/ch04.html
I have little interest in that.
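The connection between the two RTF lines is that \fcharset161 selects a Greek font charset, and f3 ea e1 f4 are the bytes of σκατ in the matching Windows code page 1253. A sketch of the final formatting step follows. It assumes the text has already been converted to that code page (on Windows, for example, with WideCharToMultiByte and code page 1253), and the name rtf_hex_escape is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write bytes that are ALREADY in the font's code page
   as RTF text. Bytes >= 0x80 become \'hh; \, { and } are escaped.
   Returns the length written, or -1 if dst is too small. */
int rtf_hex_escape(const unsigned char *bytes, size_t len,
                   char *dst, size_t dstsize)
{
    size_t used = 0, i;

    if (dstsize == 0)
        return -1;
    dst[0] = '\0';
    for (i = 0; i < len; i++) {
        unsigned char b = bytes[i];
        int n;

        if (b >= 0x80)
            n = snprintf(dst + used, dstsize - used, "\\'%02x", (unsigned)b);
        else if (b == '\\' || b == '{' || b == '}')
            n = snprintf(dst + used, dstsize - used, "\\%c", b);
        else
            n = snprintf(dst + used, dstsize - used, "%c", b);

        if (n < 0 || (size_t)n >= dstsize - used)   /* truncated */
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Feeding it the bytes F3 EA E1 F4 followed by ';' gives \'f3\'ea\'e1\'f4; as in the RTF shown above. The text must be preceded by a switch to the Greek font (here \f1) for the reader to interpret the bytes correctly.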
Ah yes, Code Pages and code page fonts.
Thanks Timo.
Assuming the code comments are in the user's default code page language, then
#include <windows.h>
#include <stdio.h>
int main() {
    UINT user_codepage = GetACP(); // Retrieve the system default Windows ANSI code page
    printf("The user's default Windows ANSI code page is: %u\n", user_codepage);
    // Optional: Keep the console window open to view the output
    printf("Press Enter to exit...");
    getchar();
    return 0;
}
or a variation thereof can get the correct code page to encode in the output file(s).
Code snippet provided by Google 'AI' overview - :( however only the first line is relevant :)
John Z
How does that help RTF coding?
EDIT:
Quote from: Robert on January 23, 2026, 06:03:43 PMUnfortunately, the RTFDEFS.H document referenced is not obviously available.
How to Obtain the WinWord Converter SDK (GC1039) (https://support.microsoft.com/en-us/topic/how-to-obtain-the-winword-converter-sdk-gc1039-9d68ab16-2714-c0ac-436d-0e9239206835)
HTML
https://unicodelookup.com/
Hi John Z:
My inaccurate "Code Pages" statement should have stated
"Ah yes, charsets and charset fonts."
There is some information in
https://www.biblioscape.com/rtf15_spec.htm
where it is written
Quote
\fcharsetN Specifies the character set of a font in the font table. Values for N are defined by Windows header files, and in the file RTFDEFS.H accompanying this document.
Unfortunately, the RTFDEFS.H document referenced is not obviously available.
There is a webpage at
https://www.n2pdf.de/fileadmin/user_upload/n2pdf/files/en/help/client_enu/unicode.htm
that has a table of codepage - charset equivalents.
My interest in your resurrection of the Pelles C Export addin is in the "Export to HTML" facility. I think it can handle Unicode identifiers and quotation mark embedded Unicode strings. RTF ?? I really doubt it. PDF ?? Definitely beyond my pay grade.
If you are interested in developing a Unicode capable "Export to HTML" facility, you might find some help studying the BCX translated C codes of the example on the webpage
https://bcxbasiccoders.com/webhelp/html/bcxunicode.htm#widetoansi
Quote from: TimoVJL on January 23, 2026, 05:43:50 PMHow that helps RTF coding ?
EDIT:
Quote from: Robert on January 23, 2026, 06:03:43 PMUnfortunately, the RTFDEFS.H document referenced is not obviously available.
How to Obtain the WinWord Converter SDK (GC1039) (https://support.microsoft.com/en-us/topic/how-to-obtain-the-winword-converter-sdk-gc1039-9d68ab16-2714-c0ac-436d-0e9239206835)
Thanks TimoVjl, the rtfdefs.h file is in the download and the charset defines are
// \fcharset, \cchs argument values
// some of these values may also be #defined in windows.h; here's the
// complete list
#define ANSI_CHARSET 0
#define DEFAULT_CHARSET 1
#define SYMBOL_CHARSET 2
#define INVALID_CHARSET 3 // nil value
#define MAC_CHARSET 77
#define SHIFTJIS_CHARSET 128 // CP 932: Japanese
#define HANGEUL_CHARSET 129 // CP 949: Korean
#define JOHAB_CHARSET 130
#define GB2312_CHARSET 134 // CP 936: PRC
#define CHINESEBIG5_CHARSET 136 // CP 950: Taiwan
#define GREEK_CHARSET 161
#define TURKISH_CHARSET 162
#define HEBREW_CHARSET 177
#define ARABIC_CHARSET 178
#define ARABICTRADITIONAL_CHARSET 179
#define ARABICUSER_CHARSET 180
#define HEBREWUSER_CHARSET 181
#define BALTIC_CHARSET 186
#define RUSSIAN_CHARSET 204
#define THAI_CHARSET 222
#define EASTEUROPE_CHARSET 238
#define PC437_CHARSET 254
#define OEM_CHARSET 255
Correlation of Unicode characters to the above RTF charset data may be possible using the International Components for Unicode (ICU) library functions to process the locale data contained in the Unicode Common Locale Data Repository (CLDR).
https://github.com/unicode-org/icu
https://github.com/unicode-org/cldr
There are several RTF-charset to UTF-8 converters but what is needed here, for the Export to RTF addin, is a Unicode to RTF-charset converter.
Obviously, from the above list of charsets, the conversions from Unicode would be limited. For example, C coders working with the Native American Osage language script or the International Phonetic Alphabet (Hello, anyone out there ?) would be excluded.
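For the charsets that do map to a single Windows code page, a lookup table may be enough as a starting point. The sketch below is an assumption-laden illustration: the pairings for 128, 129, 134 and 136 come from the comments in the rtfdefs.h listing above, while the others are the usual Windows charset and code page pairings and should be checked against a charset table before use. The function name charset_to_codepage is hypothetical.

```c
/* Sketch: map an RTF \fcharsetN value to a Windows code page.
   Returns 0 for charsets that have no fixed code page in this table.
   Entries other than 128, 129, 134 and 136 are assumptions to verify. */
unsigned charset_to_codepage(int charset)
{
    switch (charset) {
    case 0:   return 1252;  /* ANSI_CHARSET, treated here as Western European */
    case 128: return 932;   /* SHIFTJIS_CHARSET */
    case 129: return 949;   /* HANGEUL_CHARSET */
    case 134: return 936;   /* GB2312_CHARSET */
    case 136: return 950;   /* CHINESEBIG5_CHARSET */
    case 161: return 1253;  /* GREEK_CHARSET */
    case 162: return 1254;  /* TURKISH_CHARSET */
    case 177: return 1255;  /* HEBREW_CHARSET */
    case 178: return 1256;  /* ARABIC_CHARSET */
    case 186: return 1257;  /* BALTIC_CHARSET */
    case 204: return 1251;  /* RUSSIAN_CHARSET */
    case 222: return 874;   /* THAI_CHARSET */
    case 238: return 1250;  /* EASTEUROPE_CHARSET */
    default:  return 0;     /* DEFAULT, SYMBOL, OEM and others: not mapped */
    }
}
```

A Unicode to RTF-charset converter would still need the reverse decision, namely which charset covers a given character, and this table does not answer that.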
Hi Robert,
Quote from: Robert on January 23, 2026, 06:03:43 PMMy interest in your resurrection of the Pelles C Export addin is in the "Export to HTML" facility. I think it can handle Unicode identifiers and quotation mark embedded Unicode strings.
Yes - Export to HTML can easily handle UTF-8, I just did a quick update, but it should not take too much to fix it.
Quote from: Robert on January 23, 2026, 06:03:43 PMIf you are interested in developing a Unicode capable "Export to HTML" facility, you might find some help studying the BCX translated C codes of the example on the webpage
No need - I have already written, as part of another Add-In program (LineCounter+ https://forum.pellesc.de/index.php?topic=10092.0 ), a module that outputs a file in UTF-8 HTML. Since Pelles C now defaults to UTF-8 rather than a codepage, it should be even easier.
So - I'll take a look at fixing that part of the Export Add-In. Attached is an HTML output example from the LineCounter Add-in.
John Z
Like you said previously, optimistically, "Nothing is impossible." :)
Making progress :)
Some help used from 'vibe' coding too.
Here is the first output trial utf8 source to utf8 html.
More improvements to be done before posting new project files.
So this is just a preliminary look.
John Z
A good thing, that Add-In is still updated :)
Happy to do it.
Here is another output with actual UTF-8 characters :) I realized the first didn't have any 'special' characters.
The version is almost complete. It also has an optional Dark output mode, and an optional Line Number output mode. Just removing any nonsense I might have put in.
John Z
Attached is what I'm calling version 1.3.
It adds the ability to export a UTF encoded source file into a UTF encoded HTML file.
It supports color coding. If unwanted, change the colors in the source to black.
It supports output with line numbers added, if wanted.
It supports Dark background output, if wanted. (don't use black characters then ;) )
It will partially work for UTF16le but I didn't try any characters unique to UTF16le (maybe later) as UTF-8 was the focus.
All sources included in the project zip as usual.
A Dark Mode with Line Numbers example is attached too.
Check the readme file for more information.
Done...
John Z
Thank you John Z. An excellent job done! ;D 8)