News:

Download Pelles C here: http://www.pellesc.se

Main Menu

Export C source as HTML or PDF file

Started by Pelle, March 16, 2005, 07:34:45 PM

Previous topic - Next topic

Robert

Quote from: TimoVJL on January 17, 2026, 01:21:16 PMA small stupid C to RTF project to RichEdit.

It can help to debug some code.

Hi Timo:
The RTF output from this code is different from the RTF output of John Z's IDE addin.
I'm going to have a look at the output in an ImHex editor and see what's going on.

You know ImHex ?
https://imhex.werwolv.net/
https://github.com/WerWolv/ImHex

TimoVJL

#31
Quote from: Robert on January 18, 2026, 12:17:30 AMThe RTF output from this code is different from the RTF output of John Z's IDE addin.
It might just created for different purbose, like writing to RichEdit control.
Also it was for only ANSI source.

int printf(const char * restrict format, ...);

// https://stackoverflow.com/questions/5603559/one-file-lib-to-conv-utf8-char-to-wchar-t
short utf8_to_wchar(char **utf8)
{
    short sz = 0;
    short c;
    char *p = *(char **)utf8;
    char v = (*p);
    if (v >= 0)
    {
        c = v;
        sz += c;
        ++p; (*utf8)++;
    }
    int shiftCount = 0;
    if ((v & 0xE0) == 0xC0)
    {
        shiftCount = 1;
        c = v & 0x1F;
    }
    else if ((v & 0xF0) == 0xE0)
    {
        shiftCount = 2;
        c = v & 0xF;
    }
    else
        return 0;
    ++p; (*utf8)++;
    while (shiftCount)
    {
        v = *p;
        ++p; (*utf8)++;
        if ((v & 0xC0) != 0x80)
            return 0;
        c <<= 6;
        c |= (v & 0x3F);
        --shiftCount;
    }
    sz += c;
    return sz;
}

int ShortToStrPos(int n, char *s)
{
    int i, sign, idx, nl, len;

    idx = 0;
/*    if ((sign = n) < 0) {    // record sign
        n = -n;    // make n positive
        idx++;
    }*/
    i = 0;
    nl = n;
    while ((nl /= 10) > 0)    /* count nums */
        idx++;
    len = idx+1;
    s[idx+1] = '\0';
    do {    /* generate digits in reverse order */
        s[idx--] = n % 10 + '0';    /* get next digit */
    } while ((n /= 10) > 0);    /* delete it */
//    if (sign < 0)
//        s[0] = '-';
    return len;
}

int __cdecl main(void)
{
    char utf8[] = u8"σκατ";
    char *p = utf8;
    while (*p) {
        if (*(unsigned char*)p > 127) {    // UTF8 ?
            short uc = utf8_to_wchar(&p);
            printf("%Xh\t", uc);
        }
    }
    printf("\n%p\n%p\n", utf8, p);
    return 0;
}

EDIT 2025-01-19: UNICODE version in RE_Test3 and esc close window, but still bugs
May the source be with you

Vortex

Hi Timo,

My apologies, it was my mistake. Your application works fine and I removed my previous message #41859
Code it... That's all...

Robert

Quote from: TimoVJL on January 18, 2026, 05:00:36 AM
Quote from: Robert on January 18, 2026, 12:17:30 AMThe RTF output from this code is different from the RTF output of John Z's IDE addin.
It might just created for different purbose, like writing to RichEdit control.
Also it was for only ANSI source.

int printf(const char * restrict format, ...);

// https://stackoverflow.com/questions/5603559/one-file-lib-to-conv-utf8-char-to-wchar-t
short utf8_to_wchar(char **utf8)
{
    short sz = 0;
    short c;
    char *p = *(char **)utf8;
    char v = (*p);
    if (v >= 0)
    {
        c = v;
        sz += c;
        ++p; (*utf8)++;
    }
    int shiftCount = 0;
    if ((v & 0xE0) == 0xC0)
    {
        shiftCount = 1;
        c = v & 0x1F;
    }
    else if ((v & 0xF0) == 0xE0)
    {
        shiftCount = 2;
        c = v & 0xF;
    }
    else
        return 0;
    ++p; (*utf8)++;
    while (shiftCount)
    {
        v = *p;
        ++p; (*utf8)++;
        if ((v & 0xC0) != 0x80)
            return 0;
        c <<= 6;
        c |= (v & 0x3F);
        --shiftCount;
    }
    sz += c;
    return sz;
}

int ShortToStrPos(int n, char *s)
{
    int i, sign, idx, nl, len;

    idx = 0;
/*    if ((sign = n) < 0) {    // record sign
        n = -n;    // make n positive
        idx++;
    }*/
    i = 0;
    nl = n;
    while ((nl /= 10) > 0)    /* count nums */
        idx++;
    len = idx+1;
    s[idx+1] = '\0';
    do {    /* generate digits in reverse order */
        s[idx--] = n % 10 + '0';    /* get next digit */
    } while ((n /= 10) > 0);    /* delete it */
//    if (sign < 0)
//        s[0] = '-';
    return len;
}

int __cdecl main(void)
{
    char utf8[] = u8"σκατ";
    char *p = utf8;
    while (*p) {
        if (*(unsigned char*)p > 127) {    // UTF8 ?
            short uc = utf8_to_wchar(&p);
            printf("%Xh\t", uc);
        }
    }
    printf("\n%p\n%p\n", utf8, p);
    return 0;
}

EDIT 2025-01-19: UNICODE version in RE_Test3 and esc close window, but still bugs

Hei TimoVJL:

The code snippet above is interesting. Thanks.

What bugs ? I don't see bugs in RE_Test3 output.




Robert

Quote from: TimoVJL on January 18, 2026, 05:00:36 AM
....

        if (*(unsigned char*)p > 127) {    // UTF8 ?
....



Hei TimoVJL:

"Nearly all invalid UTF-8 cases can be detected by looking at the first two bytes of a character (in fact, the first 12 bits)."

Quoted from:
'Validating UTF-8 In Less Than One Instruction Per Byte'
available at
https://arxiv.org/pdf/2010.03090.pdf

See also:
Ridiculously fast unicode (UTF-8) validation

Thanks again for the code.

Mikään ei ole mahdotonta.



TimoVJL

#35
QuoteRTF SYNTAX

An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.

RTF use ASCII 32 - 127 chars and some latin-1 (ISO/IEC 8859) chars without coding.

So i was just lazy for checking chars like many others.
UTF-8 with BOM can have conditional processing.
May the source be with you

Robert

Quote from: TimoVJL on Yesterday at 07:14:06 AM
QuoteRTF SYNTAX

An RTF file consists of unformatted text, control words, control symbols, and groups. For ease of transport, a standard RTF file can consist of only 7-bit ASCII characters. (Converters that communicate with Microsoft Word for Windows or Microsoft Word for the Macintosh should expect 8-bit characters.) There is no set maximum line length for an RTF file.

RTF use ASCII 32 - 127 chars and some latin-1 (ISO/IEC 8859) chars without coding.

So i was just lazy for checking chars like many others.
UTF-8 with BOM can have conditional processing.


Hi TimoVJL and John Z:

RTF SYNTAX.
Oh that !
Yeah, well, I think I'm begining to remember why I'm here.

Export C source etc.

Anyway, you solved what I had considered the hard part, that is, dealing with the UTF-16LE text which is what the export addin function AddIn_GetSourceTextW has to process.

However,as John Z mentioned and you yelled "RTF SYNTAX" I had to look and see what was expected from

static const char* σκατ;

and saw that it was an RTF encoding of

\par }{\rtlch\fcs1 \af67 \ltrch\fcs0 \f67\insrsid15157589\charrsid15157589 static const char* \'f3\'ea\'e1\'f4;

Hmmmm  :-\  :o


TimoVJL

Better to show important things too:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}{\f1\fnil\fcharset161{\*\fname Courier New;}Courier New Greek;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\lang1035\f0\fs22 #include <windows.h>\par
#include <stdio.h>\par
\par
static int OrigCodePage;\par
static const char* \f1\'f3\'ea\'e1\'f4;\par
static const char* \'e4\'f5\'f3\'ea\'e1\'f4\'e1\'ed\'ef\'de\'f4\'f9\'ed;\par
Streamed parsing don't work, as have to separate RTF header while processing.
May the source be with you