Beginner with Unicode

Started by JohnF, June 15, 2014, 03:57:50 PM

JohnF

I've been struggling with parsing Unicode text for a day and eventually concluded that using fgetwc and fputwc is a waste of time, as they only deal with one byte per call. Instead I used fread and fwrite.


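wchar_t c;    /* assumed to be 2 bytes wide, as it is on Windows */

/* Read one 2-byte unit per call until fread reports EOF or an error. */
while ((fread(&c, 2, 1, inf)) != 0)
{
    if (c == L'\n') {
        /* Write the newline as a full 2-byte unit, then stop. */
        fwrite(&c, 2, 1, outf);
        break;
    }
}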
   

Plus one needs to open the stream for binary input and output.
   

inf = fopen(argv[1], "rb");
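
For anyone who wants the same idea without depending on the size of wchar_t, here is a minimal sketch. It assumes the file is UTF-16LE and that both streams were opened in binary mode. The helper names read_utf16le_unit and copy_first_line_utf16le are made up for illustration, and unlike the fragment above it copies the whole first line rather than only the newline.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-16LE code unit from a binary stream.
   Returns 1 on success, 0 on end of file or a short read. */
static int read_utf16le_unit(FILE *in, uint16_t *unit)
{
    unsigned char b[2];

    if (fread(b, 1, 2, in) != 2)
        return 0;
    /* Low byte first: that is what "LE" means. */
    *unit = (uint16_t)(b[0] | (b[1] << 8));
    return 1;
}

/* Hypothetical helper: copy code units from in to out, up to and
   including the first newline (U+000A).
   Returns the number of units written, or -1 on a write error. */
static long copy_first_line_utf16le(FILE *in, FILE *out)
{
    uint16_t unit;
    long written = 0;

    while (read_utf16le_unit(in, &unit)) {
        unsigned char b[2];

        b[0] = (unsigned char)(unit & 0xFF);
        b[1] = (unsigned char)(unit >> 8);
        if (fwrite(b, 1, 2, out) != 2)
            return -1;
        written++;
        if (unit == 0x000A)
            break;
    }
    return written;
}
```

Called as copy_first_line_utf16le(inf, outf), this does not care whether wchar_t is 2 or 4 bytes on the platform. It does not skip a byte order mark or handle big-endian files.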
   

John

frankie

John,
I will eventually take a look at the C standard to see whether Unicode support is part of it or not, but the problem is that Pelles C lacks the ccs flag that Microsoft's fopen accepts for Unicode support:
fp = fopen("newfile.txt", "rt+, ccs=encoding");

See.
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

JohnF

Frankie,

I think there is some confusion about what the standard actually says. My draft version (C11) says:

================
7.29.3 Wide character input/output functions
7.29.3.1 The fgetwc function

Synopsis
#include <stdio.h>
#include <wchar.h>

wint_t fgetwc(FILE *stream);

Description
If the end-of-file indicator for the input stream pointed to by stream is not set and a next wide character is present, the fgetwc function obtains that wide character as a wchar_t converted to a wint_t and advances the associated file position indicator for the stream (if defined).
================

However, various pages on the net seem to suggest that fgetwc reads one byte at a time.

That does not seem sensible to me, however: if you want a wide char, you should get a wide char. :)
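
A small experiment can show where the "one byte at a time" impression comes from. This is only a sketch: write_bytes and count_fgetwc_calls are hypothetical helper names, and it assumes a runtime where fgetwc converts the file's bytes through the current locale's multibyte encoding. In the default "C" locale that encoding is typically one byte per character, so each byte of a UTF-16 file comes back as its own wide character.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write raw bytes to a file. Returns 0 on success. */
static int write_bytes(const char *path, const unsigned char *data, size_t len)
{
    FILE *out = fopen(path, "wb");
    size_t n;

    if (out == NULL)
        return -1;
    n = fwrite(data, 1, len, out);
    if (fclose(out) != 0 || n != len)
        return -1;
    return 0;
}

/* Hypothetical helper: count how many wide characters fgetwc hands back
   before it returns WEOF. Returns -1 if the file cannot be opened.
   The file is opened fresh so the stream has no orientation yet. */
static long count_fgetwc_calls(const char *path)
{
    FILE *in = fopen(path, "rb");
    long calls = 0;

    if (in == NULL)
        return -1;
    while (fgetwc(in) != WEOF)
        calls++;
    fclose(in);
    return calls;
}
```

If you write the four bytes 'H', 0, 'i', 0 (that is, "Hi" in UTF-16LE) with write_bytes and then call count_fgetwc_calls on the same file, a runtime of this kind should report four wide characters rather than two. Other runtimes may treat binary-mode streams differently, so the count is worth checking on the compiler you actually use.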

Again, my C11 draft gives fopen as:

=================
7.21.5.3 The fopen function

Synopsis
#include <stdio.h>
FILE *fopen(const char * restrict filename, const char * restrict mode);
=================

John