Pelles C forum
C language => Tips & tricks => Topic started by: JohnF on June 15, 2014, 03:57:50 PM
-
I've been struggling with parsing Unicode text for a day and eventually concluded that using fgetwc and fputwc is a waste of time, as they only deal with one byte per call. Instead I used fread and fwrite.
wchar_t c;
/* read one wide character per call (wchar_t is 2 bytes on Windows) */
while (fread(&c, sizeof c, 1, inf) == 1)
{
    if (c == L'\n') {
        fwrite(&c, sizeof c, 1, outf);
        break;
    }
}
Plus one needs to open the stream for binary input and output.
inf = fopen(argv[1], "rb");
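A minimal sketch of the same binary-mode idea, written so that it does not depend on sizeof(wchar_t). It assumes the file is UTF-16LE and does not skip a byte order mark; the helper names read_utf16le_unit and copy_utf16le_line are made up for illustration and are not part of any library.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-16LE code unit from a binary stream.
   Returns 1 on success, 0 on EOF, read error, or a dangling odd byte. */
int read_utf16le_unit(FILE *in, uint16_t *unit)
{
    unsigned char b[2];
    if (fread(b, 1, 2, in) != 2)
        return 0;
    *unit = (uint16_t)(b[0] | (b[1] << 8));   /* low byte comes first */
    return 1;
}

/* Hypothetical helper: copy code units from in to out, up to and
   including the first newline. Returns the number of units written. */
long copy_utf16le_line(FILE *in, FILE *out)
{
    uint16_t unit;
    long count = 0;
    while (read_utf16le_unit(in, &unit)) {
        unsigned char b[2];
        b[0] = (unsigned char)(unit & 0xFF);
        b[1] = (unsigned char)(unit >> 8);
        if (fwrite(b, 1, 2, out) != 2)
            break;                  /* write error */
        count++;
        if (unit == 0x000A)         /* L'\n' as a UTF-16 code unit */
            break;
    }
    return count;
}
```

Both streams are expected to be opened with "rb" and "wb" respectively, as in the fopen call above.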
John
-
John,
while I will eventually take a look at the C standard to see whether Unicode support is in the standard or not, the problem is that Pelles C lacks the ccs flag for Unicode support:
fp = fopen("newfile.txt", "rt+, ccs=encoding");
See (http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx).
-
Frankie,
I think there is some confusion about what the standard actually says. My draft of C11 says:
================
7.29.3 Wide character input/output functions
7.29.3.1 The fgetwc function
Synopsis
#include <stdio.h>
#include <wchar.h>
wint_t fgetwc(FILE *stream);
Description
If the end-of-file indicator for the input stream pointed to by stream is not set and a next wide character is present, the fgetwc function obtains that wide character as a wchar_t converted to a wint_t and advances the associated file position indicator for the stream (if defined).
================
However, reading various pages on the net seems to suggest that fgetwc reads one byte at a time.
That does not seem sensible to me, however: if you want a wide char, get a wide char. :)
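A possible explanation, offered as a sketch rather than a statement about Pelles C: C11 7.21.3 says the wide character input functions read multibyte characters from the stream and convert them as if by mbrtowc. In the default "C" locale each ASCII byte is a complete multibyte character, so a file holding "AB" as UTF-16LE (bytes 41 00 42 00) would yield four wide characters instead of two. The helper name count_fgetwc_calls is made up for illustration, and what happens with bytes of 0x80 and above depends on the implementation.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count how many fgetwc calls succeed before WEOF.
   The stream must not have been used with byte I/O functions beforehand,
   since a stream's orientation is fixed by its first I/O operation. */
size_t count_fgetwc_calls(FILE *in)
{
    size_t calls = 0;
    while (fgetwc(in) != WEOF)
        calls++;
    return calls;
}
```

Under those assumptions, calling it on a freshly opened file containing the four bytes above would be expected to return 4, which matches the "one byte at a time" behaviour described on the net.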
Again, my C11 draft gives fopen as:
=================
7.21.5.3 The fopen function
Synopsis
#include <stdio.h>
FILE *fopen(const char * restrict filename, const char * restrict mode);
=================
John