Beginner with Unicode

JohnF · June 15, 2014, 03:57:50 PM

I've been struggling with parsing Unicode text for a day and eventually concluded that using fgetwc and fputwc is a waste of time, as they only deal with one byte per call. Instead I used fread and fwrite.

Code Select


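wchar_t c = 0; /* zeroed so the unused bytes are defined if wchar_t is wider than 2 bytes */

/* Read one 2-byte code unit per call; assumes UTF-16LE input and a little-endian host. */
while (fread(&c, 2, 1, inf) == 1)
{
	if (c == L'\n') {
		/* Write the newline and stop at the end of the first line. */
		fwrite(&c, 2, 1, outf);
		break;
	}
}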

Plus, one needs to open the streams in binary mode for input and output.

Code Select


inf = fopen(argv[1], "rb");
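
A slightly fuller sketch of the fread/fwrite approach above, for illustration only. It assumes the input is UTF-16LE and reads each code unit as two explicit bytes into a uint16_t, so it does not depend on the size of wchar_t or on the host byte order. The helper names read_u16le, write_u16le and copy_first_line_u16le are hypothetical, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read one UTF-16LE code unit from a stream opened in binary mode.
   Returns 1 on success, 0 on end of file or a truncated unit. */
int read_u16le(FILE *in, uint16_t *unit)
{
    unsigned char b[2];

    if (fread(b, 1, 2, in) != 2)
        return 0;
    *unit = (uint16_t)(b[0] | (b[1] << 8));
    return 1;
}

/* Write one code unit as two little-endian bytes; returns 1 on success. */
int write_u16le(FILE *out, uint16_t unit)
{
    unsigned char b[2];

    b[0] = (unsigned char)(unit & 0xFF);
    b[1] = (unsigned char)(unit >> 8);
    return fwrite(b, 1, 2, out) == 2;
}

/* Copy code units up to and including the first newline (U+000A).
   Returns the number of code units written. */
size_t copy_first_line_u16le(FILE *in, FILE *out)
{
    uint16_t unit;
    size_t n = 0;

    while (read_u16le(in, &unit)) {
        if (!write_u16le(out, unit))
            break;
        n++;
        if (unit == 0x000A)
            break;
    }
    return n;
}
```

Both streams would still be opened in binary mode ("rb" for input, "wb" for output), as noted above.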

John

frankie · June 15, 2014, 06:31:35 PM

John,
While I will eventually take a look at the C standard to see whether Unicode support is in the standard or not, the problem is that PellesC lacks the ccs flag for Unicode support:

Code Select

/* ccs is a Microsoft CRT extension; "encoding" stands for UNICODE, UTF-8 or UTF-16LE */
fp = fopen("newfile.txt", "rt+, ccs=encoding");

See.

JohnF · June 15, 2014, 07:35:05 PM

Frankie,

I think there is some confusion about what is actually in the standard. My draft version (C11) says:

================
7.29.3 Wide character input/output functions
7.29.3.1 The fgetwc function

Synopsis
#include <stdio.h>
#include <wchar.h>

wint_t fgetwc(FILE *stream);

Description
If the end-of-file indicator for the input stream pointed to by stream is not set and a next wide character is present, the fgetwc function obtains that wide character as a wchar_t converted to a wint_t and advances the associated file position indicator for the stream (if defined).
================

However, various pages on the net seem to suggest that fgetwc reads one byte at a time.

That does not seem sensible to me, however: if you want a wide char, you should get a wide char.
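
One possible way to reconcile the quoted description with the "one byte at a time" reports, sketched under an assumption: fgetwc converts the multibyte encoding of the current locale into wide characters, and in the default "C" locale that encoding is typically one byte per character. On such an implementation, a UTF-16LE file holding "AB" (bytes 41 00 42 00) would come back as four wide characters rather than two. The function name count_fgetwc_chars is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Count how many wide characters fgetwc returns for a file.
   Assumes the caller has not called setlocale, so the "C" locale is in effect.
   Returns (size_t)-1 if the file cannot be opened. */
size_t count_fgetwc_chars(const char *path)
{
    FILE *in = fopen(path, "rb");
    size_t n = 0;

    if (in == NULL)
        return (size_t)-1;
    /* The first wide-character call makes the stream wide-oriented. */
    while (fgetwc(in) != WEOF)
        n++;
    fclose(in);
    return n;
}
```

If the count equals the file size in bytes, fgetwc is treating each byte as a character, and reading fixed 2-byte units with fread is the more direct route for UTF-16 data.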

Again, my C11 draft gives fopen as:

=================
7.21.5.3 The fopen function

Synopsis
#include <stdio.h>
FILE *fopen(const char * restrict filename, const char * restrict mode);
=================

John
