
Topic: Beginner with Unicode

JohnF

  • Guest
Beginner with Unicode
« on: June 15, 2014, 03:57:50 PM »
I've been struggling with parsing Unicode (UTF-16) text for a day, and eventually concluded that using fgetwc and fputwc is a waste of time, as they only deal with one byte per call. Instead I used fread and fwrite.

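Code:
#include <stdio.h>
#include <wchar.h>

wchar_t c;   /* assumes a 2-byte wchar_t, as with Pelles C on Windows */

/* Read one 2-byte (UTF-16) character per call until end of file */
while (fread(&c, sizeof c, 1, inf) == 1)
{
    if (c == L'\n') {
        fwrite(&c, sizeof c, 1, outf);   /* copy the newline, then stop */
        break;
    }
}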
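The fragment above relies on wchar_t being 2 bytes and on a little-endian host, which holds for Pelles C on Windows but not, for example, on Linux, where wchar_t is 4 bytes. A minimal sketch, assuming UTF-16LE input and making no attempt to handle a BOM or surrogate pairs specially, reads two bytes at a time and assembles each code unit explicitly. The names read_u16le and copy_first_line_u16le are hypothetical helpers, not library functions, and unlike the fragment this copies the whole first line, newline included.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-16LE code unit from a binary stream.
   Returns 1 on success, 0 at end of file or on a truncated final byte. */
static int read_u16le(FILE *in, uint16_t *unit)
{
    unsigned char b[2];

    assert(in != NULL && unit != NULL);
    if (fread(b, 1, 2, in) != 2)
        return 0;
    *unit = (uint16_t)(b[0] | (b[1] << 8));   /* low byte comes first */
    return 1;
}

/* Hypothetical helper: copy code units up to and including the first
   U+000A (newline). Returns the number of units copied, or -1 on a
   write error. BOMs and surrogates pass through like any other unit. */
static long copy_first_line_u16le(FILE *in, FILE *out)
{
    uint16_t unit;
    long copied = 0;

    assert(out != NULL);
    while (read_u16le(in, &unit)) {
        unsigned char b[2] = { (unsigned char)(unit & 0xFF),
                               (unsigned char)(unit >> 8) };
        if (fwrite(b, 1, 2, out) != 2)
            return -1;
        copied++;
        if (unit == 0x000A)
            break;
    }
    return copied;
}
```

Rebuilding each unit from two explicit bytes is what makes the sketch independent of sizeof(wchar_t) and of the host's byte order.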
   

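Plus, one needs to open the streams in binary mode for both input and output; on Windows, text mode translates line endings, which can corrupt the 2-byte characters.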
   
Code:
inf = fopen(argv[1], "rb");
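The snippet shows only the input side. A minimal sketch of opening both streams in binary mode, with basic error handling, might look like the following; open_binary_pair is a hypothetical helper, and using argv[1] and argv[2] as the input and output paths is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open in_path for binary reading ("rb") and
   out_path for binary writing ("wb"). Returns 0 on success, -1 on
   failure; on failure both *in and *out are NULL and nothing is left
   open. */
static int open_binary_pair(const char *in_path, const char *out_path,
                            FILE **in, FILE **out)
{
    assert(in != NULL && out != NULL);
    *in = fopen(in_path, "rb");
    if (*in == NULL) {
        *out = NULL;
        return -1;
    }
    *out = fopen(out_path, "wb");
    if (*out == NULL) {
        fclose(*in);   /* do not leak the input stream */
        *in = NULL;
        return -1;
    }
    return 0;
}
```

Typical use: `if (open_binary_pair(argv[1], argv[2], &inf, &outf) != 0) { perror("fopen"); return 1; }`.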
   

John

frankie

  • Global Moderator
  • Member
  • Posts: 2113
Re: Beginner with Unicode
« Reply #1 on: June 15, 2014, 06:31:35 PM »
John,
While I will eventually take a look at the C standard to see whether Unicode support is part of it or not, the problem is that Pelles C lacks the ccs= mode option for Unicode support:
Code:
FILE *fp = fopen("newfile.txt", "rt+, ccs=UTF-8");   /* Microsoft CRT extension; ccs= also accepts UNICODE or UTF-16LE */
See.
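Since ccs= is a Microsoft C runtime extension rather than part of standard fopen, one portable substitute is to open the file in binary mode and check for a byte-order mark yourself. A minimal sketch, assuming the stream is seekable and positioned at the start of the file; detect_bom and enum bom_kind are hypothetical names. Files without a BOM come back as BOM_NONE and need some other rule to decide their encoding.

```c
#include <assert.h>
#include <stdio.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/* Hypothetical helper: inspect the first bytes of a stream opened with
   "rb" and positioned at offset 0. Leaves the position just past any
   BOM found, or back at offset 0 if there is none. */
static enum bom_kind detect_bom(FILE *f)
{
    unsigned char b[3];
    size_t n;

    assert(f != NULL);
    n = fread(b, 1, 3, f);
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return BOM_UTF8;                 /* EF BB BF, already at offset 3 */
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
        fseek(f, 2L, SEEK_SET);          /* FF FE */
        return BOM_UTF16LE;
    }
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
        fseek(f, 2L, SEEK_SET);          /* FE FF */
        return BOM_UTF16BE;
    }
    fseek(f, 0L, SEEK_SET);              /* no BOM: back to the start */
    return BOM_NONE;
}
```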
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

JohnF

  • Guest
Re: Beginner with Unicode
« Reply #2 on: June 15, 2014, 07:35:05 PM »
Frankie,

I think there is some confusion about what the standard actually says. My draft version (C11) says:

================
7.29.3 Wide character input/output functions
7.29.3.1 The fgetwc function

Synopsis
#include <stdio.h>
#include <wchar.h>

wint_t fgetwc(FILE *stream);

Description
If the end-of-file indicator for the input stream pointed to by stream is not set and a next wide character is present, the fgetwc function obtains that wide character as a wchar_t converted to a wint_t and advances the associated file position indicator for the stream (if defined).
================

However, various pages on the net seem to suggest that fgetwc reads one byte at a time.

That does not seem sensible to me, however: if you want a wide char, you should get a wide char. :)
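Both observations can be true at once. C11 (7.21.3) specifies that wide character input converts the stream's multibyte characters as if by mbrtowc, which depends on the LC_CTYPE category of the current locale. So how many bytes one fgetwc call consumes depends on the locale: in the "C" locale it is typically one byte per wide character, while under a UTF-8 locale a whole multibyte sequence becomes one wide character. In neither case does standard C decode UTF-16 from the file, which would explain why fread was needed. A minimal sketch, assuming a locale named "C.UTF-8" or "en_US.UTF-8" exists (locale names are implementation-defined; these are common on Linux); use_utf8_locale and count_wide_chars are hypothetical helpers.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Try to switch LC_CTYPE to a UTF-8 locale. The names tried here are an
   assumption (common on Linux). Returns 1 on success, 0 otherwise. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: count the wide characters fgetwc delivers from a
   file. Returns -1 if the file cannot be opened or if reading stopped
   on a read or encoding error instead of end of file. */
static long count_wide_chars(const char *path)
{
    FILE *f;
    long count = 0;
    int reached_eof;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgetwc(f) != WEOF)   /* each call yields one converted character */
        count++;
    /* WEOF means end of file only if the EOF indicator is set; otherwise a
       read error or an invalid sequence (errno == EILSEQ) ended the loop. */
    reached_eof = feof(f) && !ferror(f);
    fclose(f);
    return reached_eof ? count : -1;
}
```

Saved as UTF-8, "été" is 5 bytes; once use_utf8_locale() succeeds, count_wide_chars on that file should report 3, one per character rather than one per byte.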

Again, my C11 draft gives fopen as:

=================
7.21.5.3 The fopen function

Synopsis
#include <stdio.h>
FILE *fopen(const char * restrict filename, const char * restrict mode);
=================
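So standard fopen takes only a file name and a mode, with no encoding parameter. Instead, C11 (7.21.2) says a stream has no orientation right after it is opened, and the first byte or wide I/O function applied to it makes it byte- or wide-oriented; fwide(f, 0) reports the orientation without changing it. A small sketch; orientation_name is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: describe a stream's orientation without changing
   it. fwide(f, 0) returns >0 for wide, <0 for byte, 0 for none yet. */
static const char *orientation_name(FILE *f)
{
    int o;

    assert(f != NULL);
    o = fwide(f, 0);
    if (o > 0)
        return "wide";
    if (o < 0)
        return "byte";
    return "none";
}
```

Once a stream is oriented, only a successful freopen removes the orientation again, so byte and wide calls should not be mixed on one stream.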

John