How to use "getwline"?

BlueD · October 31, 2010, 08:51:39 AM

I think I should ask before I post a bug report:
Please tell me what I did wrong.


#define __STDC_WANT_LIB_EXT2__  1

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
  FILE *fp;
  wchar_t *line = NULL;
  size_t len = 0;
  ssize_t read;

  setlocale(LC_ALL, "");
  fp = fopen("wt1.txt", "r");
  if (fp == NULL)
    exit(1);

  while ((read = getwline(&line, &len, fp)) != -1) {
    wprintf(L"Retrieved line of length %zd :\n", read);
    wprintf(L"%ls", line);
  }

  free(line);
  fclose(fp);
  return 0;
}

The contents of wt1.txt are:

Code Select

大家好！
这是getwline文本读取试验。

Please save it as Unicode.

CommonTater · October 31, 2010, 09:13:50 AM

The problem appears to be that your line and len variables are uninitialized.

try...
wchar_t line[128] = {0};
int len = 128;

Moreover, len has to be initialized to the buffer size on each call, as it can be modified by getwline()

JohnF · October 31, 2010, 09:24:30 AM

http://linux.die.net/man/3/getline

"If *lineptr is NULL, then getline() will allocate a buffer for storing the line, which should be freed by the user program."

I'm assuming that getwline also allocates the buffer.
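
For reference, the allocation behavior quoted above can be sketched as follows. This assumes a POSIX system where getline() is available (Pelles C exposes it through __STDC_WANT_LIB_EXT2__ instead), and count_lines is a made-up helper name, not part of any library:

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() on POSIX systems */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in fp and stores the length of the
   longest one (including its newline) in *longest. Returns -1 on read error. */
long count_lines(FILE *fp, size_t *longest)
{
    char *line = NULL;   /* NULL asks getline() to allocate the buffer */
    size_t cap = 0;      /* capacity of that buffer, updated by getline() */
    ssize_t n;
    long count = 0;

    *longest = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        if ((size_t)n > *longest)
            *longest = (size_t)n;
        count++;
    }

    free(line);          /* one free() covers every reallocation */
    return ferror(fp) ? -1 : count;
}
```

The same line and cap variables are reused on every call, so there is no need to reset the size between calls; getline() grows the buffer itself when a longer line arrives.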

John

BlueD · October 31, 2010, 09:36:07 AM

Actually, my sample is from http://www.opengroup.org/onlinepubs/9699919799/functions/getline.html.

I just modified it a little to use getwline.

And getline works fine, but getwline doesn't.

JohnF · November 02, 2010, 10:17:55 AM

It looks like a bug to me.

getwline actually reads the file as ANSI (single-byte) text instead of wide characters.

EDIT: The same problem occurs with getwdelim.

John

TimoVJL · November 02, 2010, 03:29:27 PM

This example shows it.

Code Select


Retrieved line of length: 13
 ■r
[FF][FE][72][00][6F][00][77][00][31][00][0D][00][0A]

Retrieved line of length: 12

[00][72][00][6F][00][77][00][32][00][0D][00][0A]

Retrieved line of length: 1

[00]

Code Select

#define __STDC_WANT_LIB_EXT2__ 1

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
  FILE *fp;
  wchar_t *line = NULL;
  size_t len = 0;
  ssize_t read;

  fp = fopen("wt3.txt", "r");
  if (fp == NULL)
    exit(1);

  while ((read = getwline(&line, &len, fp)) != -1) {
    printf("Retrieved line of length: %zd\n", read);
    printf("%ls\n", line);
    for (ssize_t i = 0; i < read; i++)
      printf("[%02X]", (unsigned)line[i]);
    printf("\n\n");
  }

  if (line) free(line);
  fclose(fp);
  return 0;
}

BlueD · November 03, 2010, 01:19:23 AM

Yes, now I think it is a bug.
And I think the bug's in fgetwc.

TimoVJL · November 03, 2010, 08:25:07 AM

fgetwc

Code Select

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(int argc, char **argv)
{
  FILE *fp;
  wint_t wc;  /* fgetwc() returns wint_t, which can hold WEOF */
  fp = fopen("wt3.txt", "rb");
  if (fp == NULL)
    return 1;
  while ((wc = fgetwc(fp)) != WEOF)
    printf("[%2X]", (unsigned)wc);
  fclose(fp);
  return 0;
}

PellesC:

Code Select

[FF][FE][72][ 0][6F][ 0][77][ 0][31][ 0][ D][ 0][ A][ 0][72][ 0][6F][ 0][77][ 0][32][ 0][ D][ 0][ A][ 0]

Should be

Code Select

[FEFF][72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
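
To make the difference between the two dumps concrete, here is a rough sketch of how the raw bytes pair up into 16-bit code units when the file is UTF-16LE. It only illustrates the expected result; it is not how any runtime implements fgetwc, and utf16le_to_units is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combines UTF-16LE byte pairs into 16-bit code units.
   Returns the number of units written; a trailing odd byte is ignored. */
size_t utf16le_to_units(const unsigned char *bytes, size_t nbytes,
                        uint16_t *units, size_t max_units)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < nbytes && n < max_units; i += 2) {
        /* little-endian: low byte first, high byte second */
        units[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));
    }
    return n;
}
```

Fed the bytes FF FE 72 00 6F 00 ..., this yields FEFF, 0072, 006F and so on, which matches the "should be" output. Surrogate pairs are left as two separate units; combining them into one code point would be an extra step.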

hellork · November 17, 2010, 03:15:12 AM

According to my man page, "The behavior of fgetwc() depends on the LC_CTYPE category of the current locale."
Try adding

Code Select


+#include <locale.h>
...
+char *loc = setlocale (LC_ALL, "");
...
+printf ("locale is %s\n",loc);

I was interested in this, because I want to get to know wide characters.
After cross-compiling with the above changes, I got this:

Code Select


[not me]$ i686-pc-mingw32-gcc test_fgetwc.c -o t.exe
[not me]$ ./t.exe
locale is English_United States.1252
[FEFF][72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
[not me]$ hexdump -e '1/2 "[%2X]" ' w*
[FEFF][72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
[not me]$

But compiled on Linux, it cannot parse the file because of the locale:

Code Select

[not me]$ ./t
locale is en_US.utf8
/*NULL, nothing, nada*/
[not me]$

So I thought I would get smart and change the locale to try to read the file.

Code Select

[not me]$ locale -a|grep US
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
es_US
es_US.iso88591
es_US.utf8
yi_US
yi_US.cp1255
yi_US.utf8

I tried setting the locale manually.

Code Select


-char *loc = setlocale (LC_ALL, "");
+char *loc = setlocale (LC_ALL, "en_US.iso88591");

Recompile (on Linux, for Linux):

Code Select


[not me]$ gcc test_fgetwc.c -o t
[not me]$ ./t
locale is en_US.iso88591
[FF][FE][72][ 0][6F][ 0][77][ 0][31][ 0][ D][ 0][ A][ 0][72][ 0][6F][ 0][77][ 0][32][ 0][ D][ 0][ A][ 0]

Finally, it occurred to me that on Linux, wchar_t is 32 bits wide, so I saved it as Western (ISO-8859-15) and Linux could read it.

Code Select

[not me]$ ./t
locale is en_US.iso88591
[72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
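
The locale dependence seen in these runs can be checked in isolation with something like the sketch below. It assumes the locale name passed in is installed on the system (names such as "C.UTF-8" or "en_US.UTF-8" vary between platforms), and count_wide_chars is a made-up helper:

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: counts the wide characters fgetwc() decodes from
   path under the given LC_CTYPE locale. Returns -1 if the locale is not
   installed, the file cannot be opened, or decoding fails. */
long count_wide_chars(const char *path, const char *locale_name)
{
    /* Set the locale before opening the file, so the stream is read with
       the encoding that the locale specifies. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long count = 0;
    errno = 0;
    while (fgetwc(fp) != WEOF)
        count++;

    /* WEOF means either end of file or an error; tell them apart here. */
    int failed = ferror(fp) || errno == EILSEQ;
    fclose(fp);
    return failed ? -1 : count;
}
```

Checking the return value of setlocale() matters because it returns NULL when the requested locale is missing, and printing that result with %s would be undefined behavior.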

JohnF · November 18, 2010, 02:44:48 PM

Just an update - the following code gives the same result on three compilers here, including PellesC.

Code Select


#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <errno.h>

int main(void)
{
    FILE *stream;
    wint_t wc;

    if (NULL == (stream = fopen("uni.txt", "r")))
    {
        wprintf(L"Unable to open: \"uni.txt\"\n");
        exit(1);
    }

    errno = 0;
    while (WEOF != (wc = fgetwc(stream))){
        wprintf(L"wc = %lc\n", wc);
    }

    if (EILSEQ == errno)
    {
        wprintf(L"An invalid wide character was encountered.\n");
        exit(1);
    }
    fclose(stream);
    return 0;
}

John

hellork · November 23, 2010, 04:45:28 AM

Here is the unicode file (save as UTF-8).

Code Select

大家好！
这是getwline文本读取试验。

To read the unicode file as wide characters on Linux, I had to add three lines:

Code Select

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <errno.h>
#include <locale.h>
#define wint_t int
int main (void){
    FILE *stream;
    wint_t wc;
    char *loc = setlocale (LC_ALL, "");
    if (NULL == (stream = fopen("wt1.txt", "r"))){
        wprintf (L"Unable to open: \"wt1.txt\"\n");
        exit (1);
    } errno = 0;
    while (WEOF != (wc = fgetwc(stream))){
        wprintf (L"wc = %lc\n", wc);
    } if (EILSEQ == errno){
        wprintf (L"An invalid wide character was encountered.\n");
        exit (1);
    } fclose (stream);
    return 0;
}

Note that wide characters aren't strictly necessary to read the file. It just helps if you want to access individual characters for some reason, such as counting them, etc. The following will display the UTF-8 file contents without setlocale() and other locale-specific code, but will not give access to individual (multibyte) characters:

Code Select

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#define MAX 17
#define NAME "wt1.txt"
int main (void){
    FILE *f;
    char buf[MAX];
    if ((f = fopen(NAME, "r"))){
        while (fgets(buf,MAX,f)){
            printf ("%s", buf);
        } if (ferror (f)){
            printf ("File read error.\n");
            exit (1);
        } fclose (f);
    } else{
        printf ("Unable to open: \"%s\"\n",NAME);
        exit (1);
    } return 0;
}
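
As a side note on counting: if the file is known to be valid UTF-8, the number of characters can be obtained from the raw bytes without any locale setup, by skipping continuation bytes. This is a different technique from fgetwc, shown only as a sketch, and utf8_count is a made-up name:

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a NUL-terminated UTF-8 string
   by skipping continuation bytes (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

This counts code points only; it does not decode them, so the wide-character route is still needed when the individual characters themselves are required.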
