
Author Topic: How to use "getwline"?  (Read 5796 times)

BlueD

  • Guest
How to use "getwline"?
« on: October 31, 2010, 08:51:39 AM »
I think I should ask before I post a bug report:
Please tell me what I did wrong.

Code: [Select]
#define __STDC_WANT_LIB_EXT2__  1

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
  FILE *fp;
  wchar_t *line = NULL;
  size_t len = 0;
  ssize_t read;

  setlocale(LC_ALL, "");
  fp = fopen("wt1.txt", "r");
  if (fp == NULL)
    exit(1);

  while ((read = getwline(&line, &len, fp)) != -1) {
    wprintf(L"Retrieved line of length %zd :\n", read);
    wprintf(L"%ls", line);
  }

  free(line);
  fclose(fp);
  return 0;
}

The contents of wt1.txt is:
Code: [Select]
大家好!
这是getwline文本读取试验。
Please save it as unicode.




CommonTater

  • Guest
Re: How to use "getwline"?
« Reply #1 on: October 31, 2010, 09:13:50 AM »
The problem appears to be that your line and len variables are uninitialized.

try...
wchar_t line[128] = {0};
int len = 128;

Moreover, len has to be initialized to the buffer size on each call, as it can be modified by getwline()

JohnF

  • Guest
Re: How to use "getwline"?
« Reply #2 on: October 31, 2010, 09:24:30 AM »
http://linux.die.net/man/3/getline

"If *lineptr is NULL, then getline() will allocate a buffer for storing the line, which should be freed by the user program."

I'm assuming that getwline also allocates the buffer.
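
As a rough illustration of the behavior the man page describes, here is a minimal sketch using plain POSIX getline() (not getwline, whose behavior in Pelles C is the open question here). The helper name count_lines is made up for the example. The point is only that line starts out as NULL, getline() allocates and grows the buffer itself, and the caller frees it once at the end.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in fp and lets getline() own the buffer. */
static long count_lines(FILE *fp)
{
    char *line = NULL;  /* NULL asks getline() to allocate the buffer */
    size_t cap = 0;     /* buffer capacity, updated by getline() when it grows */
    long n = 0;

    while (getline(&line, &cap, fp) != -1)
        n++;

    free(line);         /* a single free covers whatever getline() allocated */
    return n;
}
```

Note that a final line without a trailing newline is still returned, so it is counted as well.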

John

BlueD

  • Guest
Re: How to use "getwline"?
« Reply #3 on: October 31, 2010, 09:36:07 AM »
Actually, my sample is from http://www.opengroup.org/onlinepubs/9699919799/functions/getline.html.

I just modified it a little to use getwline.

And getline works fine, but getwline doesn't.

JohnF

  • Guest
Re: How to use "getwline"?
« Reply #4 on: November 02, 2010, 10:17:55 AM »
It looks like a bug to me.

getwline actually reads the file as ANSI (single-byte) text instead of wide characters.

EDIT: The same problem occurs with getwdelim.

John
« Last Edit: November 02, 2010, 02:40:28 PM by JohnF »

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 2115
Re: How to use "getwline"?
« Reply #5 on: November 02, 2010, 03:29:27 PM »
This example shows it.
Code: [Select]
Retrieved line of length: 13
 ■r
[FF][FE][72][00][6F][00][77][00][31][00][0D][00][0A]

Retrieved line of length: 12

[00][72][00][6F][00][77][00][32][00][0D][00][0A]

Retrieved line of length: 1

[00]

Code: [Select]
#define __STDC_WANT_LIB_EXT2__ 1

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
  FILE *fp;
  wchar_t *line = NULL;
  size_t len = 0;
  ssize_t read;

  fp = fopen("wt3.txt", "r");
  if (fp == NULL)
    exit(1);

  while ((read = getwline(&line, &len, fp)) != -1) {
    printf("Retrieved line of length: %zd\n", read);
    printf("%ls\n", line);
    for (int i = 0; i < read; i++)
      printf("[%02X]", (unsigned)line[i]);
    printf("\n\n");
  }

  if (line) free(line);
  fclose(fp);
  return 0;
}
May the source be with you

BlueD

  • Guest
Re: How to use "getwline"?
« Reply #6 on: November 03, 2010, 01:19:23 AM »
Yes, now I think it is a bug.
And I think the bug's in fgetwc.
« Last Edit: November 03, 2010, 02:15:01 AM by BlueD »

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 2115
Re: How to use "getwline"?
« Reply #7 on: November 03, 2010, 08:25:07 AM »
fgetwc
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(int argc, char **argv)
{
  FILE *fp;
  wint_t wc;  /* wint_t, not wchar_t, so the WEOF comparison is reliable */
  fp = fopen("wt3.txt", "rb");
  if (fp == NULL)
    return 1;
  while ((wc = fgetwc(fp)) != WEOF)
    printf("[%2X]", (unsigned)wc);
  fclose(fp);
  return 0;
}
PellesC:
Code: [Select]
[FF][FE][72][ 0][6F][ 0][77][ 0][31][ 0][ D][ 0][ A][ 0][72][ 0][6F][ 0][77][ 0][32][ 0][ D][ 0][ A][ 0]
Should be
Code: [Select]
[FEFF][72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
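
To spell out where the expected values come from: assuming wt3.txt is UTF-16LE with a BOM, each pair of bytes forms one 16-bit code unit, low byte first. A minimal sketch of that pairing follows. utf16le_to_units is a hypothetical helper for illustration only, not part of any library, and it does not combine surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs UTF-16LE bytes into 16-bit code units.
   Returns the number of units written; an odd trailing byte is ignored. */
static size_t utf16le_to_units(const unsigned char *bytes, size_t nbytes,
                               uint16_t *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i + 1 < nbytes && n < out_cap; i += 2)
        out[n++] = (uint16_t)(bytes[i] | (bytes[i + 1] << 8));  /* low byte first */

    return n;
}
```

Fed the bytes FF FE 72 00 6F 00, this yields the units FEFF, 72 and 6F, which matches the start of the expected output above.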

May the source be with you

hellork

  • Guest
Re: How to use "getwline"?
« Reply #8 on: November 17, 2010, 03:15:12 AM »
According to my man page, "The behavior of fgetwc() depends on the LC_CTYPE category of the current locale."
Try adding
Code: [Select]
+#include <locale.h>
...
+char *loc = setlocale (LC_ALL, "");
...
+printf ("locale is %s\n",loc);

I was interested in this, because I want to get to know wide characters.
After cross-compiling with the above changes, I got this:
Code: [Select]
[not me]$ i686-pc-mingw32-gcc test_fgetwc.c -o t.exe
[not me]$ ./t.exe
locale is English_United States.1252
[FEFF][72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
[not me]$ hexdump -e '1/2 "[%2X]" ' w*
[FEFF][72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]
[not me]$

But compiled on Linux, it cannot parse the file because of the locale:
Code: [Select]
[not me]$ ./t
locale is en_US.utf8
/*NULL, nothing, nada*/
[not me]$

So I thought I would get smart and change the locale to try to read the file.

Code: [Select]
[not me]$ locale -a|grep US
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
es_US
es_US.iso88591
es_US.utf8
yi_US
yi_US.cp1255
yi_US.utf8
I tried setting the locale manually.
Code: [Select]
-char *loc = setlocale (LC_ALL, "");
+char *loc = setlocale (LC_ALL, "en_US.iso88591");
Recompile (on Linux, targeting Linux):
Code: [Select]
[not me]$ gcc test_fgetwc.c -o t
[not me]$ ./t
locale is en_US.iso88591
[FF][FE][72][ 0][6F][ 0][77][ 0][31][ 0][ D][ 0][ A][ 0][72][ 0][6F][ 0][77][ 0][32][ 0][ D][ 0][ A][ 0]
Finally, it occurred to me that on Linux, wchar_t is 32 bits wide, so I saved the file as Western (ISO-8859-15) and Linux could read it.
Code: [Select]
[not me]$ ./t
locale is en_US.iso88591
[72][6F][77][31][ D][ A][72][6F][77][32][ D][ A]

JohnF

  • Guest
Re: How to use "getwline"?
« Reply #9 on: November 18, 2010, 02:44:48 PM »
Just an update - the following code gives the same result on three compilers here, including PellesC.

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <errno.h>

int main(void)
{
    FILE *stream;
    wint_t wc;

    if (NULL == (stream = fopen("uni.txt", "r")))
    {
        wprintf(L"Unable to open: \"uni.txt\"\n");
        exit(1);
    }

    errno = 0;
    while (WEOF != (wc = fgetwc(stream))){
        wprintf(L"wc = %lc\n", wc);
    }

    if (EILSEQ == errno)
    {
        wprintf(L"An invalid wide character was encountered.\n");
        exit(1);
    }
    fclose(stream);
    return 0;
}

John

hellork

  • Guest
Re: How to use "getwline"?
« Reply #10 on: November 23, 2010, 04:45:28 AM »
Here is the Unicode file (save as UTF-8).
Code: [Select]
大家好!
这是getwline文本读取试验。
To read the Unicode file as wide characters on Linux, I had to add three lines:
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <errno.h>
#include <locale.h>
#define wint_t int
int main (void){
    FILE *stream;
    wint_t wc;
    char *loc = setlocale (LC_ALL, "");
    if (NULL == (stream = fopen("wt1.txt", "r"))){
        wprintf (L"Unable to open: \"wt1.txt\"\n");
        exit (1);
    } errno = 0;
    while (WEOF != (wc = fgetwc(stream))){
        wprintf (L"wc = %lc\n", wc);
    } if (EILSEQ == errno){
        wprintf (L"An invalid wide character was encountered.\n");
        exit (1);
    } fclose (stream);
    return 0;
}

Note that wide characters aren't strictly necessary to read the file. It just helps if you want to access individual characters for some reason, such as counting them, etc. The following will display the UTF-8 file contents without setlocale() and other locale-specific code, but will not give access to individual (multibyte) characters:
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#define MAX 17
#define NAME "wt1.txt"
int main (void){
    FILE *f;
    char buf[MAX];
    if ((f = fopen(NAME, "r"))){
        while (fgets(buf,MAX,f)){
            printf ("%s", buf);
        } if (ferror(f)){  /* fgets() reports read errors via ferror(), not EILSEQ */
            printf ("File read error.\n");
            exit (1);
        } fclose (f);
    } else{
        printf ("Unable to open: \"%s\"\n",NAME);
        exit (1);
    } return 0;
}
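
If the only goal is counting characters, a sketch like the following might do without setlocale() or wide characters, assuming the input is valid UTF-8: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new code point. utf8_count is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a UTF-8 string by skipping
   continuation bytes (10xxxxxx). Assumes the input is valid UTF-8. */
static size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)  /* not a continuation byte */
            n++;

    return n;
}
```

Because the count is just a sum over bytes, it can be accumulated across fgets() calls even when a small buffer such as MAX 17 splits a multibyte sequence between two reads.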
« Last Edit: November 23, 2010, 04:50:08 AM by hellork »