NO

Author Topic: islower/isupper issue  (Read 3026 times)

rdentato

  • Guest
islower/isupper issue
« on: October 02, 2016, 11:09:41 AM »
The functions islower() and isupper() crash if the argument (which is supposed to be an int) is greater than 0xFF.

Code: [Select]
#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[])
{
printf("islower(%d) -> %d\n",'A',islower('A'));
printf("islower(%d) -> %d\n",'a',islower('a'));
printf("islower(%d) -> %d\n",0x9268,islower(0x9268));  // CRASH
}

The obvious solution is to call them as:

Code: [Select]
islower(x & 0xFF)

But this might be an issue with existing code or libraries.  The proper way would be to perform a check within the islower() and isupper() function themselves.

I didn't check the other isxxx() functions.

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: islower/isupper issue
« Reply #1 on: October 02, 2016, 03:46:48 PM »
Hi,
the isxxxx functions are converted to macros (see ctype.h) as follows:

Code: [Select]
/* macro overrides */
#define isalnum(c)  (__ctypetab[(int)(c)] & (_DIGIT|_LOWER|_UPPER))
#define isalpha(c)  (__ctypetab[(int)(c)] & (_LOWER|_UPPER))
#define iscntrl(c)  (__ctypetab[(int)(c)] & _CNTRL)
#define isdigit(c)  (__ctypetab[(int)(c)] & _DIGIT)
#define isgraph(c)  (__ctypetab[(int)(c)] & (_DIGIT|_LOWER|_UPPER|_PUNCT))
#define islower(c)  (__ctypetab[(int)(c)] & _LOWER)
#define isprint(c)  (__ctypetab[(int)(c)] & (_DIGIT|_LOWER|_UPPER|_PUNCT|_SPACE))
#define ispunct(c)  (__ctypetab[(int)(c)] & _PUNCT)
#define isspace(c)  (__ctypetab[(int)(c)] & (_SPACE|_WHITE))
#define isblank(c)  (__ctypetab[(int)(c)] & (_SPACE|_BLANK))
#define isupper(c)  (__ctypetab[(int)(c)] & _UPPER)
#define isxdigit(c)  (__ctypetab[(int)(c)] & _HEX)
#define tolower(c)  __tolowertab[(int)(c)]
#define toupper(c)  __touppertab[(int)(c)]

The conversion is made using the tables __ctypetab and __touppertab. I suppose that those tables are limited at 255 elements, but no control on the valueof c parameter is done, allowing out of bounds indexing. And this  is obviously a bug.
The macros should be corrected as:

Code: [Select]
#define isalnum(c)  (__ctypetab[(int)((unsigned char)(c))] & (_DIGIT|_LOWER|_UPPER))
...
#define toupper(c)  __touppertab[(int)((unsigned char)(c))]

or

Code: [Select]
#define isalnum(c)  (__ctypetab[(int)(c & 0xff)] & (_DIGIT|_LOWER|_UPPER))
...
#define toupper(c)  __touppertab[(int)(c & 0xff)]
« Last Edit: October 02, 2016, 03:51:31 PM by frankie »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

rdentato

  • Guest
Re: islower/isupper issue
« Reply #2 on: October 02, 2016, 04:01:17 PM »
I see.

For now, since I'm working with possibly utf-8 encoded strings and those isxxxx functions might be called with an integer greater than 255 I opted to change the tests in the code from:

Code: [Select]
  if (islower(ch)) ....

to

Code: [Select]
  if ((ch <= 0xFF) && islower(ch)) ....

untill I will have the time to properly handle lower/upper case for Unicode codepoints. Masking with 0xFF could be easier but would lead to erroneus results.

Just good enough for now (aslo, the changes to made were very few).

Thanks.


Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: islower/isupper issue
« Reply #3 on: October 02, 2016, 11:40:17 PM »
You should not use islower() with UTF8 encoding.
This function, and all similar functions in the isxxxxx family, are intended for program's current locals that could be represented in an unsigned char.
From ISO/IEC 9899:201x, §7.4 Character handling <ctype.h>, sub 1:
Quote
The header <ctype.h> declares several functions useful for classifying and mapping
characters. In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.
See also POSIX.1-2008 - IEEE Std 1003.1™-2008 , Issue 7 here.
The bug is not in the behavior when passing UTF8, or values > 255, that could be compliant with an undefined behavior, as specified in the standard, but the response to an EOF.
EOF is defined as (-1) and when equated to an int can generate an invalid index in the locales table.
« Last Edit: October 02, 2016, 11:46:12 PM by frankie »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

rdentato

  • Guest
Re: islower/isupper issue
« Reply #4 on: October 03, 2016, 09:44:11 PM »
Thanks Frankie. I had completely missed it.