
Topic: More Multibyte Misfeasance

Robert
More Multibyte Misfeasance
« on: April 04, 2023, 07:29:08 AM »
Hi Pelle:

The code below, from

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/mbstowcs-mbstowcs-l?view=msvc-170

was compiled with 12.0 RC1 x64 using this command line:

Code:
cc crt_mbstowcs.c -o crt_mbstowcs.exe
The output is:

Code:
crt_mbstowcs.c
crt_mbstowcs.c(77): warning #1039: [ISO] No newline at end of file.
crt_mbstowcs.c(39): error #2140: Type error in argument 1 to 'wcstombs'; expected 'char * restrict' but found 'unsigned char *'.
crt_mbstowcs.c(59): error #2140: Type error in argument 2 to 'mbstowcs'; expected 'const char * restrict' but found 'unsigned char *'.
crt_mbstowcs.c(67): error #2140: Type error in argument 2 to 'mbstowcs'; expected 'const char * restrict' but found 'unsigned char *'.

VS 2022, Nuwen, and LLVM-MinGW all compile it and produce the expected output:

Code:
Locale information set to Japanese_Japan.932
Convert to multibyte string:
   Required Size: 4
   Number of bytes written to multibyte string: 4
   Hex values of the multibyte characters: 0x82 0xa0 0x82 0xa1
   Codepage 932 uses 0x81 to 0x9f as lead bytes.

Convert back to wide-character string:
   Characters converted: 2
   Hex value of first 2 wide characters: 0x3042 0x3043

Code:
// crt_mbstowcs.c
// compile with: /W3
// illustrates the behavior of the mbstowcs function

#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(void)
{
  size_t size;
  int nChar = 2; // number of characters to convert
  int requiredSize;

  unsigned char* pmbnull = NULL;
  unsigned char* pmbhello = NULL;
  char* localeInfo;

  wchar_t* pwchello = L"\x3042\x3043"; // 2 Hiragana characters
  wchar_t* pwc;

  /* Enable the Japanese locale and codepage */
  localeInfo = setlocale(LC_ALL, "Japanese_Japan.932");
  printf("Locale information set to %s\n", localeInfo);

  printf("Convert to multibyte string:\n");

  requiredSize = wcstombs(NULL, pwchello, 0); // C4996
  // Note: wcstombs is deprecated; consider using wcstombs_s
  printf("   Required Size: %d\n", requiredSize);

  /* Add one to leave room for the null terminator. */
  pmbhello = (unsigned char*)malloc(requiredSize + 1);
  if (!pmbhello)
  {
    printf("Memory allocation failure.\n");
    return 1;
  }
  size = wcstombs(pmbhello, pwchello, requiredSize + 1); // C4996
  // Note: wcstombs is deprecated; consider using wcstombs_s
  if (size == (size_t)(-1))
  {
    printf("Couldn't convert string. Code page 932 may"
      " not be available.\n");
    return 1;
  }
  printf("   Number of bytes written to multibyte string: %u\n",
    (unsigned int)size);
  printf("   Hex values of the");
  printf(" multibyte characters: %#.2x %#.2x %#.2x %#.2x\n",
    pmbhello[0], pmbhello[1], pmbhello[2], pmbhello[3]);
  printf("   Codepage 932 uses 0x81 to 0x9f as lead bytes.\n\n");

  printf("Convert back to wide-character string:\n");

  /* Assume we don't know the length of the multibyte string.
   Get the required size in characters, and allocate enough space. */

  requiredSize = mbstowcs(NULL, pmbhello, 0); // C4996
  /* Add one to leave room for the null terminator */
  pwc = (wchar_t*)malloc((requiredSize + 1) * sizeof(wchar_t));
  if (!pwc)
  {
    printf("Memory allocation failure.\n");
    return 1;
  }
  size = mbstowcs(pwc, pmbhello, requiredSize + 1); // C4996
  if (size == (size_t)(-1))
  {
    printf("Couldn't convert string--invalid multibyte character.\n");
  }
  printf("   Characters converted: %u\n", (unsigned int)size);
  printf("   Hex value of first 2");
  printf(" wide characters: %#.4x %#.4x\n\n", pwc[0], pwc[1]);
  free(pwc);
  free(pmbhello);
}

Pelle (Administrator)
Re: More Multibyte Misfeasance
« Reply #1 on: April 04, 2023, 09:56:52 PM »
Hello Robert,

You need to include the /J compiler option...

(There are three character types: signed char, unsigned char, and (plain) char. It's implementation-defined whether (plain) char is signed or unsigned.)
/Pelle

Robert
Re: More Multibyte Misfeasance
« Reply #2 on: April 05, 2023, 07:14:56 AM »
Hi Pelle:

Thanks, I will fiddle with this and see what happens.