[Windows] Cannot open/read files with non-ASCII filenames

bananas2
Posts: 14
Joined: 2008-02-12T08:51:47-07:00

[Windows] Cannot open/read files with non-ASCII filenames

Post by bananas2 »

1. Set the Windows locale to Japanese and try to process some images (PNG, for example) with Japanese filenames.
2. ImageMagick reports an improper image header (because ReadBlob returns a count of 0).

This happens because a UTF-8 -> UTF-16 (wide char) conversion is applied to the filename, but Windows does not use UTF-8 for the command line, and ImageMagick is not even compiled with Unicode (UTF-16) support (in which case no conversion would be needed).

argv is encoded in the system default code page for non-Unicode apps. So there are 2 ways to fix it:
1. Compile with Unicode support (probably wmain, see MSDN; I don't know the details).
2. Use the MultiByteToWideChar function instead of ConvertUTF8ToUTF16.

Code:

   // Tested with code page 932 (Japanese).
   // The first call returns the required length in wide chars, including the NUL.
   wchars_num=MultiByteToWideChar(CP_ACP,0,path,-1,NULL,0);
   unicode_path=(wchar_t *) AcquireQuantumMemory((size_t) wchars_num,sizeof(wchar_t));
   if (unicode_path != (wchar_t *) NULL)
     MultiByteToWideChar(CP_ACP,0,path,-1,unicode_path,wchars_num);
See OpenMagickStream and GetPathAttributes; there are probably other places where ConvertUTF8ToUTF16 is used.

magick
Site Admin
Posts: 11254
Joined: 2003-05-31T11:32:55-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by magick »

We'll get your patch into ImageMagick 6.7.1-3 Beta by sometime tomorrow. Thanks.

Jason S
Posts: 103
Joined: 2010-12-14T19:42:12-07:00
Authentication code: 8675308

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by Jason S »

This change just seems like a bad idea to me. Having a function like OpenMagickStream() support Unicode filenames is a good thing, and you've broken that.

I would either

1) Implement wmain() instead of main(), and convert the (UTF-16) command-line parameters to UTF-8 using WideCharToMultiByte(CP_UTF8, ...).

or

2) Convert the command-line parameters to UTF-8 using MultiByteToWideChar(CP_ACP, ...) followed by WideCharToMultiByte(CP_UTF8, ...). This is worse than option (1) because you still aren't supporting characters that aren't in the user's current codepage. But it's a step in the right direction, and it can easily be improved later.
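Both options end with the same step: turning UTF-16 text into UTF-8. As a rough, portable sketch of what that step produces (on Windows one would normally just call WideCharToMultiByte(CP_UTF8, ...)), here is a hand-written encoder. The name utf16_to_utf8 is made up for this illustration and is not part of ImageMagick or the Win32 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an ImageMagick or Win32 function): encodes a
   NUL-terminated UTF-16 string as a newly allocated, NUL-terminated UTF-8
   string. Unpaired surrogates are replaced by U+FFFD. The caller frees
   the result with free(). Returns NULL if allocation fails. */
static char *utf16_to_utf8(const uint16_t *src)
{
  size_t n = 0, i, o = 0;
  char *dst;

  while (src[n] != 0)
    n++;
  /* Each UTF-16 code unit needs at most 3 UTF-8 bytes
     (a surrogate pair is 2 units and encodes to 4 bytes). */
  dst = malloc(3 * n + 1);
  if (dst == NULL)
    return NULL;
  for (i = 0; i < n; i++) {
    uint32_t cp = src[i];

    if (cp >= 0xD800 && cp <= 0xDBFF &&
        src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
      /* Valid surrogate pair: combine into one code point. */
      cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
      i++;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD; /* unpaired surrogate */
    }
    if (cp < 0x80) {
      dst[o++] = (char) cp;
    } else if (cp < 0x800) {
      dst[o++] = (char) (0xC0 | (cp >> 6));
      dst[o++] = (char) (0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      dst[o++] = (char) (0xE0 | (cp >> 12));
      dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
      dst[o++] = (char) (0x80 | (cp & 0x3F));
    } else {
      dst[o++] = (char) (0xF0 | (cp >> 18));
      dst[o++] = (char) (0x80 | ((cp >> 12) & 0x3F));
      dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
      dst[o++] = (char) (0x80 | (cp & 0x3F));
    }
  }
  dst[o] = '\0';
  return dst;
}
```

The point of option (1) is that the UTF-16 input comes straight from wmain(), so nothing is lost. In option (2) the UTF-16 text is itself derived from the "ANSI" code page, so characters outside that code page are already gone before this step runs.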

magick
Site Admin
Posts: 11254
Joined: 2003-05-31T11:32:55-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by magick »

Thanks. We'll revert the patch and rethink file handling in Windows. We're primarily Linux developers and have less confidence when coding for Windows. Patches from the Windows user community are welcome.

bananas2
Posts: 14
Joined: 2008-02-12T08:51:47-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by bananas2 »

I agree that wmain is the only right way to go, but since it is open source, it was enough for me to support other locales at least with MultiByteToWideChar.

Was it really possible to open files with Unicode filenames under Windows using standalone identify and convert? When main is implemented, the command line uses the system default code page for non-Unicode apps, so UTF-8 is meaningless here. We launch ImageMagick from Java and it also fails to open Japanese files (probably the arguments are converted internally to UTF-16 or the system default, but because the child process still uses main, treating them as UTF-8 is wrong again). So I don't think this patch has broken anything.
Jason S wrote: Convert the command-line parameters to UTF-8 using MultiByteToWideChar(CP_ACP, ...) followed by WideCharToMultiByte(CP_UTF8, ...).
UTF-8 should probably only be used to convert label/meta-data commands.

It would be good to know how it really works.

magick
Site Admin
Posts: 11254
Joined: 2003-05-31T11:32:55-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by magick »

Consider coding up a wmain() that converts argv to UTF-8, which we can then pass to ImageMagick. If you post it here, we will get the patch into the next release of ImageMagick.

Jason S
Posts: 103
Joined: 2010-12-14T19:42:12-07:00
Authentication code: 8675308

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by Jason S »

bananas2 wrote: Was it really possible to open files (standalone identify and convert) with unicode filename under windows?
No; not by using 'convert' or 'identify', anyway. If you wrote your own program that calls OpenMagickStream() directly, then you could have done it by encoding the filename in UTF-8.

The patch does improve the behavior of 'convert', etc. But I don't know whether it breaks anything else. And it's sort of a step away from the full solution.

It occurred to me that the patch, by converting from "ANSI" to UTF-16 and then calling _wfopen, is probably doing exactly what fopen does. You could probably compile IM with MAGICKCORE_HAVE__WFOPEN undefined, and get the same result.

Somebody went to the trouble of writing the code in the "#if defined(MAGICKCORE_HAVE__WFOPEN)" sections, but then apparently didn't make the necessary changes elsewhere to make it useful. Strange.
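To make the fopen versus _wfopen point concrete, here is a minimal sketch of a wrapper that treats its path argument as UTF-8 on every platform. It is not ImageMagick's actual code; the name fopen_utf8 is invented for this example. On Windows it converts UTF-8 to UTF-16 and calls _wfopen; elsewhere it assumes the C library already accepts UTF-8 bytes and calls fopen directly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not part of ImageMagick: opens a file whose name is
   given as UTF-8. Returns NULL on failure, like fopen(). */
static FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
  wchar_t *wpath, wmode[16];
  FILE *fp;
  int n;

  /* First call: ask how many wide chars are needed (including the NUL). */
  n = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0);
  if (n <= 0)
    return NULL;
  wpath = malloc((size_t) n * sizeof(*wpath));
  if (wpath == NULL)
    return NULL;
  /* Second call: do the conversion for the path, then for the mode. */
  if (MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, n) == 0 ||
      MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0) {
    free(wpath);
    return NULL;
  }
  fp = _wfopen(wpath, wmode);
  free(wpath);
  return fp;
#else
  /* Assumption: the platform's fopen() passes UTF-8 bytes through as-is. */
  return fopen(path, mode);
#endif
}
```

A wrapper like this only helps if the strings that reach it really are UTF-8, which is exactly what the command-line tools did not guarantee when they received argv in the "ANSI" code page.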
bananas2 wrote: UTF-8 probably should only be used to convert label/meta-data commands
Admittedly, it would be hard to make everything work perfectly (what if a filename needs to be printed to the terminal?). But if you have to support Unicode filenames, storing them internally as UTF-8 makes sense in this application, simply because all the other options are worse.
magick wrote:Consider coding up a wmain() that converts the argv to UTF8 which we can then pass to ImageMagick.
Although it was one of the things I suggested, I have growing concerns that this could open up a can of worms, and cause any number of subtle compatibility problems. I don't really know what to recommend. I may try it, but don't expect something that can be immediately released.

bananas2
Posts: 14
Joined: 2008-02-12T08:51:47-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by bananas2 »

I think this is the simplest and least error-prone way (as long as the Linux version uses UTF-8 argv):

Code:

// Change this to wmain(int argc, wchar_t **argv).
// FYI: wmain is not supported by MinGW.
// Another way: szArglist = CommandLineToArgvW(GetCommandLineW(), &nArgs);
// but it seems to have some issues.
int main(int argc,char **argv)
{
  char
    *metadata;

  ExceptionInfo
    *exception;

  ImageInfo
    *image_info;

  MagickBooleanType
    status;

  // To do: convert the args UTF-16 -> UTF-8 using WideCharToMultiByte.

  // Pass the new (char) args array and let IM do ConvertUTF8ToUTF16 as before.
  MagickCoreGenesis(*argv,MagickTrue);
  exception=AcquireExceptionInfo();
  image_info=AcquireImageInfo();
  metadata=(char *) NULL;
  status=MagickCommandGenesis(image_info,IdentifyImageCommand,argc,argv,
    &metadata,exception);
  if (metadata != (char *) NULL)
    metadata=DestroyString(metadata);
  image_info=DestroyImageInfo(image_info);
  exception=DestroyExceptionInfo(exception);
  MagickCoreTerminus();
  return(status);
}
Printing to the console is also quite tricky; I've seen at least 3 ways of doing it.

bananas2
Posts: 14
Joined: 2008-02-12T08:51:47-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by bananas2 »

@Jason, Magick, what do you think?

Jason S
Posts: 103
Joined: 2010-12-14T19:42:12-07:00
Authentication code: 8675308

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by Jason S »

bananas2 wrote:@Jason, Magick, what do you think?
What I think is that I'm not qualified to figure out what problems this might cause. I don't know enough about how IM handles character encodings, or about all the different platforms and configurations that need to be reviewed.

But I went ahead and tried it. Before the change, this is what happened:

Code:

C:\prj\ImageMagick-6.7.0\VisualMagick\bin>.\convert.exe testΔ☺.jpg out.png
Magick: unable to open image `test??.jpg': Invalid argument @ error/blob.c/OpenBlob/2588.
Magick: missing an image filename `out.png' @ error/convert.c/ConvertImageCommand/3015.
Then I changed convert.c as follows:

Code:

#define NEWSTUFF

#ifdef NEWSTUFF
int wmain(int argc, wchar_t **argvW)
#else
int main(int argc,char **argv)
#endif
{
  ExceptionInfo
    *exception;

  ImageInfo
    *image_info;

  MagickBooleanType
    status;

#ifdef NEWSTUFF
  char **argv;
  int i, len;

  argv = (char**)AcquireMagickMemory(argc*sizeof(char*));
  for (i=0;i<argc;i++) {
    // Calculate number of bytes needed for this UTF-8 arg.
    len = WideCharToMultiByte(CP_UTF8,0,argvW[i],-1,NULL,0,NULL,NULL);
    // Allocate memory for the UTF-8 arg.
    argv[i] = (char*)AcquireMagickMemory(len*sizeof(char));
    // Convert arg to UTF-8.
    WideCharToMultiByte(CP_UTF8,0,argvW[i],-1,argv[i],len,NULL,NULL);
  }
#endif

  MagickCoreGenesis(*argv,MagickTrue);
  [...]
  MagickCoreTerminus();

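#ifdef NEWSTUFF
  // Free each converted argument, then the argument array itself.
  for (i=0;i<argc;i++) {
    RelinquishMagickMemory((void*)argv[i]);
  }
  RelinquishMagickMemory((void*)argv);
#endif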

  return(status);
}
And now here's what happens (on my computer):

Code:

C:\prj\ImageMagick-6.7.0\VisualMagick\bin>.\convert.exe testΔ☺.jpg out.png

C:\prj\ImageMagick-6.7.0\VisualMagick\bin>
(It works.)

As expected, it causes cosmetic problems with terminal output. I know how to fix this in general, but I don't know how hard it would be in IM's case.

Code:

C:\prj\ImageMagick-6.7.0\VisualMagick\bin>.\convert.exe notexistΔ☺.jpg out.png
Magick: unable to open image `notexistI"â~º.jpg': No such file or directory @ error/blob.c/OpenBlob/2588.
Magick: missing an image filename `out.png' @ error/convert.c/ConvertImageCommand/3015.

magick
Site Admin
Posts: 11254
Joined: 2003-05-31T11:32:55-07:00

Re: [Windows] Cannot open/read files with non-ASCII filenames

Post by magick »

We'll get your patch into ImageMagick 6.7.1-6 by sometime tomorrow. Thanks.
