Codepage for JSON output

Questions and postings pertaining to the development of ImageMagick, feature enhancements, and ImageMagick internals. ImageMagick source code and algorithms are discussed here. Usage questions which are too arcane for the normal user list should also be posted here.
AlexRozen
Posts: 10
Joined: 2018-06-04T08:48:05-07:00

Codepage for JSON output

Post by AlexRozen »

Suppose we have someimage.jpg with an embedded comment containing some non-Latin characters.

I am trying to use the following command:
magick.exe convert someimage.jpg someimage.json

The resulting JSON does contain the comment, but it is written in the current Windows ANSI codepage (1251 in my locale).
I am sure the correct encoding for JSON output should be something more universal, like UTF-8.

Can this be configured via the CLI, or is it a bug?
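For context: RFC 8259 requires JSON exchanged between systems to be encoded as UTF-8, so a JSON file whose comment is stored as cp1251 bytes is not valid UTF-8. A minimal Python sketch of that check, using the comment text from this thread (the helper name `is_valid_utf8` is hypothetical, not part of ImageMagick):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

# The same comment text, stored under two different encodings.
comment = "Надежда"
print(is_valid_utf8(comment.encode("utf-8")))   # True
print(is_valid_utf8(comment.encode("cp1251")))  # False: 0xCD is not followed by a continuation byte
```

Running the same check on the raw bytes of a generated .json file is one quick way to tell whether it was written as UTF-8.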
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Codepage for JSON output

Post by fmw42 »

I cannot answer the question about JSON output. But in ImageMagick 7, one uses magick, not convert and not magick convert.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Codepage for JSON output

Post by snibgo »

@AlexRozen: please link to a sample image file that contains a comment with non-Latin characters.

Please also say what version of IM you use, on what platform (I guess Windows).
snibgo's IM pages: im.snibgo.com
AlexRozen
Posts: 10
Joined: 2018-06-04T08:48:05-07:00

Re: Codepage for JSON output

Post by AlexRozen »

I have checked it with ImageMagick-7.0.8-5-portable-Q16-x64
The sample image is here: https://drive.google.com/file/d/1R-bRWZ ... sp=sharing

Use the command:
magick.exe convert IMG_3010.JPG IMG_3010.JSON

The resulting JSON file contains "comment": "Надежда"

It's pure Cyrillic, so it can be saved in cp1251 correctly. But I can't be sure of that on other platforms and/or distributions.
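Whether a legacy codepage can hold a given string depends entirely on which characters it contains. The Python sketch below illustrates this. The helper `fits_codepage` is hypothetical, and the Greek sample is only an arbitrary non-Cyrillic example.

```python
def fits_codepage(text: str, codepage: str) -> bool:
    """Return True if every character of `text` is representable in `codepage`."""
    try:
        text.encode(codepage, errors="strict")
    except UnicodeEncodeError:
        return False
    return True

print(fits_codepage("Надежда", "cp1251"))  # True: Cyrillic is covered by cp1251
print(fits_codepage("Ελπίδα", "cp1251"))   # False: Greek is not in cp1251
print(fits_codepage("Надежда", "cp1252"))  # False: Western European codepage has no Cyrillic
```

This is why output written in the machine's ANSI codepage is fragile: the same comment may be unrepresentable on a machine using a different codepage, whereas UTF-8 can encode any Unicode text.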

P.S. I have inspected the raw bytes of this JPEG file and of another DICOM image file. Both contain Cyrillic strings encoded as cp1251.
So it seems they were originally stored without Unicode, and ImageMagick has no way to determine their true codepage :(
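The same byte-level check can be scripted. The Python sketch below assumes the comment lives in a JPEG COM segment (marker 0xFFFE), which is where ImageMagick's "comment" property for JPEGs typically comes from. It walks the marker segments up to the start of scan and returns the raw comment payloads. It handles only the common case (no standalone RSTn/TEM markers before the scan), and `jpeg_comments` is a hypothetical helper, not an ImageMagick API. Decoding still requires the user to supply the codepage (cp1251 here), which is exactly the information the file does not record.

```python
import struct

def jpeg_comments(data: bytes) -> list[bytes]:
    """Return the raw payloads of all COM (0xFFFE) segments found before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    comments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker; skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS (entropy-coded data follows) or EOI: stop
            break
        # The segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xFE:          # COM segment
            comments.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return comments

# Synthetic JPEG header carrying a cp1251-encoded comment, for demonstration only.
payload = "Надежда".encode("cp1251")
demo = b"\xff\xd8" + b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload + b"\xff\xd9"
for raw in jpeg_comments(demo):
    print(raw.hex(" "))          # the raw bytes, as a hex viewer would show them
    print(raw.decode("cp1251"))  # correct only because we assume the codepage is cp1251
```

On a real file one would pass `open("IMG_3010.JPG", "rb").read()` instead of `demo`. The file name is taken from the thread and the codepage remains an assumption.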
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Codepage for JSON output

Post by snibgo »

AlexRozen wrote: So, it seems that they are originally stored without unicode and ImageMagick have no chances to determine their true codepage
Yes, as you say, the text is encoded as CP 1251, not UTF-8. IM can't guess which codepage was used.
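To see why guessing is impossible in general: the same bytes decode without error under several single-byte codepages, each producing different but "valid" text. A small Python sketch (the helper `decode_candidates` is hypothetical):

```python
def decode_candidates(data: bytes, codepages: list[str]) -> dict[str, str]:
    """Decode the same bytes under each codepage, skipping ones that reject them."""
    results = {}
    for cp in codepages:
        try:
            results[cp] = data.decode(cp)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
    return results

raw = "Надежда".encode("cp1251")  # the bytes actually stored in the file
for cp, text in decode_candidates(raw, ["utf-8", "cp1251", "cp1252", "koi8_r"]).items():
    print(f"{cp:8} {text}")
```

Only UTF-8 rejects the bytes outright. cp1251, cp1252 and KOI8-R all accept them and yield three different strings, so nothing in the bytes alone tells a program which one the author meant.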