Annotate with utf-8 problem

nitech
Posts: 12
Joined: 2007-11-23T13:57:17-07:00

Post by nitech »

Hi,

I am trying to annotate an image with UTF-8 text (Russian) from an external text file. The problem is that I get a leading question mark (?) before the text.

Image

Any idea why this happens? Any fix? My code (VBScript) is as follows. (The writeUnicodeADODB function writes a text file in UTF-8 format and then returns the path to the file in the following format: @c:\test.txt)

Code:

strResult = img.Convert( _
	"-size", "200x200", _
	"-font", "Arial-Bold", _
	"-pointsize", "12", _
	"-fill", "#B6B6B6", _
	"-annotate", "0x0+25+18", writeUnicodeADODB(sText, strDefaultPath & "top_" & sFilename), _
	"-trim", _
	strDefaultPath & "template.png", _
	strDefaultPath & "top_" & sFilename)
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: Annotate with utf-8 problem

Post by anthony »

Could you have some extra character in the UTF file? Some UTF files have a special prefix that may cause this, or perhaps a TAB. Control characters are known not to be handled well by the font drawing library.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/

Re: Annotate with utf-8 problem

Post by nitech »

Hi Anthony, and thanks for your reply.

I thought the same as you, but I can't seem to confirm it. I should of course have provided a link to the input text file that I was using. Here goes:

http://www.avento.as/devold/text_images ... 83.png.txt

By the way, when I create a new UTF-8 text file in Notepad and use it as input to the annotate command, the same problem occurs. Like this example:

http://www.avento.as/devold/text_images/russian.txt

I thought this had something to do with the 8-bit versus 16-bit version of ImageMagick, so I installed the newest 8-bit installer. However, it did not seem to have any effect.
el_supremo
Posts: 1015
Joined: 2005-03-21T21:16:57-07:00

Re: Annotate with utf-8 problem

Post by el_supremo »

I did a hex dump of your text file, and it starts with the three-byte sequence ef bb bf.
From the Wikipedia entry for UTF-8:
"Although not part of the standard, many Windows programs (including Windows Notepad) use the byte sequence EF BB BF at the beginning of a file to indicate that the file is encoded using UTF-8. This is the Byte Order Mark U+FEFF encoded in UTF-8."
It would appear that ImageMagick does not recognize and ignore this sequence.
If that sequence is removed from the file, IM generates the correct annotation.

Pete
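
For anyone who wants to run the same check from a script instead of a hex dump, here is a minimal sketch. It assumes ADODB.Stream is available (as it is elsewhere in this thread); the function name HasUtf8Bom is made up for illustration and is not part of ImageMagick or ADO.

```vbscript
' Hypothetical helper: returns True if the file starts with the
' UTF-8 byte order mark (bytes EF BB BF), False otherwise.
Function HasUtf8Bom(filePath)
	Dim stream, head
	HasUtf8Bom = False

	Set stream = CreateObject("ADODB.Stream")
	stream.Type = 1 ' adTypeBinary: read raw bytes, no decoding
	stream.Open
	stream.LoadFromFile filePath

	' Files shorter than three bytes cannot contain the BOM
	If stream.Size >= 3 Then
		head = stream.Read(3)
		HasUtf8Bom = (AscB(MidB(head, 1, 1)) = &HEF) And _
		             (AscB(MidB(head, 2, 1)) = &HBB) And _
		             (AscB(MidB(head, 3, 1)) = &HBF)
	End If

	stream.Close
	Set stream = Nothing
End Function
```

Calling HasUtf8Bom on a file saved as UTF-8 from Notepad should report True, which matches the ef bb bf prefix described above.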

Re: Annotate with utf-8 problem

Post by anthony »

Thanks, el_supremo. I noticed the sequence but did not get the chance to analyze it before you responded. It seems to be a problem with the freetype library: its handling of some control characters and of 'bad UTF character sequences' is just not done very well at all.

TAB characters are a case in point; this text just has a similar mishandled sequence. At least it did something more constructive, though (it printed a question mark). Most UTF display code ignores such a sequence completely, which means you never know there was a problem with the input.

Re: Annotate with utf-8 problem

Post by nitech »

It's impressive to see what you knowledgeable people find out.

I use the ADODB.Stream object to write the file. I guess it won't let me create it without the Byte Order Mark. I also guess this problem must be relevant to most languages that use Unicode encoding.

I know this is not a VBScript support forum, but still, you don't happen to know how I can save the file without the Byte Order Mark? My code as of today is something like:

Code:

Function writeUnicodeADODB(txtInput,filePath)
	Dim objStream
	Set objStream = CreateObject("ADODB.Stream")
	' The stream must be opened before it can be positioned or written to
	objStream.Open
	objStream.Position = 0
	objStream.Charset = "UTF-8"
	objStream.WriteText txtInput
	objStream.SaveToFile filePath
	writeUnicodeADODB = "@" & filePath
End Function
Regards,
nitech

Re: Annotate with utf-8 problem

Post by nitech »

I found a way to remove the BOM (or at least the question mark disappeared when I did so). Here is the VBScript code:

Code:


' Input is Unicode text, and filePath is the path to the image file we wish to create. First we create a UTF-8 file
' named the same as the image file, and then we use the UTF-8 file as input when creating the image file.
Function writeUnicodeADODB(txtInput,filePath)
	
	' Create and open stream
		Dim objStream
		Set objStream = CreateObject("ADODB.Stream")
		objStream.Open

	'Reset the position and indicate the character encoding
		objStream.Position = 0
		objStream.Charset = "UTF-8"
 
	'Write to the stream
		objStream.WriteText txtInput
 
	'Save the stream to a file
		filePath = filePath & ".txt"
		objStream.SaveToFile filePath, 2 ' overwrite if exists
	
	' Return filepath with an @ so that imagemagick understands that it's a file
		writeUnicodeADODB = "@" & RemoveBOM(filePath)
	
	' Kill stream
		Set objStream = Nothing
		
End Function

' Removes the Byte Order Mark (BOM) from a text file with UTF-8 encoding.
' The BOM signals that the file was stored with UTF-8 encoding.
Public function RemoveBOM(filePath)
	
	' Create a reader and a writer
		Dim writer,reader, fileSize
		Set writer = CreateObject("Adodb.Stream")
		Set reader = CreateObject("Adodb.Stream")
	
	' Load from the text file we just wrote
		reader.Open
		reader.LoadFromFile filePath
	
	' Copy all data from reader to writer, except the BOM
		writer.Mode=3
		writer.Type=1
		writer.Open
		reader.position=5 
		reader.copyto writer,-1 

	' Overwrite file
		writer.SaveToFile filePath,2
	
	' Return file name
		RemoveBOM = filePath

	' Kill objects
		Set writer = Nothing 
		Set reader = Nothing

end function

As you can see, I first create the text file based on the input, and I also set the character set to UTF-8. Then, before returning the file path to ImageMagick, I run the file through RemoveBOM(filePath). The RemoveBOM function reads the text file, sets its position to 5, and then copies everything from position 5 onward to another stream, which I then save by overwriting the text file we just read.
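
For comparison, a variant that is often seen does the same job with a single pass and no temporary file containing the BOM. This is only a sketch under stated assumptions, not a replacement for the code above: the name WriteUtf8NoBom is made up for illustration, and it relies on the BOM being exactly three bytes (EF BB BF, as found in the hex dump earlier in this thread), so it switches the stream to binary mode and skips three bytes rather than reproducing the position used above.

```vbscript
' Hypothetical helper: writes text as UTF-8 without a BOM and returns
' the path prefixed with "@" for use as an ImageMagick text argument.
Function WriteUtf8NoBom(txtInput, filePath)
	Dim textStream, binStream

	' Encode the text as UTF-8 in memory (ADODB.Stream adds a BOM here)
	Set textStream = CreateObject("ADODB.Stream")
	textStream.Type = 2 ' adTypeText
	textStream.Charset = "UTF-8"
	textStream.Open
	textStream.WriteText txtInput

	' Type can only be changed at position 0; then skip the 3 BOM bytes
	textStream.Position = 0
	textStream.Type = 1 ' adTypeBinary
	textStream.Position = 3

	' Copy the remaining bytes to a binary stream and save that
	Set binStream = CreateObject("ADODB.Stream")
	binStream.Type = 1 ' adTypeBinary
	binStream.Open
	textStream.CopyTo binStream
	binStream.SaveToFile filePath, 2 ' adSaveCreateOverWrite

	binStream.Close
	textStream.Close
	Set binStream = Nothing
	Set textStream = Nothing

	WriteUtf8NoBom = "@" & filePath
End Function
```

The return value can be passed to -annotate in the same place as the result of writeUnicodeADODB. Working in binary mode keeps the byte count explicit, which matters because Position is measured in bytes, not characters.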

For anyone else reading this post: my previously linked graphics file now displays correctly:

Image

The code I would use in VBScript to call these functions would be:

Code:

strResult = img.Convert( _ 
	"-size", "200x200", _
	"-font", "Arial-Bold", _	
	"-pointsize", "12", _
	"-fill", "#B6B6B6", _
	"-annotate", "0x0+25+18", writeUnicodeADODB(UCase(sText), strDefaultPath & "top_" & sFilename), _
	"-trim", _
	strDefaultPath & "template.png", _
	strDefaultPath & "top_" & sFilename)
Thanks for your help!

Kind regards,
nitech