Text error when converting pdf file (letters become rectangles)

AVoeee
Posts: 15
Joined: 2018-01-14T12:17:32-07:00

Post by AVoeee » 2018-02-06T07:41:02-07:00

Hello,
when I convert a PDF file, the letters of one specific line are rendered as rectangles. The rest of the text is unaffected.
This behaviour does not occur with any other PDF file (so far).

The line does not contain any special characters. It says "Senior Vice President, Oracle University".
Unfortunately, I cannot provide the file itself because it is a certificate.

I am using ImageMagick 7.0.7-22 Q16 x86_64 on Linux.
(For the sake of completeness: I use Ghostscript 9.22.)

The result was generated with this bash code:

Code: Select all

file="some.pdf"
result="result.jpg"

magick -encoding unicode \
       -density 300 \
       "$file" \
       "$result"
I also tried all the encodings that are listed here, but this had no effect.

Has anyone encountered a similar problem before?

Best regards
AVoeee
Last edited by AVoeee on 2018-02-07T01:14:44-07:00, edited 1 time in total.

snibgo
Posts: 10139
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Post by snibgo » 2018-02-06T10:58:37-07:00

The line is probably set in a font that is neither embedded in the PDF nor installed on your computer.
snibgo's IM pages: im.snibgo.com

AVoeee
Posts: 15
Joined: 2018-01-14T12:17:32-07:00

Post by AVoeee » 2018-02-06T14:57:19-07:00

Hello,
thanks for the reply!
snibgo wrote:
2018-02-06T10:58:37-07:00
It is probably in a font which is neither embedded on the PDF nor installed on your computer.
I had thought of something similar, but the file is displayed correctly in a PDF viewer on the same computer.

I ran "pdffonts" on said PDF, with the following result:

Code: Select all

pdffonts certificate.pdf

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
AURAAI+ArialMT                       CID TrueType      Identity-H       yes yes yes     14  0
XUCJMV+ArialUnicodeMS                CID TrueType      Identity-H       yes yes yes      1  0
I assume that the font used for only 1 object is the one causing the problem.
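A quick way to check the "not embedded" theory is to filter the pdffonts output on its emb column. The following is only a sketch: the function name list_unembedded_fonts is made up for illustration, and it assumes the column layout shown above and font names without spaces. Columns are counted from the right because the type column can contain spaces.

```shell
# Hypothetical helper: print the names of fonts that pdffonts reports
# as NOT embedded. Reads pdffonts output on stdin.
# Assumes the layout shown above: name, type, encoding, emb, sub, uni,
# and a two-number object ID. Counting from the right, "emb" is the
# fifth field from the end.
list_unembedded_fonts() {
    # Skip the two header lines, then keep rows whose emb column is "no".
    awk 'NR > 2 && NF >= 7 && $(NF-4) == "no" { print $1 }'
}

# Usage (file name taken from the post above):
#   pdffonts certificate.pdf | list_unembedded_fonts
```

For the output above this prints nothing, because both fonts are listed with emb = yes.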

To be sure, I installed the Microsoft fonts. Although "Arial" and "Arial-Unicode-MS" now appear in the output of "magick -list font", the result is the same.

Code: Select all

magick -list font | grep "Arial"

  Font: Arial
    family: Arial
  Font: Arial-Black-Standard
    family: Arial
  Font: Arial-Fett
    family: Arial
  Font: Arial-Fett-Kursiv
    family: Arial
  Font: Arial-Italic
    family: Arial
  Font: Arial-Unicode-MS
    family: Arial Unicode MS
Did I overlook something?

Best regards
AVoeee

AVoeee
Posts: 15
Joined: 2018-01-14T12:17:32-07:00

Post by AVoeee » 2018-02-09T03:16:16-07:00

Hello,
I found a workaround for the problem described above. I use Poppler (pdftoppm) to convert the PDF to an image and then process that image with IM7.

Here is the bash code:

Code: Select all

file="some.pdf"
tmp_type="png"
tmp_file="/tmp/pdftoppm_some"   # no filename extension: pdftoppm appends it
result="result.jpg"

# Render the first page with Poppler at 300 dpi.
pdftoppm -png -r 300 -singlefile "$file" "$tmp_file"

# Convert the intermediate PNG to the final JPEG with ImageMagick.
magick -density 300 \
       "${tmp_file}.${tmp_type}" \
       -background white \
       "$result"

rm "${tmp_file}.${tmp_type}"
Not very pretty, but it seems to work.
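The same workaround can be wrapped in a function that cleans up after itself. This is a sketch, not a tested replacement: the function name pdf_to_image and the temporary file name "page" are invented here, and it assumes mktemp, pdftoppm and magick are on the PATH. The body runs in a subshell so that the EXIT trap removes the temporary directory even when one of the steps fails.

```shell
# Hypothetical helper: render the first page of a PDF with pdftoppm,
# then convert the intermediate PNG to the requested output file.
# Arguments: input PDF, output image, optional resolution (default 300).
pdf_to_image() (
    file=$1
    result=$2
    dpi=${3:-300}

    # A private temporary directory avoids clashes between parallel runs.
    tmp_dir=$(mktemp -d) || exit 1
    # Runs when this subshell exits, whether the steps succeed or fail.
    trap 'rm -rf "$tmp_dir"' EXIT

    # pdftoppm appends ".png" itself, so the prefix has no extension.
    pdftoppm -png -r "$dpi" -singlefile "$file" "$tmp_dir/page" || exit 1

    # Same ImageMagick options as in the snippet above.
    magick -density "$dpi" "$tmp_dir/page.png" -background white "$result"
)

# Usage:
#   pdf_to_image some.pdf result.jpg
```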


Unfortunately, I still do not know why ImageMagick cannot read that line.

I also tried to extract the line from the PDF so that I could provide it for testing. Unfortunately, the error no longer occurs in the edited file.

Other PDFs do not seem to have this problem either. It affects only two Oracle certificates, both in the same text passage, even though both have embedded fonts.
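One way to narrow this down: ImageMagick does not render PDFs itself but hands them to Ghostscript, so running Ghostscript directly should show whether the rectangles already appear there. The sketch below is an assumption about how such a check could look. The function name render_with_gs and the output file name are made up, and it has not been run against the affected certificates.

```shell
# Hypothetical helper: render page 1 of a PDF with Ghostscript alone,
# bypassing ImageMagick, to see which tool produces the rectangles.
# Arguments: input PDF, output PNG, optional resolution (default 300).
render_with_gs() {
    pdf=$1
    out=$2
    dpi=${3:-300}

    # png16m is Ghostscript's 24-bit RGB PNG output device.
    gs -q -dSAFER -dBATCH -dNOPAUSE \
       -sDEVICE=png16m -r"$dpi" \
       -dFirstPage=1 -dLastPage=1 \
       -sOutputFile="$out" "$pdf"
}

# Usage:
#   render_with_gs some.pdf gs_page.png
```

If gs_page.png shows the same rectangles, the cause lies in Ghostscript's handling of that font rather than in ImageMagick.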

Best regards
AVoeee
