[Solved] Odd problem when converting pdf to images

Post by dreniarb »

I have a script that takes the first 5 pages of a pdf, extracts them to jpg files, then merges those into a spread that looks like this:

Image

Some of these PDFs are created in Publisher. They are exported to PDF so that the URL links in them stay clickable. If we print to PDF via Foxit, CutePDF, or MS PDF Printer, the links are no longer clickable.

However the problem is that when we try to make a spread out of these exported PDFs they look like this:

Image

or this:

Image

If I take that PDF and run it through a PDF printer and convert THAT file it works fine. I'd like to take out the manual part of having to print to a second PDF though.

This is the command I normally use in my script:

Code: Select all

convert -verbose %file%.pdf[0] test-0.jpg
convert -verbose %file%.pdf[1] test-1.jpg
convert -verbose %file%.pdf[2] test-2.jpg
convert -verbose %file%.pdf[3] test-3.jpg
convert -verbose %file%.pdf[4] test-4.jpg

I've tried adding "-type truecolor -type bilevel" to it but it doesn't make a difference. The spread still comes out incorrect. There's something about that exported PDF that ImageMagick doesn't like. Sites like https://pdftoimage.com convert the PDF to JPG just fine, so I'm not sure what's going on.

Anyone have a suggestion? I'm using 7.0.8-11-Q16-x64 but have tried older versions with the same results.
Last edited by dreniarb on 2018-09-15T06:20:09-07:00, edited 1 time in total.

Re: Odd problem when converting pdf to images

Post by fmw42 »

Post a link to your original PDF. I am not really sure what your problem is. Should the pages not overlap as in your examples? What exactly is wrong?

By the way, you should be able to do that in one command line:

Code: Select all

convert -verbose %file%.pdf[0-4] test.jpg

ImageMagick will automatically append -0, -1, ... to the file names.

Note that if you are trying to avoid the JPG output step and write all 5 composited PDF pages to a new PDF, that will not keep the PDF as vector. ImageMagick is a raster processor and will not preserve vector data. The result will be a raster image in a vector PDF shell.

Re: Odd problem when converting pdf to images

Post by snibgo »

Perhaps the problem is that image pixels that are initially white become black. Perhaps the reason is that they are not initially white, but transparent. Saving to JPG removes the transparency.

If the JPGs are only an intermediate step, I suggest you use a lossless format instead. If you really must use JPG, then flatten against white first.
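A minimal sketch of the flatten-against-white suggestion, written for a POSIX shell rather than the Windows batch file used in this thread. The file names, the `flatten_page` helper, and the `DRY_RUN` switch are hypothetical and only there for illustration. With `DRY_RUN=1` (the default here) the command is printed instead of executed, so it can be inspected first. On ImageMagick 7 the `magick` command is the equivalent entry point to `convert`.

```shell
#!/bin/sh
# Sketch only: helper names and file names are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = PDF path, $2 = zero-based page index, $3 = output JPG
flatten_page() {
    # -alpha remove blends transparent pixels onto the -background color;
    # -alpha off then drops the alpha channel before the JPG is written.
    run convert "$1[$2]" -background white -alpha remove -alpha off "$3"
}

flatten_page input.pdf 0 page-0.jpg
```

Setting `DRY_RUN=0` in the environment would run the command for real, which needs ImageMagick (and its PDF delegate) to be installed.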
snibgo's IM pages: im.snibgo.com

Re: Odd problem when converting pdf to images

Post by dreniarb »

Thanks for the suggestion. JPG was just what I picked years ago when I first wrote this. There wasn't a particular reason. The process is:

1. Extract pages 0-4 to separate JPG files.
2. Reduce pictures to specific dimensions.
3. Draw border around them.
4. Rotate each one a certain way.
5. Start with image 4 working backward and place each image in a new larger jpg file at specific locations so they stack.

I feel like the separate JPG images (or whatever image format) are necessary, but odds are there is a better way to do this. This is just the method I came up with and it's worked well until recently. Even the intermediate files do not look correct, so I'm pretty sure the problem is in the first conversion steps. :/
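For readers following along, steps 2 to 5 could be sketched roughly as below. This is not the script from this thread: it is written for a POSIX shell, and every size, angle, offset, and file name is a made-up placeholder. It assumes the extracted pages `page-0.png` to `page-4.png` already exist and are opaque. The hypothetical `DRY_RUN` switch defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: sizes, angles, offsets, and file names are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 2-4: shrink one page, draw a border, and rotate it.
# $1 = input image, $2 = rotation in degrees, $3 = output image
make_tile() {
    # -background none keeps the corners exposed by -rotate transparent.
    run convert "$1" -resize 300x400 -bordercolor black -border 2 \
        -background none -rotate "$2" "$3"
}

# Step 5: stack the tiles on a white canvas, starting with tile 4
# so that tile 0 ends up on top.
build_spread() {
    run convert -size 1400x600 xc:white \
        tile-4.png -geometry +1000+60 -composite \
        tile-3.png -geometry +780+60 -composite \
        tile-2.png -geometry +560+60 -composite \
        tile-1.png -geometry +340+60 -composite \
        tile-0.png -geometry +120+60 -composite \
        spread.jpg
}

i=0
for angle in -6 -3 0 3 6; do
    make_tile "page-$i.png" "$angle" "tile-$i.png"
    i=$((i + 1))
done
build_spread
```

Keeping the tiles as PNG preserves the transparent corners left by the rotation until the final composite onto the white canvas.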

I changed the lines of code from jpg to png like this:

Code: Select all

convert -verbose %file%.pdf[0] test-0.png
convert -verbose %file%.pdf[1] test-1.png
convert -verbose %file%.pdf[2] test-2.png
convert -verbose %file%.pdf[3] test-3.png
convert -verbose %file%.pdf[4] test-4.png

And it gave me this:

Image

So I added this to the code:

Code: Select all

convert -type truecolor -type bilevel -verbose %file%.pdf[0] test-0.png
convert -type truecolor -type bilevel -verbose %file%.pdf[1] test-1.png
convert -type truecolor -type bilevel -verbose %file%.pdf[2] test-2.png
convert -type truecolor -type bilevel -verbose %file%.pdf[3] test-3.png
convert -type truecolor -type bilevel -verbose %file%.pdf[4] test-4.png

And the image looked like this:

Image

This is the end goal:

Image

Is there a different format I should try? Perhaps a different command line option that would help?

Re: Odd problem when converting pdf to images

Post by snibgo »

You have 5 steps in your process. You show code only for the first step. Why?

If you want assistance, please link to a sample input file, and show all your code. Otherwise we can only guess.

Re: Odd problem when converting pdf to images

Post by dreniarb »

I only showed the first step because that's where it was initially failing. As far as I could see the problem wasn't in reducing them, nor was the problem in rotating them or layering them together. All of that was working properly. And as far as I could tell the problem wasn't with drawing the border. It seemed to me that the problem was just in extracting the image from the pdf.

When I changed the image format from JPG to PNG it actually did fix that part of the problem. I just didn't realize it at first because I was only paying attention to the finished spread, which is why I posted again that it "still wasn't working". Once I looked at the individual extracted files I noticed that the extracted PNGs were in fact now formatted correctly (whereas the JPGs were not).

So with that first command fixed I had to figure out where the next problem was. And it was as you suggested: a transparency issue. It seems this new version of Publisher creates PDFs with transparent backgrounds to save space (whereas Foxit, CutePDF, and MS PDF Printer don't use transparency; they turn it all to white). When I issued the command to draw the border on the images, instead of drawing a border it changed the entire background to the border color, black in this case. I discovered this when I made it draw a red border for testing purposes: the entire background (i.e. the transparent areas) was now red.

Doing some Googling I found that the border command will do that on an image with transparency. So to fix this I added an extra step where I fill the transparency with white, THEN draw the black border and it works.

So in a nutshell these are the commands:

Code: Select all

convert test.pdf[0] test-0.png
convert -resize 25% -bordercolor white -border 2 test-0.png test1-0.png
convert -bordercolor black -border 2 test1-0.png test2-0.png

The second command is the added step: its white border also fills the transparent areas with white. The end result has test2-0.png formatted correctly.

I really appreciate your help with this.
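As a hedged aside, the three commands above could probably be folded into one. The sketch below is for a POSIX shell, not the batch file from this thread, and it swaps in a different technique: `-alpha remove` against a white background instead of the extra white border, so the result is 2 pixels narrower on each side than the version above. The `page_with_border` helper and the `DRY_RUN` switch are hypothetical; by default the command is only printed.

```shell
#!/bin/sh
# Sketch only: helper names and file names are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = PDF path, $2 = zero-based page index, $3 = output PNG
page_with_border() {
    # Fill the transparent areas with white first, so that the black
    # border color cannot show through them afterwards.
    run convert "$1[$2]" -resize 25% \
        -background white -alpha remove -alpha off \
        -bordercolor black -border 2 "$3"
}

page_with_border test.pdf 0 test2-0.png
```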

Re: [Solved] Odd problem when converting pdf to images

Post by snibgo »

Good stuff. See the documentation on border at http://www.imagemagick.org/script/comma ... php#border :
... This means that with the default compose method of 'Over' any transparent parts may be replaced by the current -bordercolor setting.
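If the goal were instead to keep the transparency while adding a border, changing the compose method is one option. The sketch below assumes that `-compose Copy` placed before `-border` leaves the image's transparent pixels untouched, which is how I understand the ImageMagick usage notes; please verify that against your own version. It is written for a POSIX shell, and the `border_keep_alpha` helper, the file names, and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and file names are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = input PNG with transparency, $2 = output PNG
border_keep_alpha() {
    # Assumption: with -compose Copy the image replaces the border canvas
    # pixel for pixel, so transparent areas are not filled with black.
    run convert "$1" -bordercolor black -compose Copy -border 2 "$2"
}

border_keep_alpha test-0.png test-0-bordered.png
```

The output has to be a format with an alpha channel, such as PNG; writing to JPG would discard the transparency again.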