Problem with only some JPG's created from Image Magick

Posted: 2011-02-23T06:51:05-07:00
by kbirecki
I am creating JPG's as thumbnails from PDF documents. I am then loading those JPG's into an image control in a VB6 application to give the user a preview of the PDF. The problem I am having is that some of the JPG's are not loadable into the VB6 image control, and I narrowed it down to something with the JPG file itself, or possibly the original PDF source file. Either way, I'm looking for any suggestions on anything I can change in my Image Magick processing command that might be able to resolve the problem.

To be clear, in all cases Image Magick *does* create JPG's, and they are viewable by double-clicking them in Windows. It's just that something about some of the JPG's prevents them from being displayed in a VB6 Image control, possibly due to differences in the source PDF's.

Environment:
Image Magick Version 6.6.7-5 2011-02-03 Q16
[Edit: GhostScript 9.0 also]
Development environment: VB6, WinXP
Included examples: (All are shared on 4shared.com; click the big blue button under the preview image to download the file.)
1) PDF1-GeneratesBadJPG.pdf - This is an example PDF that experiences the problem. (http://www.4shared.com/document/EcLEPlN ... adJPG.html)
2) JPG1-FromPDF1.JPG - This is the output JPG from my Image Magick command that experiences the problem of not loading into the VB6 image control. (http://www.4shared.com/photo/HYriJMbH/J ... mPDF1.html)
3) PDF2-GeneratesGoodJPG.pdf - This is an example PDF that *does not* experience the problem. (http://www.4shared.com/document/iTiJDqO ... odJPG.html)
4) JPG2-FromPDF2.JPG - This is the output JPG from my Image Magick command that *does not* experience the problem. (http://www.4shared.com/photo/vwWdmEqB/J ... mPDF2.html)

Image Magick Command:

Code:

Convert.exe C:\Test\PDF1-GeneratesBadJPG.pdf[0-9] -scale 34% C:\Test\thumbnail.jpg
When I compared the contents of the successful and unsuccessful JPG output files in a text editor, I noticed that right near the beginning of the file, the text "Adobe" appears in the ones that fail and does not appear in the ones that succeed.

NOTE: As a test, I ran the suspect PDF through another PDF print driver to create a new PDF file, then I generated a JPG image from the resulting PDF with Image Magick in the same manner as the others, and this resulting JPG worked fine. It loaded in the VB6 image control and it didn't have the "Adobe" text near the beginning of the file. So one possible resolution is to pre-process all PDF's using this print driver, but I really don't want to have to pre-process all PDF's before running them through IM.

Are there any suggestions to consistently process all PDF's through IM?
Thanks!

Re: Problem with only some JPG's created from Image Magick

Posted: 2011-02-23T18:10:15-07:00
by anthony
You could add a -taint option between the input and output.

This will prevent any possibility of 'short-circuit' delegates generating the JPG directly from the PDF without going through IM itself.

For details, see: IM Examples, Delegates, Direct Delegate Conversion
http://www.imagemagick.org/Usage/files/#delegate_direct

However, I did not know IM delegates could extract a JPG image from specific PDF files. It would not surprise me though.

Cristy can tell you how to trace the delegate handling IM is performing for format conversions. Would be a nice thing for me to note on IM examples too :-)

Re: Problem with only some JPG's created from Image Magick

Posted: 2011-02-23T18:26:29-07:00
by magick
Perhaps your application cannot deal with JPEG's in the CMYK colorspace. Add -colorspace rgb before the PDF image filename on the command-line.

Re: Problem with only some JPG's created from Image Magick

Posted: 2011-02-23T19:14:10-07:00
by Drarakel
As magick wrote - it's probably because of the CMYK JPG file (JPGs can be RGB or CMYK).
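
One way to check this without opening the file in a text editor is to read the JPEG header segments directly. The following is a minimal sketch, not part of ImageMagick or VB6: the names `inspect_jpeg` and `SOF_MARKERS` are made up for illustration. It reports how many color components the frame header declares (4 usually means CMYK or YCCK, 3 means YCbCr/RGB, 1 means grayscale) and whether an Adobe APP14 segment is present. JPEG writers commonly add that segment to CMYK files, which would be consistent with the "Adobe" text seen near the start of the failing files.

```python
import struct

# Start-of-frame markers; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are not frames.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def inspect_jpeg(data: bytes) -> dict:
    """Return the component count and Adobe APP14 presence for JPEG bytes."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    info = {"components": None, "adobe_marker": False}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in (0xD9, 0xDA):    # EOI or SOS: header segments are over
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                  # standalone markers carry no length
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xEE and payload[:5] == b"Adobe":
            info["adobe_marker"] = True
        elif marker in SOF_MARKERS and len(payload) >= 6:
            # Frame header: precision (1), height (2), width (2), components (1)
            info["components"] = payload[5]
        pos += 2 + length
    return info
```

Calling `inspect_jpeg(open(path, "rb").read())` on a thumbnail that fails in VB6 and on one that loads should show whether the two differ in component count, which is what the CMYK explanation predicts.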

However, if you have an older Ghostscript, I would first recommend updating Ghostscript to a current version (GS v9.0 or newer). ImageMagick uses Ghostscript to read PDFs, and with a current Ghostscript, it's much easier to get correct colors. (In your 'plain text' example, that's of course not overly important.) Some fiddling could still be involved.. but with older IM/GS combinations, there were a lot more obstacles.

You should now be able to use this command line to get an acceptable preview (readable by everything):

Code:

convert -colorspace rgb PDF1-GeneratesBadJPG.pdf[0-9] -bordercolor white -border 0 -alpha off -scale 34% thumbnail.jpg
The '-colorspace rgb' before the input file is needed so that Ghostscript itself converts the image into sRGB. (That doesn't work this way in older Ghostscript versions.) The '-border' option is there to 'remove' transparency in all situations (omitting it can give you strange results when writing to JPG). I would normally use '-flatten' for that, but that option only works with one page, not with a sequence. You could even omit the options that deal with transparency if you edit ImageMagick's delegates.xml (by changing pngalpha into e.g. pnmraw in the "ps:alpha" line).

A quick solution - that should even work with older Ghostscript versions - would be:

Code:

convert PDF1-GeneratesBadJPG.pdf[0-9] -colorspace RGB -bordercolor white -border 0 -alpha off -scale 34% thumbnail.jpg
Yes, only the order is different. With that command, ImageMagick will 'take care' of a potential CMYK output.
But I would recommend the first solution (with a current Ghostscript).
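
Since the two commands differ only in where '-colorspace' sits relative to the input file, a calling program can build either variant from one helper. The sketch below is an illustration under assumptions, not something from the thread: the function names, the `colorspace_before_input` flag, and the `executable` parameter are hypothetical, and it assumes ImageMagick's `convert` can be found on the PATH.

```python
import subprocess
from typing import List


def build_thumbnail_command(pdf_path: str, jpg_path: str,
                            pages: str = "0-9", scale: str = "34%",
                            colorspace_before_input: bool = False,
                            executable: str = "convert") -> List[str]:
    """Build the argument list for one of the two commands quoted above."""
    cmd = [executable]
    if colorspace_before_input:
        # Variant 1: the option precedes the PDF, so it applies while reading.
        cmd += ["-colorspace", "rgb"]
    cmd.append("%s[%s]" % (pdf_path, pages))
    if not colorspace_before_input:
        # Variant 2: the option follows the PDF, so ImageMagick converts
        # the image after it has been read.
        cmd += ["-colorspace", "RGB"]
    # Shared tail: drop transparency, scale, and write the JPG.
    cmd += ["-bordercolor", "white", "-border", "0", "-alpha", "off",
            "-scale", scale, jpg_path]
    return cmd


def make_thumbnails(pdf_path: str, jpg_path: str, **options) -> None:
    """Run the conversion; raises CalledProcessError if convert fails."""
    subprocess.run(build_thumbnail_command(pdf_path, jpg_path, **options),
                   check=True)
```

Passing the arguments as a list avoids shell quoting problems with the '%' and '[0-9]' characters. On Windows it may be safer to pass the full path to ImageMagick's convert.exe as `executable`, because Windows ships an unrelated convert.exe of its own.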

Re: Problem with only some JPG's created from Image Magick

Posted: 2011-02-24T06:39:59-07:00
by kbirecki
Thanks for the suggestions, but still no joy. I do have the most recent GhostScript - 9.0. I've also tried variations of the colorspace setting to reduce the number of colors in the resulting images. I've tried different formats like BMP. I've also tried the tip of changing pngalpha to pnmraw in the Delegates.xml file.

My research on the VB6 side of the problem seems to indicate (not conclusively) that there are some pictures VB6 cannot display, generally due to things like the number of colors, and that a hotfix may be required. Yet I don't think that can be the case, because many conversions do work. It seems to be just some PDF's that cannot be directly converted. When I examined two PDF's, one that works and one that does not, they both show "%PDF 1.4" at the beginning of the file, so they are the same version of the PDF format.

The specific error I get in VB6 is "Runtime error 481: Invalid Picture". Because some PDF conversions worked and some did not, I thought it might be how I was converting them with IM, but it may be that the source PDF's are the problem.

I'm still trying to find a solution because I bet I haven't tried all possibilities yet.

Re: Problem with only some JPG's created from Image Magick

Posted: 2011-02-24T07:26:42-07:00
by kbirecki
Well, I have to retract my last post. I did finally get it working with the known problematic PDF. Playing around with more "Convert" command line options got it working again. But then I tried to backtrack to see which specific option made it work, and I found that keeping the colorspace option is essential, even though I had tried it before and it failed (I compared the newest working version to the log of commands executed previously). I had tried things like "-colors 16" and "-colorspace sRGB"/"-colorspace Gray", as well as converting to BMP and PNG formats (VB6 didn't like the PNG formats at all). Anyway, it's working now. I'd say the colorspace option was the best help, and the bordercolor/border/alpha options are a good suggestion to keep the images most acceptable.

Thank you all, especially Drarakel for your tips.