PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-05T13:45:36-07:00
by jjsararas
I have many PDF files which contain detailed schematic images. I need these images extracted into individual image files in order to use in galleries on a (WordPress) website. I need to maintain high resolution and clarity (for zooming in) while keeping filesize as low as possible. I'm presuming PNG8 Grayscale is the desired format but I am not a graphics expert and don't know the best way to proceed.

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-05T16:35:26-07:00
by fmw42
See "pdfimages" from the "xpdf" package, which should directly extract images embedded in a PDF.

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-06T15:42:20-07:00
by jjsararas
Thanks Fred, I ran the utility but only the company logo was extracted from the PDF, the diagrams were ignored. Could that mean the diagrams themselves were originally created directly as PDFs from some engineering software? Would you have any advice if that's the case?

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-06T16:31:44-07:00
by snibgo
pdfimages will extract only raster (pixel) images. It won't extract vector images. ImageMagick will do that, using Ghostscript as a delegate.

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-06T18:26:24-07:00
by fmw42
jjsararas wrote:Thanks Fred, I ran the utility but only the company logo was extracted from the PDF, the diagrams were ignored. Could that mean the diagrams themselves were originally created directly as PDFs from some engineering software? Would you have any advice if that's the case?
Yes, it means only the logo was an image embedded in the PDF. The rest of the PDF is vector data. IM can convert the PDF to an image format such as PNG, using Ghostscript as a delegate, as snibgo said above.

But you will likely need to tell the command the density (resolution) at which the vector data is rasterized to pixels, and also know whether the PDF is CMYK or sRGB. You may also want to supersample (render at a higher density, then resize down) to get higher quality if you have line drawings.

Do you know if all your PDF files are colorspace CMYK or sRGB, or a mix of the two? If the latter, then one needs slightly more complicated code that checks each file to see which colorspace it is and whether it has a profile.

The best thing would be to post an example PDF to some free hosting such as dropbox.com or your company server so we can download it and look into these issues. Then we can give you some suggested command lines.
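
As a rough illustration of the density and supersampling point, here is a minimal sketch in Unix shell. It only prints the convert command it would run, so the numbers can be checked before rasterizing anything. The function name supersample_cmd and the file names input.pdf and page_%d.png are placeholders, not anything from this thread, and the right density for a given schematic is something to find by experiment.

```shell
#!/bin/sh
# Sketch only: render a vector PDF at a higher density than needed,
# then shrink it, which usually gives smoother line drawings.
# "input.pdf" and "page_%d.png" are placeholder names.

# Print the convert command for a target density and a supersampling factor.
supersample_cmd() {
    target=$1                    # density (dpi) wanted in the final image
    factor=$2                    # e.g. 2 renders at twice the target density
    render=$((target * factor))  # density passed to Ghostscript via convert
    shrink=$((100 / factor))     # integer percent used to resize back down
    echo "convert -density $render input.pdf -resize ${shrink}% page_%d.png"
}

supersample_cmd 300 2
```

The shrink percentage uses integer division, so factors that divide 100 evenly (2, 4, 5) are the safe choices here.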

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-07T12:36:53-07:00
by jjsararas
Deeply appreciated! PDF file link: https://app.box.com/s/3yow4ddjvfe9hmslfc6w5s7dhue1o5uw

"Do you know if all your PDF files are colorspace CMYK or sRGB or a mix of the two."

I have to assume the latter, as I will be working with multiple sources.

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-07T16:25:51-07:00
by fmw42
The following is Unix syntax. You don't say what your IM version and platform are. Please always provide that information.

Adjust the density for the desired output filesize vs pixel size for readability.

infile="MAR2014_parts_English.pdf"
# Base name of the PDF (no directory, no extension), used for the output files
inname=$(convert -ping "${infile}[0]" -format "%t" info:)
# Colorspace reported for the first page
cspace=$(convert "${infile}[0]" -verbose info: | sed -n 's/^[ ]*Colorspace:[ ]*\(.*\)$/\1/p')
if [ "$cspace" = "CMYK" ]; then
    # CMYK input: go from a CMYK profile to an sRGB profile before writing PNG
    convert -density 300 "$infile" \
        -profile /Users/fred/images/profiles/USWebCoatedSWOP.icc \
        -profile /Users/fred/images/profiles/sRGB.icc \
        "PNG8:${inname}_%d.png"
elif [ "$cspace" = "sRGB" ]; then
    # sRGB input: no profile conversion needed
    convert -density 300 "$infile" \
        "PNG8:${inname}_%d.png"
else
    echo "Some Other Colorspace"
fi
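
Since the original question mentions many PDF files from multiple sources, the single-file logic above could be wrapped in a loop. The following is only a sketch under several assumptions: the names classify_colorspace and convert_all are made up for illustration, CMYK_ICC and SRGB_ICC are placeholder profile paths, only the first page of each PDF is checked, and treating a "Gray" colorspace like sRGB is a guess rather than something established in this thread.

```shell
#!/bin/sh
# Sketch only: apply the single-file logic to every PDF in the current directory.
# CMYK_ICC and SRGB_ICC are placeholder profile paths; point them at real files.
CMYK_ICC="${CMYK_ICC:-USWebCoatedSWOP.icc}"
SRGB_ICC="${SRGB_ICC:-sRGB.icc}"
DENSITY="${DENSITY:-300}"

# Decide how a file should be handled from its reported colorspace.
# Treating "Gray" like sRGB is an assumption.
classify_colorspace() {
    case "$1" in
        CMYK)      echo "profiles" ;;
        sRGB|Gray) echo "direct" ;;
        *)         echo "skip" ;;
    esac
}

# Convert every *.pdf in the current directory to numbered PNG8 files.
convert_all() {
    for infile in *.pdf; do
        [ -e "$infile" ] || continue    # the glob matched nothing
        inname=$(basename "$infile" .pdf)
        # Colorspace of the first page only
        cspace=$(convert "${infile}[0]" -format "%[colorspace]" info:)
        case "$(classify_colorspace "$cspace")" in
            profiles)
                convert -density "$DENSITY" "$infile" \
                    -profile "$CMYK_ICC" -profile "$SRGB_ICC" \
                    "PNG8:${inname}_%d.png" ;;
            direct)
                convert -density "$DENSITY" "$infile" "PNG8:${inname}_%d.png" ;;
            *)
                echo "Skipping $infile: colorspace '$cspace'" >&2 ;;
        esac
    done
}

# Usage: cd into the folder of PDFs, then run:
#   convert_all
```

Keeping the colorspace decision in its own small function makes it easy to add further cases later without touching the conversion loop.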

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-08T09:38:01-07:00
by jjsararas
Apologies. I'm on Windows 7 64-bit, ImageMagick-6.9.3-7-Q16-x64. Thank you for that code, very much appreciated. I'm afraid this is a bit out of my range as I'm not a graphics professional by a long shot! I do have a programming background so I can follow what your code is doing, but I wouldn't know where to start in Windows. And I'm brand new to ImageMagick. Thanks for your generosity here.

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-08T09:56:33-07:00
by fmw42
Sorry, but I do not know Windows scripting. Perhaps one of the IM Windows users can help. Alternately, see
http://www.imagemagick.org/Usage/windows/

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-08T13:05:43-07:00
by jjsararas
Thanks again, I will look into it and likely get familiar with Cygwin.

As I have posted this in the Consulting section, if anyone is interested in quoting, please do feel free to PM me for details.

Re: PAID: Extract images from PDF files and convert to image files

Posted: 2016-04-08T14:03:06-07:00
by fmw42
If you do not want to use Cygwin, I am sure some Windows user could easily convert my code to a Windows .bat file.