PAID: Extract images from PDF files and convert to image files

Do you need consulting from ImageMagick experts and are you willing to pay for their expertise? Or are you well versed in ImageMagick and offer paid consulting? If so, post here; otherwise, post elsewhere for free assistance.

Post by jjsararas »

I have many PDF files that contain detailed schematic images. I need these images extracted into individual image files for use in galleries on a (WordPress) website. I need to maintain high resolution and clarity (for zooming in) while keeping file size as low as possible. I'm presuming PNG8 grayscale is the desired format, but I am not a graphics expert and don't know the best way to proceed.
Windows 7-64
ImageMagick-6.9.3-7-Q16-x64

Post by fmw42 »

See pdfimages from the xpdf package, which should directly extract the images embedded in a PDF.
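A minimal sketch of that extraction step, assuming pdfimages (from xpdf or poppler-utils) is on the PATH. The function name `extract_raster_images` and the `img` file prefix are illustrative choices, not part of either package. pdfimages writes files named `<root>-NNN.<ext>`, so counting them shows whether a PDF contains any embedded raster images at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract embedded raster images from one PDF with
# pdfimages and print how many files were written.
# Assumes pdfimages (xpdf or poppler-utils) is installed and on PATH.
extract_raster_images() {
    local pdf="$1" outdir="$2"
    mkdir -p "$outdir" || return 1
    # -j saves JPEG (DCT) streams as .jpg; other images are written as PPM/PBM.
    # Output files are named <root>-000.<ext>, <root>-001.<ext>, ...
    pdfimages -j "$pdf" "$outdir/img" || return 1
    # A count of 0 means the pages hold no embedded raster images
    # (for example, the drawings are vector graphics).
    find "$outdir" -maxdepth 1 -type f -name 'img-*' | wc -l | tr -d ' '
}
```

Usage: `extract_raster_images MAR2014_parts_English.pdf extracted`. A result of 1 for a file full of diagrams would match what is reported in the next post: only the logo is a raster image.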

Post by jjsararas »

Thanks Fred, I ran the utility but only the company logo was extracted from the PDF, the diagrams were ignored. Could that mean the diagrams themselves were originally created directly as PDFs from some engineering software? Would you have any advice if that's the case?

Post by snibgo »

pdfimages will extract only raster (pixel) images. It won't extract vector graphics. ImageMagick can rasterize those, using Ghostscript as a delegate.
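As a rough sketch of that route, the function below (hypothetical name `rasterize_pdf`) calls ImageMagick 6's `convert`, which hands the PDF to Ghostscript for rendering. It assumes both are installed; the default density of 300 and the optional grayscale switch are illustrative, the latter because the schematics in this thread are line drawings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rasterize every page of a PDF to PNG8 with
# ImageMagick 6 'convert' (Ghostscript does the actual rendering).
# Args: pdf [density] [output prefix] [gray]
rasterize_pdf() {
    local pdf="$1" density="${2:-300}" prefix="${3:-page}" gray="${4:-no}"
    # -density must come BEFORE the input so Ghostscript renders at that DPI
    local args=(-density "$density" "$pdf")
    if [ "$gray" = "gray" ]; then
        # Optional: collapse to grayscale, which suits black-and-white schematics
        args+=(-colorspace Gray)
    fi
    # %d in the output name produces one numbered file per page
    args+=("PNG8:${prefix}_%d.png")
    convert "${args[@]}"
}
```

For example, `rasterize_pdf MAR2014_parts_English.pdf 300 MAR2014_parts_English gray` would write `MAR2014_parts_English_0.png`, `MAR2014_parts_English_1.png`, and so on.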
snibgo's IM pages: im.snibgo.com

Post by fmw42 »

jjsararas wrote: Thanks Fred, I ran the utility but only the company logo was extracted from the PDF, the diagrams were ignored. Could that mean the diagrams themselves were originally created directly as PDFs from some engineering software? Would you have any advice if that's the case?
Yes, it means the logo was the only image embedded in the PDF. The rest of the PDF is vector data. IM can convert the PDF to a raster format such as PNG using the Ghostscript delegate library, as user snibgo said above.

But you will likely need to tell the command the desired density (DPI) at which to rasterize the vector data to pixels, and also know whether the PDF is CMYK or sRGB. You may also want to supersample (render at a higher density, then resize down) to get higher quality if you have line drawings.
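One common way to supersample, sketched here under the assumption of ImageMagick 6 with Ghostscript: render at a multiple of the target density, then shrink by the same factor so thin lines come out smoothly antialiased. The function name and the default factor of 4 are illustrative, not values recommended in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: supersample a PDF for crisper line drawings.
# Renders at target*factor DPI, then resizes by 100/factor percent,
# so the output has roughly the pixel dimensions of a target-DPI render.
supersample_pdf() {
    local pdf="$1" target="${2:-300}" factor="${3:-4}" prefix="${4:-page}"
    local render pct
    render=$(( target * factor ))
    # awk handles factors that do not divide 100 evenly (e.g. 3 -> 33.3333)
    pct=$(awk -v f="$factor" 'BEGIN { printf "%.4f", 100 / f }')
    convert -density "$render" "$pdf" -resize "${pct}%" "PNG8:${prefix}_%d.png"
}
```

Higher factors cost more rendering time and memory, since Ghostscript renders the full-size intermediate page before the resize.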

Do you know if all your PDF files are in the CMYK or sRGB colorspace, or a mix of the two? If the latter, one needs slightly more complicated code to check each file's colorspace and whether it has an embedded profile.
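A minimal sketch of that check, assuming ImageMagick 6's `identify -verbose` output, which lists a `Colorspace:` line and, when an ICC profile is embedded, a `Profile-icc:` line under `Profiles:`. The function name `pdf_color_info` is hypothetical; it reads the verbose text on stdin so the parsing can be tried without a PDF at hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify an image's colorspace and whether it carries
# an embedded ICC profile, by parsing 'identify -verbose' text on stdin.
# Prints e.g. "CMYK icc" or "sRGB none".
pdf_color_info() {
    awk '
        # Keep only the first Colorspace: line (the image-level one)
        /^[ ]*Colorspace:/ && !cs { cs = $2 }
        # Any Profile-icc: line means an ICC profile is embedded
        /^[ ]*Profile-icc:/ { icc = "icc" }
        END { print (cs ? cs : "unknown"), (icc ? icc : "none") }
    '
}
```

Usage: `identify -verbose "file.pdf[0]" | pdf_color_info`, where `[0]` limits the check to the first page as in the script later in this thread.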

The best thing would be to post an example PDF to some free hosting service such as dropbox.com or your company server so we can download it and look into these issues. Then we can give you some suggested command lines.

Post by jjsararas »

Deeply appreciated! PDF file link: https://app.box.com/s/3yow4ddjvfe9hmslfc6w5s7dhue1o5uw

"Do you know if all your PDF files are colorspace CMYK or sRGB or a mix of the two."

I have to assume the latter, as I will be working with multiple sources.

Post by fmw42 »

The following is Unix syntax. You didn't say what your IM version and platform are. Please always provide that information.

Adjust the density to balance output file size against pixel dimensions (readability).
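To see the trade-off before rendering, note that PDF page sizes are given in points (1/72 inch), so the rendered width or height is about points × density / 72 pixels. The helper name is hypothetical and the US Letter page size (612 × 792 points) is just an example; the fraction is truncated here, so ImageMagick's own rounding may differ by a pixel.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate pixel count along one page edge
# when a PDF is rendered at a given density (DPI).
# pixels = points * density / 72, truncated to an integer.
pixels_for() {
    local points="$1" density="$2"
    echo $(( points * density / 72 ))
}

# US Letter (612 x 792 points):
#   pixels_for 612 300  -> 2550
#   pixels_for 792 300  -> 3300
#   pixels_for 612 150  -> 1275
```

Halving the density halves each dimension and so quarters the pixel count, which is the main lever on output file size.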

Code:

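#!/bin/bash
# Rasterize every page of a PDF to PNG8, converting CMYK to sRGB first.
infile="MAR2014_parts_English.pdf"
# Base filename without extension, used to name the output pages
inname=`convert -ping "$infile[0]" -format "%t" info:`
# Colorspace of the first page, taken from the verbose listing
cspace=`convert "$infile[0]" -verbose info: | sed -n 's/^[ ]*Colorspace:[ ]*\(.*\)$/\1/p'`
if [ "$cspace" = "CMYK" ]; then
    # First -profile: CMYK (SWOP) profile; second: convert to sRGB.
    # Adjust the profile paths to where the .icc files live on your system.
    convert -density 300 "$infile" \
        -profile /Users/fred/images/profiles/USWebCoatedSWOP.icc \
        -profile /Users/fred/images/profiles/sRGB.icc \
        PNG8:"${inname}_%d.png"
elif [ "$cspace" = "sRGB" ]; then
    convert -density 300 "$infile" \
        PNG8:"${inname}_%d.png"
else
    echo "Some Other Colorspace: $cspace"
fi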
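Since the original question involves many PDFs, the per-file script can be wrapped in a loop. This is a sketch only: `convert_one` and `convert_all` are hypothetical names, and `convert_one` here is a simplified stand-in that skips the CMYK/sRGB branch above; in practice that branch would go inside it. `basename "$f" .pdf` gives the same base name as `-format "%t"` for files ending in lowercase `.pdf`.

```bash
#!/usr/bin/env bash
# Simplified per-file step (no colorspace handling); replace the body
# with the CMYK/sRGB logic from the script above as needed.
convert_one() {
    local infile="$1" outdir="$2" name
    name=$(basename "$infile" .pdf)   # strip directory and .pdf suffix
    convert -density 300 "$infile" "PNG8:${outdir}/${name}_%d.png"
}

# Run convert_one on every *.pdf in indir, writing pages into outdir.
convert_all() {
    local indir="$1" outdir="$2" f
    mkdir -p "$outdir" || return 1
    for f in "$indir"/*.pdf; do
        [ -e "$f" ] || continue   # no match: the glob stays literal
        convert_one "$f" "$outdir"
    done
}
```

Usage: `convert_all ./pdfs ./png` writes `./png/<name>_0.png`, `./png/<name>_1.png`, and so on for each PDF.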

Post by jjsararas »

Apologies, I'm on Windows 7-64, ImageMagick-6.9.3-7-Q16-x64. Thank you for that code, very much appreciated. I'm afraid this is a bit out of my range, as I'm not a graphics professional by a long shot! I do have a programming background, so I can follow what your code is doing, but I wouldn't know where to start on Windows. And I'm brand new to ImageMagick. Thanks for your generosity here.

Post by fmw42 »

Sorry, but I do not know Windows scripting. Perhaps one of the IM Windows users can help. Alternatively, see
http://www.imagemagick.org/Usage/windows/

Post by jjsararas »

Thanks again, I will look into it and likely get familiar with Cygwin.

As I have posted this in the Consulting section, if anyone is interested in quoting, please feel free to PM me for details.

Post by fmw42 »

If you do not want to use Cygwin, I am sure some Windows user could easily convert my code to a Windows .bat file.