
bug (or feature?) of "identify" of the scanned PDF

Posted: 2019-05-28T03:11:48-07:00
by AlexRozen
ImageMagick 7.0.8-47 Q16 x64
magick.exe identify -verbose 20190528161806907.pdf

Source Image
output

Problem: I'm sure that the sample PDF file contains a scanned A4 page at a 400 dpi setting.
(I have double-checked it at different DPI settings, using a ruler and counting the pixels across the width of one letter at 3200% magnification.)
But I can't see the true DPI or the image size in pixels anywhere in the "identify" command output.
Only 72 dpi and the corresponding image size are shown.

Re: bug (or feature?) of "identify" of the scanned PDF

Posted: 2019-05-28T05:03:05-07:00
by snibgo
Your PDF contains a single page, which has a single embedded raster image and nothing else. IM will rasterize the page at whatever density (aka "resolution", e.g. pixels per inch) you want, which is useful for vector images. But this will resample any embedded raster images, which you don't want.
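
To see why "identify" reports a small image at 72 dpi, here is a rough sketch of the arithmetic, assuming the page is exactly A4 (210 x 297 mm) and that IM rasterizes at its default density of 72. The helper name page_pixels is made up for this illustration, and the delegate's rounding may differ by a pixel.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # assumption: the scanned page is exactly A4


def page_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a page of the given physical size at the given density."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


for dpi in (72, 400):
    w, h = page_pixels(*A4_MM, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels")
# 72 dpi: 595 x 842 pixels
# 400 dpi: 3307 x 4677 pixels
```

So the same physical page is about 595 x 842 pixels when rasterized at 72 dpi, while a 400 dpi scan of it would hold roughly 3307 x 4677 pixels.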

If you want to simply extract the embedded image, I suggest you use pdfimages instead of IM.

Re: bug (or feature?) of "identify" of the scanned PDF

Posted: 2019-05-28T05:50:07-07:00
by AlexRozen
snibgo wrote: 2019-05-28T05:03:05-07:00 Your PDF contains a single page, which has a single embedded raster image and nothing else.
...
If you want to simply extract the embedded image, I suggest you use pdfimages instead of IM.
I simply wanted to find out true parameters of the embedded image.

Re: bug (or feature?) of "identify" of the scanned PDF

Posted: 2019-05-28T06:46:02-07:00
by snibgo
IM doesn't know the true parameters of the embedded image. It only knows the parameters of the rasterized page.

To find the true parameters of the embedded image, extract it with pdfimages, and use "identify" on that.
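
As a possible shortcut, Poppler's pdfimages has a -list option that prints a table of the embedded images without extracting them. The sketch below assumes pdfimages is on the PATH and that the table has the column names printed by recent Poppler versions (width, height, x-ppi, y-ppi). The function names are made up for this illustration.

```python
import subprocess


def parse_pdfimages_list(text):
    """Turn the table printed by `pdfimages -list` into dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) == {"-"}:  # separator row made of dashes
            continue
        fields = ln.split()
        if len(fields) == len(header):  # skip rows that don't match the header
            rows.append(dict(zip(header, fields)))
    return rows


def embedded_images(pdf_path):
    """Run `pdfimages -list` on pdf_path and return one dict per embedded image."""
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_pdfimages_list(result.stdout)


# Usage (needs pdfimages installed and the PDF in the current directory):
#   for img in embedded_images("20190528161806907.pdf"):
#       print(img["width"], img["height"], img["x-ppi"], img["y-ppi"])
```

The x-ppi and y-ppi columns are derived from the image's pixel size and the size at which it is placed on the page, so for a scan that fills the page they should match the scanner setting.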