bug (or feature?) of "identify" of the scanned PDF

AlexRozen
Posts: 10
Joined: 2018-06-04T08:48:05-07:00
Authentication code: 1152

Post by AlexRozen » 2019-05-28T03:11:48-07:00

ImageMagick 7.0.8-47 Q16 x64
magick.exe identify -verbose 20190528161806907.pdf

Attachments: source image, output

Problem: I'm sure that the sample PDF file contains a scanned A4 page at 400 dpi.
(I double-checked it against other DPI values by using the ruler and counting the pixels across the width of one letter at 3200% magnification.)
But I can't see the true DPI or the image size in pixels anywhere in the "identify" output.
Only 72 dpi and the corresponding image size are shown.

snibgo
Posts: 11827
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: bug (or feature?) of "identify" of the scanned PDF

Post by snibgo » 2019-05-28T05:03:05-07:00

Your PDF contains a single page, which has a single embedded raster image and nothing else. IM rasterizes the page at whatever density (aka "resolution", e.g. pixels per inch) you ask for, 72 by default, which is useful for vector images. But this resamples any embedded images, which you don't want.

If you want to simply extract the embedded image, I suggest you use pdfimages instead of IM.
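
To illustrate why "identify" shows 72 dpi and a small pixel size, here is a rough sketch of the arithmetic behind rasterizing a page. It assumes the page is exactly A4 (the real page box in the PDF may differ slightly), and the names A4_POINTS and rasterized_size are made up for this example. The delegate that renders the page may round differently, so treat the results as approximate.

```python
# A4 is 210 mm x 297 mm; in PDF points (1/72 inch) that is roughly:
A4_POINTS = (595.276, 841.890)  # assumption: the scanned page is exactly A4


def rasterized_size(page_pt, density):
    """Approximate pixel size of a page rendered at `density` pixels per inch."""
    w_pt, h_pt = page_pt
    # One point is 1/72 inch, so pixels = points / 72 * density.
    return (round(w_pt * density / 72), round(h_pt * density / 72))


print(rasterized_size(A4_POINTS, 72))   # (595, 842)
print(rasterized_size(A4_POINTS, 400))  # (3307, 4677)
```

Putting "-density 400" before the PDF filename on the magick command line asks for the second size, but the page is still re-rendered, so the result is a resampled copy rather than the embedded scan itself.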
snibgo's IM pages: im.snibgo.com

AlexRozen
Posts: 10
Joined: 2018-06-04T08:48:05-07:00
Authentication code: 1152

Re: bug (or feature?) of "identify" of the scanned PDF

Post by AlexRozen » 2019-05-28T05:50:07-07:00

snibgo wrote:
2019-05-28T05:03:05-07:00
Your PDF contains a single page, which has a single embedded raster image and nothing else.
...
If you want to simply extract the embedded image, I suggest you use pdfimages instead of IM.
I simply wanted to find out the true parameters of the embedded image.

snibgo
Posts: 11827
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: bug (or feature?) of "identify" of the scanned PDF

Post by snibgo » 2019-05-28T06:46:02-07:00

IM doesn't know the true parameters of the embedded image. It only knows the parameters of the rasterized page.

To find the true parameters of the embedded image, extract it with pdfimages, and use "identify" on that.
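
A minimal sketch of that workflow, assuming pdfimages (from Poppler or Xpdf) is installed and on the PATH. The helper names list_embedded_images and effective_dpi are invented for this example. "pdfimages -list" prints a table with the pixel width and height of each embedded image without extracting anything; if the image fills the page, its resolution follows from the page size in points.

```python
import subprocess


def list_embedded_images(pdf_path):
    """Return the table printed by `pdfimages -list` (needs pdfimages installed)."""
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdfimages fails
    )
    return result.stdout


def effective_dpi(pixels, page_points):
    """Resolution implied by an image that spans the full page extent.

    Assumption: the embedded image covers the whole page with no margins.
    """
    return pixels * 72 / page_points
```

For example, an embedded image 3307 pixels wide on a page 595.276 points wide (A4, assumed) gives effective_dpi(3307, 595.276), which is very close to 400. Recent Poppler builds also print x-ppi and y-ppi columns in the "-list" table, which report this value directly.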
snibgo's IM pages: im.snibgo.com
