“JPEG” algo treats JPEGs differently when combining to a PDF

Post by dtr »

I wanted to find the best way to make a PDF from a bunch of JPEGs, so that the quality was preserved at maximum.

The JPEGs are scans of a book. They are only 1200 px high and were saved at quality 60, so ideally I’d want them to go into the PDF untouched. Since some metadata may be changed in the process, I can’t use md5sum to compare the original image with the one extracted from the PDF. What I need is for them to be visually identical, so I decided to compare them with SSIM.

Code:

 $ convert 00000001.jpg  -compress JPEG  00000001.pdf
 $ pdfimages  -f 1  -l 1  -j  00000001.pdf  testpage
 $ ssim.sh 00000001.jpg testpage-000.jpg 
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM1.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpI2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM2.mpc]: 2 of 3, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpC12.mpc]: 4 of 5, 100% complete
Fx/Image//tmp/SSIM.17975[tmpM1.mpc]: 1199 of 1200, 100% complete
ssim=0.998 dssim=0.002

An SSIM other than 1 means that they aren’t visually identical.

I took another image, and SSIM returned 1. For every image except this one, what ends up in the PDF is visually identical to the original. Then I remembered that 00000001.jpg had been edited in GIMP: it’s a fabric cover that was originally scanned too dark, so I adjusted the levels. As an experiment, I fetched the source image as it was in the library, named it 00000001.orig.jpg and reran the test.

Code:

$ convert 00000001.orig.jpg -compress JPEG "00000001.orig.pdf"
$ pdfimages  -f 1  -l 1  -j  00000001.orig.pdf  testpage.orig
$ ssim.sh 00000001.orig.jpg testpage.orig-000.jpg 
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM1.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpI2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM2.mpc]: 2 of 3, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpC12.mpc]: 4 of 5, 100% complete
Fx/Image//tmp/SSIM.21724[tmpM1.mpc]: 1199 of 1200, 100% complete
ssim=1 dssim=0

Surprise! SSIM returned 1! This means that a JPEG whose levels were adjusted in GIMP cannot be put into a PDF as is by the “convert” utility. OK, let’s not use GIMP. But what if other source images turn out to be incompatible with convert’s “JPEG” algorithm? What limitations does it impose? How can I make sure that images are added with no visual quality loss?

00000001.jpg
00000001.orig.jpg

OS: Gentoo, x86_64

Code:

$ convert -version | head -n 1
Version: ImageMagick 7.0.7-35 Q16 x86_64 2018-06-04 https://www.imagemagick.org

Code:

$ pdfimages --version |& head -n1
pdfimages version 0.65.0

ssim.sh can be found on Fred Weinhaus’s website.

***

I know that the “Zip” compression algorithm is recommended for combining images into a PDF, but it increases the size 5–7 times, and the JPEGs that make up the book already take 100 MiB.
Last edited by dtr on 2018-06-21T12:20:03-07:00, edited 1 time in total.

Post by fmw42 »

After reading all that, I am not sure what the ImageMagick bug actually is. Can you describe it in a short sentence?

Post by snibgo »

I think the question is: "I have JPEG files. I want to put them inside PDF wrappers, guaranteeing the JPEG doesn't change. Is that possible with ImageMagick?"

I think the answer is: "No."
snibgo's IM pages: im.snibgo.com

Post by dtr »

fmw42 wrote: 2018-06-21T08:45:55-07:00 After reading all that, I am not sure what the ImageMagick bug actually is. Can you describe it in a short sentence?
https://i.imgur.com/YOPXqFy.png

When a tool doesn’t produce the expected result, either it has a bug or its documentation is lacking.

I’ve taken five pages from that book (with pictures and with text, b/w and coloured, filled and empty) and tested every compression algorithm that convert offers. SSIM ≠ 1 only for this exact image (00000001.jpg) and only for this exact compression algorithm (JPEG). The image is fine: it’s not broken or anything, and it opens well. So why does the SSIM result show that this image alone was altered?

Compression algos comparison: https://pastebin.com/raw/afKEt7SL
The script, that did the comparison: https://github.com/deterenkelt/dotfiles ... testing.sh
Links to images:
00000001.jpg
00000005.jpg
00000010.jpg
00000026.jpg
00000027.jpg

Post by dtr »

snibgo wrote: 2018-06-21T09:21:36-07:00 I think the question is: "I have JPEG files. I want to put them inside PDF wrappers, guaranteeing the JPEG doesn't change. Is that possible with ImageMagick?"
I’m sorry, but that’s wrong.
dtr wrote: 2018-06-21T08:15:11-07:00 I wanted to find the best way to make a PDF from a bunch of JPEGs, so that the quality was preserved at maximum.
It would be good enough if there were some stable mechanism for combining JPEGs that guaranteed visual identity. I don’t care if the metadata changes, or if there is pixel manipulation that the SSIM algo cannot perceive (that is, if it still returns ssim=1 for the extracted image).

What I’d like to know is this: what in the structure of a JPEG file, or in the way it was originally compressed, affects the SSIM result in this case?

If you could provide a link to an article explaining why simply “integrating” a JPEG into a PDF isn’t possible, that would also be helpful, as it would relieve me of the burden of finding a way to integrate them as is.

<unrelated>However, if images are some kind of attachments and PDF supports the JPEG format as well, I wonder why it isn’t possible to transfer pixel information from a JPEG file as is?</unrelated>

Post by snibgo »

Sorry if I misunderstood you. But surely any change reduces quality (by definition)? The only way of "maintaining maximum quality" is by not changing at all.

As far as I know, simply “integrating” JPEG into PDF is possible, but IM doesn't do that. It reads an image, which might be JPEG or some other format. Then it might do some image processing. Then it writes to PDF, possibly with JPEG compression. It doesn't take a shortcut from input to output, but always decompresses then re-compresses, and this is often lossy: it changes pixels.
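
One way to check whether a given tool took that shortcut is to pull the JPEG stream back out of the PDF and compare it with the source byte for byte. Below is a rough sketch, not a tested recipe: `embedded_unchanged` is a made-up helper name, and it assumes a one-page PDF and that `pdfimages -j` names its output `<prefix>-000.jpg`, as in the commands earlier in this thread.

```shell
# Hypothetical helper: report whether the JPEG stored on page 1 of a PDF
# is byte-for-byte the same file as the source JPEG.
embedded_unchanged() {
    src=$1
    pdf=$2
    tmp=$(mktemp -d) || return 2
    # -j asks pdfimages to write JPEG (DCT) streams out as .jpg files
    pdfimages -f 1 -l 1 -j "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 2; }
    # cmp -s is silent and only sets the exit status
    if cmp -s "$src" "$tmp/page-000.jpg"; then
        echo "identical"
    else
        echo "different"
    fi
    rm -rf "$tmp"
}
```

Called as `embedded_unchanged 00000001.jpg 00000001.pdf`. For a PDF written by convert I would expect `different`, since IM re-encodes. A tool that copies the JPEG stream verbatim (img2pdf advertises this, though I have not checked it against these files) should give `identical`.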
snibgo's IM pages: im.snibgo.com

Post by fmw42 »

I would suggest you search Google for jpg to pdf converters. See for example, https://en.softonic.com/downloads/jpg-to-pdf-converter

Post by dtr »

snibgo wrote: 2018-06-21T13:27:59-07:00 It doesn't take a shortcut from input to output, but always decompresses then re-compresses, and this is often lossy: it changes pixels.
Then it’s definitely strange that the original image, which was compressed with quality 60, loses less quality than the edited image, which was saved with quality 96. One would think that if pixels are lost along the way, the image that has less to begin with would suffer more. But it’s the other way round.
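
A possible place to look is the encoder settings each file was saved with. The -quality documentation says that for JPEG output the default is the estimated quality of the input image when it can be determined, so a quality-60 scan is probably recompressed at 60 again, which tends to change little. Whether the PDF writer follows the same rule, and whether the GIMP-saved file differs in something convert does not reproduce (chroma subsampling, for instance), are only guesses. A small sketch for inspecting both files: `jpeg_settings` is a hypothetical helper name, and `%[jpeg:sampling-factor]` is a property that, as far as I know, IM’s JPEG reader sets.

```shell
# Hypothetical helper: print the estimated quality and the chroma sampling
# factors of each JPEG passed as an argument.
jpeg_settings() {
    # %f = file name, %Q = compression quality as estimated by ImageMagick,
    # %[jpeg:sampling-factor] = sampling factors recorded by the JPEG reader
    identify -format '%f quality=%Q sampling=%[jpeg:sampling-factor]\n' "$@"
}
```

For example: `jpeg_settings 00000001.jpg 00000001.orig.jpg`, and then the same for the two files extracted by pdfimages.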

Post by dtr »

fmw42 wrote: 2018-06-21T14:39:22-07:00 I would suggest you search Google for jpg to pdf converters. See for example, https://en.softonic.com/downloads/jpg-to-pdf-converter
Sending people to hell openly is better, for it saves time.

Post by fmw42 »

Are you just angry that you cannot get a good solution from ImageMagick? We are trying to help, but you do not have to be rude.

Snibgo explained it pretty clearly above:
As far as I know, simply “integrating” JPEG into PDF is possible, but IM doesn't do that. It reads an image, which might be JPEG or some other format. Then it might do some image processing. Then it writes to PDF, possibly with JPEG compression. It doesn't take a shortcut from input to output, but always decompresses then re-compresses, and this is often lossy: it changes pixels.

Post by fmw42 »

Although I wrote the ssim shell script, I would suggest you replace it with IM compare, which includes SSIM. But I think you might be better off comparing with a metric such as RMSE, which does no further processing of the image. See http://www.imagemagick.org/script/comma ... php#metric
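
For what it’s worth, the RMSE check could look like the sketch below. `normalised_rmse` is a made-up helper name; it assumes `compare` prints the metric to stderr in the form `absolute (normalised)` and exits with status 1 when the images merely differ.

```shell
# Hypothetical helper: print the normalised RMSE between two images
# (0 means every pixel matches) as reported by ImageMagick's compare.
normalised_rmse() {
    # compare writes e.g. "1234.56 (0.0188381)" to stderr; status 0 means
    # similar, 1 means dissimilar, anything else is treated as an error.
    out=$(compare -metric RMSE "$1" "$2" null: 2>&1) || [ $? -eq 1 ] || return 2
    # keep only the value inside the parentheses
    printf '%s\n' "$out" | sed -n 's/.*(\([^)]*\)).*/\1/p'
}
```

Running `normalised_rmse 00000001.jpg testpage-000.jpg` should print 0 when the decoded pixels are identical, and a small positive number otherwise.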