
“JPEG” algo treats JPEGs differently when combining to a PDF

Posted: 2018-06-21T08:15:11-07:00
by dtr
I wanted to find the best way to make a PDF from a bunch of JPEGs while preserving as much quality as possible.

The JPEGs are scans of a book; they are only 1200 px high and have quality 60, so ideally I’d want them to go into the PDF untouched. Since some metadata may be altered in the process, I can’t compare the original image with the one extracted from the PDF using md5sum. What I need, though, is for them to be visually identical, so I decided to do an SSIM comparison.

Code:

 $ convert 00000001.jpg  -compress JPEG  00000001.pdf
 $ pdfimages  -f 1  -l 1  -j  00000001.pdf  testpage
 $ ssim.sh 00000001.jpg testpage-000.jpg 
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM1.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpI2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM2.mpc]: 2 of 3, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpC12.mpc]: 4 of 5, 100% complete
Fx/Image//tmp/SSIM.17975[tmpM1.mpc]: 1199 of 1200, 100% complete
ssim=0.998 dssim=0.002
An SSIM below 1 means that they aren’t visually identical.
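The single-page check above can be repeated for every page. Below is a minimal bash sketch, assuming ImageMagick 7's `convert` and `compare` and poppler's `pdfimages` are on PATH; the function name `roundtrip_check` is hypothetical. It swaps ssim.sh for `compare -metric AE`, which counts differing pixels, so 0 means the extracted image is pixel-identical to the source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: round-trip each JPEG through `convert -compress JPEG`,
# extract it again with `pdfimages -j`, and print how many pixels differ.
# Assumes ImageMagick 7 (convert, compare) and poppler-utils (pdfimages).
roundtrip_check() {
    local tmp f
    tmp=$(mktemp -d) || return 2
    for f in "$@"; do
        rm -f "$tmp"/page*          # clear files left from the previous page
        if convert "$f" -compress JPEG "$tmp/page.pdf" &&
           pdfimages -f 1 -l 1 -j "$tmp/page.pdf" "$tmp/page"; then
            # compare writes the metric to stderr; its exit status may be
            # non-zero merely because the images differ, so it is ignored
            printf '%s\tAE=%s\n' "$f" \
                "$(compare -metric AE "$f" "$tmp/page-000.jpg" null: 2>&1)"
        else
            printf '%s\tround trip failed\n' "$f" >&2
        fi
    done
    rm -rf "$tmp"
}
```

Usage would be `roundtrip_check 0000*.jpg`; any page reporting a non-zero AE is one whose pixels changed on the way into the PDF.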

I tried another image, and SSIM returned 1. For every image except this one, what went into the PDF is visually identical to the original. Then I remembered that 00000001.jpg had been edited in GIMP: it’s a fabric cover that was originally scanned too dark, so I adjusted its levels. As an experiment, I took the source image as it was in the library, named it 00000001.orig.jpg and reran the test.

Code:

$ convert 00000001.orig.jpg -compress JPEG "00000001.orig.pdf"
$ pdfimages  -f 1  -l 1  -j  00000001.orig.pdf  testpage.orig
$ ssim.sh 00000001.orig.jpg testpage.orig-000.jpg 
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM1.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpI2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM2.mpc]: 2 of 3, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpC12.mpc]: 4 of 5, 100% complete
Fx/Image//tmp/SSIM.21724[tmpM1.mpc]: 1199 of 1200, 100% complete
ssim=1 dssim=0
Oh wonder! SSIM returned 1! This means that a JPEG whose levels were adjusted in GIMP cannot be put into a PDF as-is by the “convert” utility. OK, let’s not use GIMP. But what if some source images turn out to be incompatible with convert’s “JPEG” algorithm? What limitations does it have? How do I make sure it adds images without any visual quality loss?

00000001.jpg
00000001.orig.jpg

OS: Gentoo, x86_64

Code:

$ convert -version | head -n 1
Version: ImageMagick 7.0.7-35 Q16 x86_64 2018-06-04 https://www.imagemagick.org

Code:

$ pdfimages --version |& head -n1
pdfimages version 0.65.0
ssim.sh can be found on Fred Weinhaus’s website.

***

I know that “Zip” compression is recommended for combining images into a PDF, but it increases the size 5–7 times, and the JPEGs that make up the book already take 100 MiB.
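The size trade-off between “Zip” and “JPEG” can be measured directly on a subset of pages. A minimal bash sketch, assuming ImageMagick 7's `convert` is on PATH; `pdf_size_ratio` is a hypothetical name, and the ratio it prints only holds for the files passed to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the same PDF twice, once with Zip and once with
# JPEG compression, and print both sizes and their ratio.
pdf_size_ratio() {
    local tmp zip jpeg
    [ $# -gt 0 ] || { echo "usage: pdf_size_ratio IMAGE..." >&2; return 2; }
    tmp=$(mktemp -d) || return 2
    if ! convert "$@" -compress Zip "$tmp/zip.pdf" ||
       ! convert "$@" -compress JPEG "$tmp/jpeg.pdf"; then
        rm -rf "$tmp"
        return 1
    fi
    zip=$(wc -c < "$tmp/zip.pdf")
    jpeg=$(wc -c < "$tmp/jpeg.pdf")
    rm -rf "$tmp"
    # awk does the floating-point division that bash arithmetic lacks
    awk -v z="$zip" -v j="$jpeg" \
        'BEGIN { printf "Zip: %d bytes, JPEG: %d bytes, ratio %.1f\n", z, j, z / j }'
}
```

For example, `pdf_size_ratio 00000005.jpg 00000010.jpg` would compare both encodings on two pages without building the whole book.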

Posted: 2018-06-21T08:45:55-07:00
by fmw42
After reading all that, I am not sure what the ImageMagick bug actually is. Can you describe it in a short sentence?

Posted: 2018-06-21T09:21:36-07:00
by snibgo
I think the question is: "I have JPEG files. I want to put them inside PDF wrappers, guaranteeing the JPEG doesn't change. Is that possible with ImageMagick?"

I think the answer is: "No."

Posted: 2018-06-21T12:47:40-07:00
by dtr
fmw42 wrote: 2018-06-21T08:45:55-07:00 After reading all that, I am not sure what the ImageMagick bug actually is. Can you describe it in a short sentence?
https://i.imgur.com/YOPXqFy.png

When a tool doesn’t produce the expected result, it’s either a bug or a gap in the documentation.

I’ve taken five pages from that book (with pictures and with text, b/w and coloured, filled and empty) and tested every compression algorithm that convert offers. SSIM ≠ 1 only for this exact image (00000001.jpg) and only for this exact compression algorithm (JPEG). The image itself is fine: it’s not broken and opens without problems. So why does the SSIM result show that this image alone was altered?

Compression algos comparison: https://pastebin.com/raw/afKEt7SL
The script, that did the comparison: https://github.com/deterenkelt/dotfiles ... testing.sh
Links to images:
00000001.jpg
00000005.jpg
00000010.jpg
00000026.jpg
00000027.jpg
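A loop over every compression type that `convert` lists could look like the following minimal bash sketch. It is not the linked testing.sh: `compare_all_compressions` is a hypothetical name, it assumes ImageMagick 7 and poppler's `pdfimages`, and it reports `compare -metric AE` (the differing-pixel count) rather than SSIM. Types the PDF writer rejects are simply skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for one image, try each compression type from
# `convert -list compress`, round-trip it through a PDF, print the AE count.
compare_all_compressions() {
    local img=$1 tmp c
    local -a extracted
    [ -n "$img" ] || { echo "usage: compare_all_compressions IMAGE" >&2; return 2; }
    tmp=$(mktemp -d) || return 2
    for c in $(convert -list compress); do
        rm -f "$tmp"/*
        convert "$img" -compress "$c" "$tmp/out.pdf" 2>/dev/null || continue
        # with -j, DCT (JPEG) streams are written as .jpg, others as .ppm/.pbm
        pdfimages -f 1 -l 1 -j "$tmp/out.pdf" "$tmp/x" 2>/dev/null || continue
        extracted=("$tmp"/x-000.*)
        [ -e "${extracted[0]}" ] || continue
        printf '%-14s AE=%s\n' "$c" \
            "$(compare -metric AE "$img" "${extracted[0]}" null: 2>&1)"
    done
    rm -rf "$tmp"
}
```

Running `compare_all_compressions 00000001.jpg` would list one line per compression type the PDF writer accepted.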

Posted: 2018-06-21T13:03:33-07:00
by dtr
snibgo wrote: 2018-06-21T09:21:36-07:00 I think the question is: "I have JPEG files. I want to put them inside PDF wrappers, guaranteeing the JPEG doesn't change. Is that possible with ImageMagick?"
I’m sorry, but that’s wrong.
dtr wrote: 2018-06-21T08:15:11-07:00 I wanted to find the best way to make a PDF from a bunch of JPEGs, so that the quality was preserved at maximum.
It would be good enough if there were some reliable way of combining JPEGs that guaranteed visual identity. I don’t care if the metadata changes, or if pixels are changed in ways the SSIM algorithm cannot detect (that is, if it returns ssim=1 for the extracted image).

What I’d like to know is this: what in the structure of a JPEG file, or in the way it was originally compressed, affects the SSIM result in this case?

If you could link to an article explaining why simply “integrating” a JPEG into a PDF isn’t possible, that would also help, as it would relieve me of the burden of looking for a way to embed them as-is.

<unrelated>However, if images are a kind of attachment and PDF supports the JPEG format too, I wonder why it isn’t possible to transfer the pixel data from a JPEG file as-is.</unrelated>

Posted: 2018-06-21T13:27:59-07:00
by snibgo
Sorry if I misunderstood you. But surely any change reduces quality (by definition)? The only way of "maintaining maximum quality" is by not changing at all.

As far as I know, simply “integrating” JPEG into PDF is possible, but IM doesn't do that. It reads an image, which might be JPEG or some other format. Then it might do some image processing. Then it writes to PDF, possibly with JPEG compression. It doesn't take a shortcut from input to output, but always decompresses then re-compresses, and this is often lossy: it changes pixels.
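Whether a given tool embeds the JPEG as-is or decompresses and re-compresses it can be checked more strictly than with SSIM. As far as I know, `pdfimages -j` writes a DCT-encoded image stream out without decoding it, so a byte-for-byte match with the source means the JPEG data went into the PDF untouched. A minimal bash sketch; `same_bytes` is a hypothetical name, and a mismatch alone cannot tell re-encoding apart from stripped or rewritten metadata.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a source JPEG with the file that
# `pdfimages -j` extracted from the PDF, byte for byte.
same_bytes() {
    if cmp -s -- "$1" "$2"; then
        echo "byte-identical: embedded as-is"
    else
        # cmp exits 1 when the files differ and 2 on error (e.g. a missing
        # file); both cases are reported as "differs" here
        echo "differs: re-encoded, or metadata changed"
        return 1
    fi
}
```

For example, after the `pdfimages -f 1 -l 1 -j 00000001.pdf testpage` step from the first post, `same_bytes 00000001.jpg testpage-000.jpg` would report whether the JPEG data survived unchanged.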

Posted: 2018-06-21T14:39:22-07:00
by fmw42
I would suggest you search Google for jpg to pdf converters. See for example, https://en.softonic.com/downloads/jpg-to-pdf-converter

Posted: 2018-06-21T18:49:36-07:00
by dtr
snibgo wrote: 2018-06-21T13:27:59-07:00 It doesn't take a shortcut from input to output, but always decompresses then re-compresses, and this is often lossy: it changes pixels.
Then it’s definitely strange that the original image, compressed with quality 60, loses less quality than the edited image, which was saved with quality 96. One would think that if pixels are lost along the way, the lower-quality image would suffer more. But it’s the other way round.
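To see which JPEG quality each file carries, ImageMagick's `%Q` format escape reports its quality estimate for a JPEG. A minimal bash sketch; `show_quality` is a hypothetical name. If the re-encode quality matters, it should also be possible to set it explicitly when writing the PDF, e.g. `convert 00000001.jpg -compress JPEG -quality 96 00000001.pdf`; whether a particular value removes the difference would still need the same round-trip check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each file name with ImageMagick's estimated
# JPEG quality (%Q), e.g. for a source page and its copy extracted from a PDF.
show_quality() {
    local f
    for f in "$@"; do
        identify -format '%f\t%Q\n' "$f"
    done
}
```

For instance, `show_quality 00000001.jpg 00000001.orig.jpg testpage-000.jpg` would put the edited page, the original scan and the extracted copy side by side.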

Posted: 2018-06-21T19:02:17-07:00
by dtr
fmw42 wrote: 2018-06-21T14:39:22-07:00 I would suggest you search Google for jpg to pdf converters. See for example, https://en.softonic.com/downloads/jpg-to-pdf-converter
Sending people to hell openly is better, for it saves time.

Posted: 2018-06-21T19:25:06-07:00
by fmw42
Are you just angry that you cannot get a good solution from ImageMagick? We are trying to help, but you do not have to be rude.

Snibgo explained it pretty clearly above:
As far as I know, simply “integrating” JPEG into PDF is possible, but IM doesn't do that. It reads an image, which might be JPEG or some other format. Then it might do some image processing. Then it writes to PDF, possibly with JPEG compression. It doesn't take a shortcut from input to output, but always decompresses then re-compresses, and this is often lossy: it changes pixels.

Posted: 2018-06-21T19:36:40-07:00
by fmw42
Although I wrote the ssim shell script, I would suggest you replace it with IM's compare, which includes SSIM. But I think you might be better off comparing with a metric such as RMSE, which does not do any extra processing to the image. See http://www.imagemagick.org/script/comma ... php#metric
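A minimal bash sketch of the comparison suggested above, using ImageMagick's own `compare`; the helper names are hypothetical. `compare` writes the metric to stderr and `null:` discards the difference image. Depending on the metric and IM version, the value may be a single number or an absolute value followed by a normalized one in parentheses, so `leading_number` keeps only the first field.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around `compare -metric`.
# metric NAME A B: print the raw metric string, e.g. for RMSE, AE or SSIM.
metric() {
    compare -metric "$1" "$2" "$3" null: 2>&1
}

# leading_number STRING: keep the text before the first space, "0 (0)" -> "0"
leading_number() {
    printf '%s\n' "${1%% *}"
}

# identical_pixels A B: succeed only if the AE (differing-pixel) count is 0;
# an error message from compare also fails the test, since it is not "0"
identical_pixels() {
    [ "$(leading_number "$(metric AE "$1" "$2")")" = 0 ]
}
```

Typical calls would be `metric RMSE 00000001.jpg testpage-000.jpg` to see how far apart two images are, and `identical_pixels 00000001.jpg testpage-000.jpg && echo same` for a strict yes/no answer.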