Can I use IM to optimize scanned PDF?

Nokia808

Post by Nokia808 »

Hi. I'm a new Linux user (less than 1 year). I recently discovered the power of IM, which I was not previously aware of.

Currently, after spending some time investigating, I use this script that I built to optimize a group of already black & white (monochrome) images (deskew, descreen, with or without sharpening the text) before merging them into a single PDF:

#! /bin/bash
convert *."$1" -deskew 80% -morphology Close Diamond:1 -sharpen 0x1.0 -alpha off -monochrome -compress Group4 "$2".pdf

I apply it to both .tiff & .pcx files with excellent results, & it gives me a small PDF. Sometimes I need "-sharpen" to improve the result; at other times it causes problems by reversing the descreening effect of "-morphology", so I delete it.

The problem is this: I need to apply "-deskew" and "-morphology", with or without "-sharpen", to scanned PDF files that I download from the Internet and did not scan myself, in order to optimize them, but the output PDF is severely blurred! I tried to overcome this by using "-density 600", but it did not work at all!

$ convert input.pdf -density 600 -deskew 80% -morphology Close Diamond:1 -alpha off -monochrome -compress Group4 "$2".pdf

I tried increasing the value of "-density", but with no benefit! I raised it to a very high level (600000). No effect at all! It only delays the output PDF, which is still severely blurred!

Please, can anyone tell me: is IM suitable for optimizing scanned PDFs this way or not? If yes, what do I have to modify in the above command?

See the following file example:
https://drive.google.com/file/d/0B1nB8I ... A4Y3M/view

I will delete this file after 24 hr.

Please concentrate on pages 2 onward & ignore the 1st page, because it is in color & the blurring does not appear on it as clearly as on the remaining monochrome pages.
Last edited by Nokia808 on 2017-03-17T09:02:53-07:00, edited 5 times in total.
GeeMack

Re: Can I use IM to optimize scanned PDF?

Post by GeeMack »

Nokia808 wrote: 2017-03-17T07:46:02-07:00 Please, can anyone tell me: is IM suitable for optimizing scanned PDFs this way or not? If yes, what do I have to modify in the above command?
The first thing to try is to put the "-density 600" before the input file name instead of after it.
Nokia808

Re: Can I use IM to optimize scanned PDF?

Post by Nokia808 »

I tried just:

convert -density 600 test.pdf new.pdf

& it failed with no output file! I got the following error in the terminal:

convert: unable to extent pixel cache `No such file or directory' @ fatal/cache.c/CacheSignalHandler/3394.

My laptop froze & I had to wait some time before it became usable again!

I tried again with -density 300 as follows:

convert -density 300 test.pdf new.pdf

but got the same result & the same error message!

---------------------
What is the difference between putting -density before the input file and putting it after?
snibgo

Re: Can I use IM to optimize scanned PDF?

Post by snibgo »

If you want to rasterize at 600 dpi, "-density 600" needs to come before the input filename, not after.
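
As a rough illustration of why the position matters: given before the input, "-density" is a read setting, so the PDF is rasterized at that resolution. Given after the input, the page has already been rasterized at the default density (72 dpi), and the option only changes the resolution recorded in the image. A minimal sketch for testing this on a single page follows; the helper name and the file names are placeholders, not anything from the thread:

```shell
#!/bin/bash
# Hypothetical helper: rasterize one page of a PDF at a chosen density.
# "input.pdf" and "page.png" in the usage line are placeholder names.
rasterize_page() {
    local pdf=$1 page=$2 dpi=$3 out=$4
    # -density comes BEFORE the input, so it applies while the PDF is read.
    # The [N] suffix selects a zero-based page; it is quoted so the shell
    # does not treat the brackets as a glob pattern.
    convert -density "$dpi" "${pdf}[${page}]" "$out"
}

# Usage (second page of the file, at 300 dpi):
# rasterize_page input.pdf 1 300 page.png
```

Trying a single page first keeps memory use small while comparing densities.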

The PDF has 351 pages. Each page is 6.26 by 9.44 inches. At 600 dpi, assuming IM v6 Q16, each page needs 600*600*6.26*9.44*8 bytes = 170 MB of memory. So 351 pages needs 60 GB.

If you don't have 60 GB available memory, that's the problem.
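
The estimate above can be reproduced with a few lines of shell. This is only a sketch of the same arithmetic, assuming 8 bytes per pixel (IM v6 Q16, i.e. four 16-bit channels); the function names are made up for illustration:

```shell
#!/bin/bash
# Estimate pixel-cache memory for rasterized pages.
# Assumes 8 bytes per pixel (IM v6 Q16: 4 channels x 2 bytes), as in the post.

# usage: page_mb DPI WIDTH_IN HEIGHT_IN   -> megabytes for one page
page_mb() {
    awk -v d="$1" -v w="$2" -v h="$3" \
        'BEGIN { printf "%.0f\n", d * d * w * h * 8 / 1e6 }'
}

# usage: total_gb DPI WIDTH_IN HEIGHT_IN PAGES   -> gigabytes for all pages
total_gb() {
    awk -v d="$1" -v w="$2" -v h="$3" -v n="$4" \
        'BEGIN { printf "%.0f\n", d * d * w * h * 8 * n / 1e9 }'
}

page_mb 600 6.26 9.44        # prints 170
total_gb 600 6.26 9.44 351   # prints 60
```

Because the density enters squared, halving it to 300 dpi cuts the estimate to a quarter.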

The job can be done with IM, in small batches of pages, eg:

convert -density 600 "scanned.pdf[0-9]" out.tiff

However, the PDF appears to contain one raster image per page, with no vector data. IM will call Ghostscript to convert each page to pixels, and this will re-sample the data unless you happen to use exactly the correct density (which seems to be 300 dpi).
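
The batch idea above could be scripted roughly as follows. This is a sketch, not a tested recipe: the function names, the "batch-" output prefix and the batch size are assumptions, and the density should be whatever matches the scan (300 dpi is only the guess made in this post).

```shell
#!/bin/bash
# Print zero-based page ranges such as "0-9", "10-19" for a PDF with
# TOTAL pages, SIZE pages per batch.
page_batches() {
    local total=$1 size=$2 start end
    for ((start = 0; start < total; start += size)); do
        end=$((start + size - 1))
        if ((end >= total)); then
            end=$((total - 1))
        fi
        echo "${start}-${end}"
    done
}

# Rasterize each batch to its own multi-page TIFF.
# "scanned.pdf" and the "batch-" prefix are placeholders.
convert_in_batches() {
    local pdf=$1 total=$2 size=$3 dpi=$4 range
    for range in $(page_batches "$total" "$size"); do
        # Quoted so the shell does not glob the [range] suffix.
        convert -density "$dpi" "${pdf}[${range}]" "batch-${range}.tiff"
    done
}

# Usage: convert_in_batches scanned.pdf 351 10 300
```

Each convert call only holds one batch of pages in memory, so the batch size can be tuned to the RAM available.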

The most suitable tool to extract the images is pdfimages, not ImageMagick. It is very quick, and doesn't need much memory. Then you can use IM to de-skew etc.
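
A minimal sketch of the pdfimages route, assuming the Poppler version of the tool (the "-list" and "-tiff" options may be missing in older builds). The function names, "scanned.pdf" and the "page" prefix are placeholders:

```shell
#!/bin/bash
# Sketch only; "scanned.pdf" and the "page" prefix are placeholders.

# Show one line per embedded image (page number, size, colour type, ...),
# which helps to check what the scan really contains before extracting.
list_images() {
    pdfimages -list "$1"
}

# Write the embedded images of pages FIRST..LAST (1-based) as TIFF files
# named with PREFIX and a running number, without rasterizing the pages.
extract_images() {
    local pdf=$1 first=$2 last=$3 prefix=${4:-page}
    pdfimages -f "$first" -l "$last" -tiff "$pdf" "$prefix"
}

# Usage:
# list_images scanned.pdf
# extract_images scanned.pdf 2 351 page
```

The extracted files can then be passed to IM for "-deskew" and the other filters.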
snibgo's IM pages: im.snibgo.com
Nokia808

Re: Can I use IM to optimize scanned PDF?

Post by Nokia808 »

Hi. The memory consumption of IM is very disappointing! It makes the program largely useless for processing large numbers of images, like 500 images!

I noticed that the problem lies specifically with the "convert" command, not with mogrify! That leads me to this question:

Is there a command-line tool (like pdfimages) that lets me combine a huge number of images into a single PDF without memory issues? You pointed me to "pdfimages", which is a very strong tool, & it would be very useful if there were a similar tool for combining images into a single PDF.

Best
magick
Site Admin

Re: Can I use IM to optimize scanned PDF?

Post by magick »

See https://www.imagemagick.org/script/arch ... tera-pixel for a discussion on using ImageMagick with large images. ImageMagick can process large images or a large image sequence in a small memory footprint. It stages the pixel cache on disk-- much slower than memory but it does permit ImageMagick to handle tera-pixels.
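
As a hedged illustration of the disk-backed pixel cache: capping the memory and map resources makes larger caches spill to disk instead of exhausting RAM. The wrapper name, the limit values and the temporary directory below are illustrative assumptions, not recommendations:

```shell
#!/bin/bash
# Hypothetical wrapper: run convert with capped RAM so that larger pixel
# caches go to disk. The limit values are examples only; MAGICK_TMPDIR
# selects where temporary files are written (defaulting here to /tmp).
convert_low_mem() {
    MAGICK_TMPDIR=${MAGICK_TMPDIR:-/tmp} \
        convert -limit memory 1GiB -limit map 2GiB "$@"
}

# Usage:
# convert_low_mem -density 300 "scanned.pdf[0-49]" out.tiff
```

Running "identify -list resource" shows the limits currently in effect, which is a useful check before and after changing them.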
snibgo

Re: Can I use IM to optimize scanned PDF?

Post by snibgo »

There are many tools available for manipulating PDF files. I don't use PDF, so I'm not familiar with them.
Nokia808

Re: Can I use IM to optimize scanned PDF?

Post by Nokia808 »

magick wrote: 2017-03-18T05:39:22-07:00 See https://www.imagemagick.org/script/arch ... tera-pixel for a discussion on using ImageMagick with large images. ImageMagick can process large images or a large image sequence in a small memory footprint. It stages the pixel cache on disk-- much slower than memory but it does permit ImageMagick to handle tera-pixels.
Hi. Thank you very much for this informative link! Linux is indeed a very powerful OS! Every time I lose hope in it, something saves me!

But the approach explained in the link you gave is practically useful only if you have an SSD, right?

I have an alternative idea; kindly review it to see whether it is applicable and correct, and correct me if there are errors. My alternative is the following:

Replace my "convert" commands with their corresponding "mogrify" commands, like:

$ mogrify -deskew 80% -compress group4 *.tiff
$ mogrify -morphology Close Diamond:1 -compress group4 *.tiff

or combine them in a single command like:
$ mogrify -deskew 80% -morphology Close Diamond:1 -compress group4 *.tiff

Then use the following "mogrify" command to create individual PDF files, one for each of the original image files:
$ mogrify -format pdf *.tiff

Finally, merge the resulting individual PDF files into a single PDF with pdfunite:
$ pdfunite *.pdf output.pdf
or
$ qpdf --empty --pages *.pdf -- output.pdf

The main problem is the size of output.pdf! It will equal the total size of the individual PDF files, which is much bigger than if I use:
$ convert *.tiff -deskew 80% -morphology Close Diamond:1 -alpha off -monochrome -compress Group4 output.pdf

I applied this to PCX files with a total size of 42 MB. With "convert" I got an output.pdf of 6.8 MB, while with "mogrify" I got 36.4 MB. Look how much difference in size!

I suspect the error is in using:
$ mogrify -format pdf *.pcx
because I did not include "-alpha off -monochrome -compress Group4"

So, my questions are:

1) Can I include "-alpha off -monochrome -compress Group4" within "$ mogrify -format pdf *.pcx"? If yes, how can I do this?

2) If the answer to the above is yes, can I include the filters as well? I mean, include all of "-deskew 80% -morphology Close Diamond:1 -alpha off -monochrome -compress Group4" within "$ mogrify -format pdf *.pcx"? If yes, how exactly?

3) Are the following commands valid?
$ mogrify -deskew 80% -compress group4 *.tiff
$ mogrify -morphology Close Diamond:1 -compress group4 *.tiff
$ mogrify -deskew 80% -morphology Close Diamond:1 -compress group4 *.tiff

I need your help, please, because I noticed that "mogrify" sometimes does not give an error message if it is invoked incorrectly or with a wrong argument.

Best.
Nokia808

Re: Can I use IM to optimize scanned PDF?

Post by Nokia808 »

Hi.

1) I made a mistake: *.tiff or *.pcx should be placed at the END of mogrify commands. I edited my previous post accordingly.
I applied the commands & they all work OK.

2) I combined them all into a single command as follows:
$ mogrify -deskew 80% -morphology Close Diamond:1 -compress group4 -format pdf *.tiff
It works & achieves what the convert command failed to achieve because of the memory issue! It takes a long time, but my laptop did not freeze during the process & I was able to do other things, like browsing the Internet & exploring directories!

But there is one thing I'm not sure about: do I have to put "-format pdf" at the end of the mogrify command or at the beginning? I mean, which is more correct:

$ mogrify -deskew 80% -morphology Close Diamond:1 -compress group4 -format pdf *.tiff
or
$ mogrify -format pdf -deskew 80% -morphology Close Diamond:1 -compress group4 *.tiff

I tried both & both worked, with no difference in size or quality of the output files.

Please help me with this before I proceed to using the "-density" option with mogrify to achieve the goal of this topic.

Best.
snibgo

Re: Can I use IM to optimize scanned PDF?

Post by snibgo »

"-format" affects only the format of the output file. It has no impact on anything else, so it doesn't matter where you put it. I would put it at the end, on either side of "-compress".
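
Putting the pieces of this thread together, the per-page workflow might look like the sketch below, with "-format pdf" placed at the end next to "-compress". The function name and the directory and file names are placeholders, and it assumes that "-alpha off -monochrome" are accepted by mogrify in the same way as by convert:

```shell
#!/bin/bash
# Sketch of the per-page workflow discussed in this thread.
# Directory and file names are placeholders.
pages_to_pdf() {
    local src_dir=$1 out_dir=$2 result=$3
    mkdir -p "$out_dir" || return 1
    # mogrify works through the files one by one; -path writes the results
    # to out_dir so the original TIFFs are not overwritten.
    mogrify -path "$out_dir" \
        -deskew 80% -morphology Close Diamond:1 \
        -alpha off -monochrome -compress Group4 \
        -format pdf "$src_dir"/*.tiff || return 1
    # The glob expands in sorted order, so zero-padded names keep page order.
    pdfunite "$out_dir"/*.pdf "$result"
}

# Usage: pages_to_pdf scans pdf_pages output.pdf
```

Keeping the merged file outside the per-page directory avoids it being picked up by the "*.pdf" glob on a second run.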