Converting PDF to Monochrome

howard39
Posts: 12
Joined: 2013-01-24T17:25:58-07:00

Converting PDF to Monochrome

Post by howard39 »

I'm working with a lot of PDF files that were produced by scanning 1 to 10 page documents, mostly text, and saving the scans as PDF. The ones that were saved in black and white mode are nice and compact, but the ones that were saved in grayscale or color mode are too large. So I'd like to use a tool to convert the latter to monochrome.

The following does what I want:

>convert inputpath -monochrome outputpath

*except* that it degrades the resolution too much. Looks like it's working with about 72 dpi and I'd like 600, or at least 300.

Using the -identify switch, an input file gives, "PDF 612x792 612x792+0+0 16-bit ColorSeparation CMYK 1.939MB" and the output file from the above convert operation gives, "PDF 734x950 734x950+0+0 16-bit sRGB 37.7KB". The actual file lengths are 1.1 MB and 77 KB.

What do I need to do to convert a multipage PDF file from a scanned document to a monochrome version without loss of resolution?
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Converting PDF to Monochrome

Post by snibgo »

1. Review the "density" option. The default 72 is generally too low.

convert -density 288 in.pdf -resize 25% out.png

Or higher, if you want.

2. I generally don't like text that has been through "-monochrome", because it removes the anti-aliasing of the edges. Better quality possibilities include "-level 25x75%", etc.
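
As a rough sketch of point 2, the function below rasterizes at a higher density and stretches the contrast with "-level" instead of reducing to 1 bit, so the anti-aliased edges survive. The 300 dpi default, the file names, the added "-colorspace Gray" and the CONVERT variable are assumptions for illustration, not something from this thread; ImageMagick 7 installs typically use "magick" in place of "convert".

```shell
# Sketch: grayscale output with stretched contrast instead of 1-bit output.
# CONVERT, the 300 dpi default and the file names are illustrative assumptions.
CONVERT=${CONVERT:-convert}   # set CONVERT=magick on ImageMagick 7

gray_level_pdf() {
  # $1 = input PDF, $2 = output PDF, $3 = density in dpi (default 300)
  "$CONVERT" -density "${3:-300}" "$1" -colorspace Gray -level 25x75% "$2"
}

# Usage: gray_level_pdf in.pdf out.pdf 300
```

The output stays grayscale, so it will usually be larger than a true bilevel file; the gain is cleaner-looking text.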
snibgo's IM pages: im.snibgo.com
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Converting PDF to Monochrome

Post by fmw42 »

-monochrome dithers by default; see http://www.imagemagick.org/Usage/quantize/#monochrome. You would be better off using -threshold or something else and setting the -type to bilevel.
howard39
Posts: 12
Joined: 2013-01-24T17:25:58-07:00

Re: Converting PDF to Monochrome

Post by howard39 »

Unfortunately I haven't had much luck.

convert -density 288 in.pdf -resize 25% out.pdf makes the file twice as large with about 1/3 the resolution, with dithering and grayscale.

convert in.pdf -threshold 50% -type bilevel outfile.pdf makes the file 15 times smaller with 1/8 the resolution, bilevel.

The images in my input PDF file, which was produced by a scanner, appear to be compressed, but those in the ImageMagick outputs may not be compressed.

I would have thought that convert in.pdf out.pdf would produce an output file that is the same as the input file, but actually the output file is twice as large and is sampled at approximately 1/8 the resolution.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Converting PDF to Monochrome

Post by snibgo »

If you provide a sample of your input file, we won't need to guess.

A PDF file is generally vector data, which includes text, though it can contain bitmaps. IM is bitmap software. If it sees vector data, it will convert it to bitmaps. If your output file is PDF, it will contain one bitmap for each page. "-density" is the general way of getting higher resolution from vector data, but at the expense of filesize. Vector data can contain almost any level of detail, so you might need a massive value of "-density" to see it all.

It would be nice if IM recognised the special case of PDF files with one bitmap image per page, all at the same resolution, and could report the appropriate detail. Perhaps your file is like this. I can't tell. There are workarounds to get this information, so you can convert using the exact density it was scanned at.
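
One possible workaround, sketched under assumptions: if each page holds a single scanned bitmap, the scan density is the bitmap's pixel width divided by the page width in inches. The page width in points is what identify reports at the default 72 dpi (612 for the file described above). The pixel width has to come from another tool; Poppler's "pdfimages -list in.pdf" is one option if it is installed. The function name scan_dpi and the 2550-pixel example are hypothetical.

```shell
# Sketch: estimate scan resolution from an embedded image's pixel width
# and the page width in PDF points (1 point = 1/72 inch).
scan_dpi() {
  # $1 = image width in pixels, $2 = page width in points
  # Integer arithmetic, rounded to the nearest whole dpi.
  echo $(( ($1 * 72 + $2 / 2) / $2 ))
}

# Example: a 2550-pixel-wide scan on a 612-point (8.5 inch) page
scan_dpi 2550 612   # prints 300
```

The result can then be passed to "-density" so that IM rasterizes each page at the resolution it was scanned at, rather than resampling.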
snibgo's IM pages: im.snibgo.com
howard39
Posts: 12
Joined: 2013-01-24T17:25:58-07:00

Re: Converting PDF to Monochrome

Post by howard39 »

Maybe the trick is to get ImageMagick to compress the bitmaps that it embeds in the output PDF file. Can it do this?
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Converting PDF to Monochrome

Post by snibgo »

Yes. See http://www.imagemagick.org/script/comma ... p#compress

Also see -quality for the jpeg compression.
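
A minimal sketch of how these settings might be combined, with "-compress" placed before the output file name. The function names, file names, the 300 dpi density, the 50% threshold and the default quality of 60 are all illustrative assumptions; whether Group4 is available depends on how your ImageMagick was built.

```shell
# Sketch: choosing a compression for the bitmaps embedded in the output PDF.
CONVERT=${CONVERT:-convert}   # set CONVERT=magick on ImageMagick 7

# Bilevel pages: CCITT Group 4 is a lossless codec for 1-bit images.
bilevel_group4() {
  # $1 = input PDF, $2 = output PDF
  "$CONVERT" -density 300 "$1" -threshold 50% -type bilevel -compress Group4 "$2"
}

# Grayscale pages: JPEG, with -quality trading file size against artifacts.
gray_jpeg() {
  # $1 = input PDF, $2 = output PDF, $3 = JPEG quality (default 60)
  "$CONVERT" -density 300 "$1" -colorspace Gray -compress JPEG -quality "${3:-60}" "$2"
}

# Usage: bilevel_group4 in.pdf out.pdf
#        gray_jpeg in.pdf out.pdf 85
```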
snibgo's IM pages: im.snibgo.com
howard39
Posts: 12
Joined: 2013-01-24T17:25:58-07:00

Re: Converting PDF to Monochrome

Post by howard39 »

OK, this seems to work pretty well

convert -density 600 in.pdf -threshold 15% -type bilevel -compress fax out.pdf

It compresses the input PDF file by a factor of about 8 and converts it to black and white. I just need to test on a larger collection of files to make sure it can handle a variety of different scanned documents.

It's a bit slow -- takes 10-15 sec for a single page doc on a fast pc.

Thanks for the help.
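
For testing on a larger collection, the same command could be wrapped in a loop along these lines. This is only a sketch: the function name batch_bilevel, the directory arguments and the CONVERT variable are hypothetical, and the 15% threshold that suited one scan may need adjusting for others.

```shell
# Sketch: apply the command above to every PDF in a directory.
CONVERT=${CONVERT:-convert}   # set CONVERT=magick on ImageMagick 7

batch_bilevel() {
  # $1 = source directory, $2 = destination directory (created if missing)
  mkdir -p "$2" || return 1
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue   # no matches: the glob pattern stays literal
    "$CONVERT" -density 600 "$f" -threshold 15% -type bilevel \
      -compress fax "$2/$(basename "$f")" || echo "failed: $f" >&2
  done
}

# Usage: batch_bilevel ./scans ./scans-bw
```

Writing to a separate directory leaves the originals untouched, so results can be compared before anything is replaced.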
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Converting PDF to Monochrome

Post by fmw42 »

The slowness comes from setting the density to 600, but that is needed to get good quality.