Disable transparent background

Posted: 2019-07-19T07:51:00-07:00
by Flokker
Hi,

I have to extract some pages from a PDF file to TIFF for the company I'm working for. The command

convert -compress zip -density 300x300 +adjoin file.pdf[1] output1.tif

results in a TIFF file with a transparent frame around the rest of the image. I think this is because of the OCR Acrobat did. The source was a scanned image that was imported into Acrobat, which ran OCR to make the PDF searchable. Acrobat also straightened the page.

How can I prevent that every time I convert a PDF to TIFF?

I'm on Ubuntu and I use ImageMagick 6.9.7.

Re: Disable transparent background

Posted: 2019-07-19T08:01:38-07:00
by snibgo
You have "a frame of transparent around the rest of the image". What do you want instead? You might "-trim" to remove it. Or flatten against a white background (or any colour you want): "-background white -layers flatten".
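As a rough sketch of the two suggestions above: the file names are placeholders and the helper names flatten_page and trim_page are made up for illustration. The input is quoted so the shell does not treat the square brackets as a glob pattern. Note also that ImageMagick counts pages from 0, so [1] selects the second page of the PDF.

```shell
#!/bin/sh
# Sketch only: helper names and file names are illustrative.

# Option A: composite the page over white, then drop the alpha channel.
# $1 = PDF, $2 = zero-based page index, $3 = output TIFF
flatten_page() {
    convert -density 300 "${1}[${2}]" \
        -background white -layers flatten -alpha off \
        -compress zip "$3"
}

# Option B: cut the uniform border away instead of filling it.
# +repage resets the canvas offset that -trim leaves behind.
trim_page() {
    convert -density 300 "${1}[${2}]" -trim +repage -compress zip "$3"
}
```

Example call: flatten_page file.pdf 1 output1.tif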

Re: Disable transparent background

Posted: 2019-07-19T08:06:51-07:00
by Flokker
When I open the PDF there is no such frame. The background is white like the rest of the page (except the black text). What I want is to output the page as I see it in the PDF.

Re: Disable transparent background

Posted: 2019-07-19T08:41:49-07:00
by snibgo
You might try one of the pdf defines, eg use-trimbox. See http://www.imagemagick.org/script/comma ... php#define

Re: Disable transparent background

Posted: 2019-07-19T13:49:37-07:00
by Flokker
What works is pdf:use-cropbox=true, but I don't understand why. There isn't anything cropped.
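For anyone finding this later, the working command probably looked something like the sketch below. The helper name crop_page is made up, and the rest is the original command with the define added. Both -density and -define are read settings, so they are placed before the input file.

```shell
#!/bin/sh
# Sketch only: crop_page is an illustrative name, not an ImageMagick command.
# $1 = PDF, $2 = zero-based page index, $3 = output TIFF
crop_page() {
    # pdf:use-cropbox=true asks the PDF delegate to render the CropBox
    # area rather than the full page box.
    convert -density 300 -define pdf:use-cropbox=true "${1}[${2}]" \
        -compress zip "$3"
}
```

Example call: crop_page file.pdf 1 output1.tif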

Re: Disable transparent background

Posted: 2019-07-19T14:40:32-07:00
by snibgo
Flokker wrote: There isn't anything cropped.
I suspect there is, and that if you read the PDF with a text editor you will see a "/CropBox" specification.

Re: Disable transparent background

Posted: 2019-07-20T00:05:26-07:00
by Flokker
I cannot open the PDF with a text editor. It's a PDF, not a text file.

Is there no other way to simply extract the pdf "as they are"?

Re: Disable transparent background

Posted: 2019-07-20T01:37:02-07:00
by snibgo
In Windows, PDF files can be opened with Microsoft Wordpad to view the file as raw text. I expect Unix has similar tools.
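On Linux, one possible equivalent is to search the raw bytes with grep. This is only a sketch: pdf_boxes is a made-up helper name, and it finds nothing when the page objects are stored inside compressed object streams, which many PDF writers produce.

```shell
#!/bin/sh
# Sketch only: lists page-box entries found in the raw bytes of a PDF.
# -a treats the binary file as text, -o prints only the matching part,
# -E enables extended regular expressions.
# $1 = PDF file
pdf_boxes() {
    grep -a -o -E '/(MediaBox|CropBox|TrimBox)[[:space:]]*\[[^]]*\]' "$1"
}
```

Example call: pdf_boxes file.pdf. If a /CropBox is printed with smaller numbers than the /MediaBox, the page is cropped in the viewer. If poppler-utils is installed, pdfinfo -box file.pdf should report the same boxes even for compressed files.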

I don't know what "as they are" means. If the PDF has a cropbox, but also has content outside the cropbox, which version is the "real" one? The content might be registration marks that would be cut off a printed paper version. IM gives you the choice: use a cropbox (if the PDF has one) or don't.

And, of course, PDF files are vector. There is no definitive raster version.

If you need to convert PDF files, I suggest you read the Ghostscript documentation. You might decide to use Ghostscript directly.
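A minimal sketch of what calling Ghostscript directly might look like, assuming the gs binary is on the PATH. The helper name gs_page_to_tiff and the file names are made up. Unlike ImageMagick, Ghostscript counts pages from 1. The tiff24nc device writes 24-bit RGB without an alpha channel, so there is no transparency to remove afterwards.

```shell
#!/bin/sh
# Sketch only: gs_page_to_tiff is an illustrative name.
# $1 = PDF, $2 = one-based page number, $3 = output TIFF
gs_page_to_tiff() {
    # -r300 sets 300 dpi; -dUseCropBox renders the CropBox area
    # instead of the full page box.
    gs -dSAFER -dBATCH -dNOPAUSE -dQUIET \
        -sDEVICE=tiff24nc -r300 -dUseCropBox \
        -dFirstPage="$2" -dLastPage="$2" \
        -sOutputFile="$3" "$1"
}
```

Example call: gs_page_to_tiff file.pdf 2 output1.tif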

Re: Disable transparent background

Posted: 2019-07-20T01:51:16-07:00
by Flokker
It works with -alpha remove.

What I mean is that I want to extract every page as a single image, so that the image looks like the page when I open it in a PDF viewer, like taking a screenshot of the page.
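For reference, the -alpha remove variant could be written roughly as below. The helper name page_as_viewed is made up, and white is assumed as the background colour because that is what most PDF viewers show. The trailing -alpha off drops the now fully opaque alpha channel from the output file.

```shell
#!/bin/sh
# Sketch only: page_as_viewed is an illustrative name.
# $1 = PDF, $2 = zero-based page index, $3 = output TIFF
page_as_viewed() {
    # -alpha remove blends transparent pixels into the -background colour.
    convert -density 300 "${1}[${2}]" \
        -background white -alpha remove -alpha off \
        -compress zip "$3"
}
```

Example call: page_as_viewed file.pdf 1 output1.tif. Adding -define pdf:use-cropbox=true before the input should also limit the output to the area a viewer displays.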

Re: Disable transparent background

Posted: 2019-07-20T08:49:09-07:00
by fmw42
Flokker wrote: Is there no other way to simply extract the pdf "as they are"?
What are you asking? What do you mean by "as they are"?

Some PDF files are totally vector files. Some are raster files embedded in a vector PDF shell. ImageMagick is a raster-only processor. It uses Ghostscript to rasterize any PDF. Thus no vectors remain, only pixels.

You can extract every page of a PDF into individual images.

convert image.pdf +adjoin image.suffix

where suffix can be JPG or PNG, etc.
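A slightly fuller sketch of the same idea, with numbered output names and the transparency removed on every page. The helper name split_pdf and the file names are made up. The %03d in the output name is replaced by the page counter, which starts at 0, so the files would be named page-000.tif, page-001.tif and so on.

```shell
#!/bin/sh
# Sketch only: split_pdf is an illustrative name.
# $1 = PDF, $2 = output file name prefix
split_pdf() {
    # -alpha remove is applied to each page separately, whereas
    # -flatten would merge all pages into one image.
    convert -density 300 "$1" \
        -background white -alpha remove -alpha off \
        -compress zip +adjoin "${2}-%03d.tif"
}
```

Example call: split_pdf file.pdf page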

If you want raw editable text, then use some other tool.