Removing water mark from scanned image

darktangent
Posts: 2
Joined: 2019-06-26T02:00:41-07:00
Authentication code: 1152

Removing water mark from scanned image

Post by darktangent » 2019-06-26T02:15:23-07:00

We are digitizing records. The problem is that the documents have a watermark on them. When the documents are scanned and put through OCR, the watermark interferes with recognition and the text cannot be extracted. An example of the documents we are processing: https://imgur.com/QqprgcR

The image shows a diagonal "Approved" watermark. We need to remove this watermark while keeping the text printed over it, e.g. "Deputy" in the image. I am new to ImageMagick. I have tried several tutorials on connected-component labeling and morphology, but could not get the watermark removed.

Can somebody suggest an efficient way to remove the watermark from the document using ImageMagick?

snibgo
Posts: 12161
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: Removing water mark from scanned image

Post by snibgo » 2019-06-26T06:10:29-07:00

From the image you supply, I doubt there is a good solution. I can't see what distinguishes the large text from the small text. You could remove the long lines of the large text but cleanly removing these, while leaving "ep" of "Deputy", seems impossible.
snibgo's IM pages: im.snibgo.com

darktangent

Re: Removing water mark from scanned image

Post by darktangent » 2019-06-26T12:31:14-07:00

snibgo wrote:
2019-06-26T06:10:29-07:00
From the image you supply, I doubt there is a good solution. I can't see what distinguishes the large text from the small text. You could remove the long lines of the large text but cleanly removing these, while leaving "ep" of "Deputy", seems impossible.
How would I remove the long lines of the large text?

snibgo

Re: Removing water mark from scanned image

Post by snibgo » 2019-06-26T12:40:08-07:00

With "-connected-components".

Threshold the image, possibly after a blur, then use "-connected-components" with an area threshold to get just the large black components. Negate that, and "-compose Lighten -composite" with the original. The result will have the largest black marks whitened. But this will also whiten "ep" of "Deputy".
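
To make the logic of those steps concrete, here is a minimal pure-Python sketch of the same idea. It is not ImageMagick and not a drop-in solution: it works on a grayscale grid given as a list of lists (0 = black, 255 = white), it skips the optional blur, and the function names and the `cutoff` and `min_area` parameters are hypothetical, chosen only for illustration. It thresholds, labels 8-connected black components, keeps only those at or above an area threshold as a mask, and then takes the per-pixel maximum of the mask and the original, which is what a Lighten composite does.

```python
from collections import deque

def threshold(gray, cutoff=128):
    """Binary grid: True where the pixel is dark enough to count as ink."""
    return [[value < cutoff for value in row] for row in gray]

def large_component_mask(ink, min_area):
    """Mark every pixel that belongs to an 8-connected ink component
    whose pixel count is at least min_area."""
    height, width = len(ink), len(ink[0])
    seen = [[False] * width for _ in range(height)]
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one whole component.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and ink[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) >= min_area:
                for py, px in pixels:
                    mask[py][px] = True
    return mask

def whiten_large_components(gray, min_area, cutoff=128):
    """Whiten large dark components, leave everything else unchanged."""
    mask = large_component_mask(threshold(gray, cutoff), min_area)
    # The mask is white (255) on large components and black (0) elsewhere;
    # a Lighten composite keeps the brighter of the two pixels.
    return [[max(value, 255 if marked else 0)
             for value, marked in zip(row, mask_row)]
            for row, mask_row in zip(gray, mask)]

# Tiny made-up page: a 6-pixel stroke (row 1) and a 2-pixel mark (row 4).
page = [
    [255, 255, 255, 255, 255, 255, 255],
    [0,   0,   0,   0,   0,   0,   255],
    [255, 255, 255, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255, 255],
    [255, 0,   0,   255, 255, 255, 255],
]
cleaned = whiten_large_components(page, min_area=4)
for row in cleaned:
    print(row)
```

With `min_area=4` the long stroke is whitened and the 2-pixel mark survives. Any small mark that touches the long stroke becomes part of the same component and is whitened with it, which is the same reason "ep" of "Deputy" is lost.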
