Optimize dark (gray) image for OCR

Jordan
Posts: 4
Joined: 2019-05-25T03:50:56-07:00

Post by Jordan »

Hi,

I'm trying to optimize img300.png for Tesseract OCR. The image will always look the same except for the text. So far I've managed to generate output.tiff. I've tried using Photoshop as a GUI guide, without any luck (the values in, e.g., Photoshop's Levels don't correspond to -level in ImageMagick).

Code:

convert img300.png -grayscale Rec709Luminance -channel RGB -black-threshold 13% -white-threshold 12.9% -negate output.tiff
Output.tiff:
https://www.dropbox.com/s/mluzbmt6tuuus ... .tiff?dl=0

Original file img300.png:
https://www.dropbox.com/s/vdtt2w3kkfbix ... 0.png?dl=0

Optimal result img300_optimized.png:
https://www.dropbox.com/s/w5kkf2ty3mw7c ... d.png?dl=0

I'm hoping someone has some tips for getting a better result. I'd love to hear from you!

Jordan
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Optimize dark (gray) image for OCR

Post by snibgo »

Instead of "-black-threshold 13% -white-threshold 12.9%", you could use "-threshold 12.95%".

In my experience of Tesseract, it works best when the height of capital letters such as ACSB is at least 20 pixels. Yours are only 8 pixels high.
snibgo's IM pages: im.snibgo.com
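
A minimal sketch that combines both suggestions: enlarge the image so capitals reach roughly 20 pixels, then apply the single threshold. The 250% factor follows from the 8-pixel and 20-pixel figures above. The Lanczos filter and the output filename are assumptions, and the 12.95% threshold may need retuning after resampling.

```shell
#!/bin/sh
# Sketch: upscale so capital letters reach about 20 px, then binarize.
cap_height=8       # height of capitals in img300.png, as reported above
target_height=20   # minimum capital height suggested for Tesseract

# Integer percentage, rounded up: 20 / 8 = 2.5, so 250.
scale=$(( (target_height * 100 + cap_height - 1) / cap_height ))
echo "resize by ${scale}%"

# Run the conversion only when ImageMagick and the input file are present.
if command -v convert >/dev/null 2>&1 && [ -f img300.png ]; then
  # -filter Lanczos and output_upscaled.tiff are illustrative choices.
  convert img300.png -grayscale Rec709Luminance \
    -filter Lanczos -resize "${scale}%" \
    -threshold 12.95% -negate output_upscaled.tiff
fi
```

Resizing before thresholding lets the resampling filter smooth the glyph edges while the image still has gray levels. Thresholding first would enlarge already jagged shapes.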

Re: Optimize dark (gray) image for OCR

Post by Jordan »

Yes, that yields the same result!

Unfortunately, I don't have a higher-resolution image (it will always be in this format). I don't need capital letters, though; the text only needs to be detected correctly.

The OCR result I get is:

Code:

Sell Offers:

Amount: Lf. )+ Total: 659,887 @ | accept
Name Amount Piece Price Total Price |Ends At
Anonymous 1 659,887 659,887 |2019-06-22, 20:45:46 ‘
Anonymous 3 659,888 1,979,664 | 2019-06-22, 20:44:41
Anonymous 2 659,900 1,319,800 | 2019-06-22, 20:15:47
Anonymous 4 659,998| 2,639,992 |2019-06-22, 20:12:32
Anonymous 1 670,000 670,000 | 2019-06-22, 13:13:18
Anonymous 1 700,000 700,000 | 2019-06-22, 07:40:52
Anonymous 1 800,000 800,000 | 2019-06-22, 01:47:39 é
Buy Offers:

Amount: Lf. |+ Total: 570,102 @ | accept
Name Amount Piece Price Total Price |Ends At
Anonymous 1 570,102 570,102 | 2019-06-22, 20:45:50 x
Anonymous 3 570,101 1,710,303 | 2019-06-22, 20:11:29
Anonymous 5 570,100 2,850,500 | 2019-06-22, 20:06:11 .
Anonymous 1 570,000 570,000 | 2019-06-22, 20:00:31
Anonymous 4 569,600 2,278,400 | 2019-06-22, 19:57:38
Anonymous 1 569,512 569,512 | 2019-06-22, 19:20:04
Anonymous 1 569,502 569,502 | 2019-06-22, 19:03:28 é
Create Offer:
@: Sell Amount: 0 Gross Profit: 0e
_! Buy I. |- Fee: 0e

Piece Price: e| Total Profit: Je

_| Anonymous | Corace

It now has some problems with the red text (the last two rows under Sell Offers). Is there a way to enhance the red, for example to make it bolder?
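
One possible approach, sketched here under assumptions: a weighted grayscale such as Rec709Luminance gives red little weight, so dark red text can end up close to the background. The Brightness method of -grayscale keeps the largest of the three channels instead. The sample colour #b01111 is a red quoted later in this thread for a different screenshot, so it is only assumed to resemble the red in img300.png. The weighted value below applies the Rec. 709 weights to the encoded channel values, which only approximates what ImageMagick computes.

```shell
#!/bin/sh
# Sketch: compare a weighted gray with the max-channel gray for a dark red pixel.
# Assumed sample colour #b01111 (R=176, G=17, B=17).
r=176; g=17; b=17

# Rec. 709 weights on the encoded values, truncated by integer division.
weighted=$(( (2126 * r + 7152 * g + 722 * b) / 10000 ))

# Brightness-style gray: the largest of the three channels.
max=$r
if [ "$g" -gt "$max" ]; then max=$g; fi
if [ "$b" -gt "$max" ]; then max=$b; fi

echo "weighted gray: ${weighted}/255, max channel: ${max}/255"

# Run the conversion only when ImageMagick and the input file are present.
if command -v convert >/dev/null 2>&1 && [ -f img300.png ]; then
  # output_brightness.tiff is an illustrative name; the threshold will
  # probably need retuning because the background brightness changes too.
  convert img300.png -grayscale Brightness -threshold 12.95% -negate output_brightness.tiff
fi
```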
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Optimize dark (gray) image for OCR

Post by fmw42 »

I do not know whether this will help you, but if you are on Linux, Mac OS X, or Windows 10 Unix (or Cygwin), you could try my textcleaner script at my link below.

Input: [image]

Code:

textcleaner -g -f 20 -o 10 -e normalize -i 1 img300.png img300_textclean_g_enorm_f20_o10_i1.png

[image]

You can threshold further if you want.

Re: Optimize dark (gray) image for OCR

Post by Jordan »

Hi,

Thanks for your message! I have to say by the way that ImageMagick is just awesome <3

I haven't stopped tweaking. It seems the original image had been altered in some way, but I managed to isolate the right colors.

Original image:
https://www.dropbox.com/s/9lnhrd1rrr6ld ... 3.png?dl=0

Code:

convert screen2.png -fuzz 0% -fill "#30ff00" -opaque "#b01111" -opaque "#c0c0c0" -opaque "#f4f4f4" -opaque "#c87d7d" -opaque "#bebebe" -opaque "#808080" -fill none -fuzz 0% +opaque "#30ff00" -fuzz 0% -fill "#000000" -opaque "#30ff00" +profile "icc" -density 1200 output.png
Output image:
https://www.dropbox.com/s/v637khxuk688t ... t.png?dl=0

Is there some way to smooth the text a bit? I have a feeling it might help the OCR software.

OCR results:

Code:

Sell Offers:
Amount: 1 Total: 316,900 :
Name Amount Piece Price Total Price Ends At
Anonymous 2 316,900 633,600 2019-06-24, 15:13:23
Anonymous 1 316,999 316,999 2019-06-23, 00:37:06
Anonymous 1 317,000 317,000 2019-06-23, 00:14:13
Anonymous 1 319,000 319,000 2019-06-22, 23:18:24
Anonyrnious 5 334,899 1,674,495 2019-06-22, 00:56:37
Anonymous 1 339,900 339,900 2019-06-20, 01:20:40
Anonymous 1 342,315 342,315 2019-06-19, 19:07:31
Buy Offers:
Amount: o Total: o
Name Amount Piece Price Total Price Ends At
Anonyrnious 4 251,851 1,007,404 2019-06-24, 16:31:26
Anonymous 1 251,850 251,850 2019-06-24, 15:48:12
Anonyrnious 4 251,847 1,007,388 2019-06-24, 15:08:52
Anonymous 2 251,804 503,608 2019-06-24, 14:02:42
Anonymous 1 251,700 251,700 2019-06-24, 12:46:48
Anonymous 2 250,601 501,202 2019-06-23, 00:37:12
Anonyrnious 5 250,000 1,250,000 2019-06-22, 01:20:15
Create Offer:
Sell Amount: 5 Price: 1,259,255
@ Buy Fee: 1,000
Piece Price: 251851] Total Price: 1,260,255
“ énonymous .

Re: Optimize dark (gray) image for OCR

Post by Jordan »

Does anyone know how to blur the picture (to make it less pixelated)?

Someone also suggested the following:
"I suggest blurring the picture before processing with tesseract (for example Gaussian Blur, horizontal 0.5, vertical 2.0). Does the recognition then improve?"
I have no idea how to apply such a blur with ImageMagick, though.
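
For reference, a blur with different horizontal and vertical strengths can be sketched with ImageMagick's one-dimensional Blur convolution kernel, applied once per direction. This assumes the quoted figures 0.5 and 2.0 are Gaussian sigmas in pixels and that the input is the output.png produced earlier. The output filename is illustrative. Whether the blur helps recognition at all is a separate question.

```shell
#!/bin/sh
# Sketch: separable Gaussian blur with different horizontal and vertical sigmas.
h_sigma=0.5   # horizontal sigma, assumed from the quoted suggestion
v_sigma=2.0   # vertical sigma, assumed from the quoted suggestion

# The Blur kernel is a 1-D Gaussian; a trailing angle of 90 rotates it to vertical.
h_kernel="Blur:0x${h_sigma}"
v_kernel="Blur:0x${v_sigma},90"
echo "kernels: ${h_kernel} ${v_kernel}"

# Run the conversion only when ImageMagick and the input file are present.
if command -v convert >/dev/null 2>&1 && [ -f output.png ]; then
  convert output.png \
    -morphology Convolve "$h_kernel" \
    -morphology Convolve "$v_kernel" \
    output_blurred.png
fi
```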

Re: Optimize dark (gray) image for OCR

Post by fmw42 »

Blurring the image will not do you any good. It will blur the text also and that will make it harder to OCR the characters. Try removing noise as follows:

Code:

convert img300.png -enhance -enhance -enhance -enhance -enhance -enhance -enhance -enhance -enhance -enhance result.png