
Need to Extract Hindi Text from PDF(Image) File

Posted: 2018-10-13T07:17:23-07:00
by codebox
I have a PDF file from which I need to extract only the Hindi text. The PDF appears to be a scanned image. Please advise how to extract the text into a text or Excel file.

Sample File
https://www.dropbox.com/s/kxbgp3cxb606i ... e.pdf?dl=0

Thanks

Re: Need to Extract Hindi Text from PDF(Image) File

Posted: 2018-10-13T09:00:27-07:00
by Bonzo
Have you tried dedicated OCR software?

I tried part of your first page on http://www.i2ocr.com/free-online-hindi-ocr. It was a bit slow and not 100% correct, but I would think you could edit the output. I doubt any OCR software would be 100% accurate.

Personally, unless you have hundreds to do, I would type it out manually; by the time you have checked that the results are correct, you could have typed it.

Re: Need to Extract Hindi Text from PDF(Image) File

Posted: 2018-10-13T09:53:26-07:00
by codebox
Yes, I tried that, both before posting to this forum and again after your reply.
I got the error "Invalid Input Image Type".

I chose "Hindi" as the input language.

Thanks

Re: Need to Extract Hindi Text from PDF(Image) File

Posted: 2018-10-13T11:33:02-07:00
by Bonzo
I did not download your whole file; I took a screen capture with the Microsoft Snipping Tool and saved it as a PNG.
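
If there are more than a few pages, the capture-to-PNG step could probably be scripted instead of done by hand. The following is only a rough sketch, not something tested on the sample file. It assumes ImageMagick 7 (the `magick` command, with Ghostscript installed so it can read PDFs; on ImageMagick 6 the command is `convert`) and, as a local stand-in for the online OCR page, Tesseract with the Hindi language data (`hin`). The function names, `sample.pdf`, and the `ocr_out` folder are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def rasterize_cmd(pdf, out_pattern, dpi=300):
    # -density must come before the input file so the PDF is rendered
    # at that resolution; %03d in the pattern numbers the output pages.
    return ["magick", "-density", str(dpi), str(pdf), str(out_pattern)]


def ocr_cmd(image, out_base, lang="hin"):
    # Tesseract appends ".txt" to out_base itself; "hin" selects Hindi.
    return ["tesseract", str(image), str(out_base), "-l", lang]


def pdf_to_hindi_text(pdf, workdir):
    # Hypothetical helper: one PNG per page, OCR each page, join the text.
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, workdir / "page-%03d.png"), check=True)
    pages = []
    for png in sorted(workdir.glob("page-*.png")):
        subprocess.run(ocr_cmd(png, png.with_suffix("")), check=True)
        pages.append(png.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python ocr_hindi.py sample.pdf
    print(pdf_to_hindi_text(sys.argv[1], "ocr_out"))
```

The output will still need proofreading, as with any OCR. A higher `dpi` value may help with small print, at the cost of larger PNG files.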

Re: Need to Extract Hindi Text from PDF(Image) File

Posted: 2018-10-13T22:56:07-07:00
by codebox
OK, I will try it with a PNG file. Thanks.