
optical character recognition (OCR) with java

 
Ashish S Yadav
Ranch Hand
Posts: 31
Is Java suitable for writing OCR software, i.e. software that converts text in a photo to text form? If yes, how do I do it? Does it take advanced Java programming and years of experience to write such code?
 
William P O'Sullivan
Ranch Hand
Posts: 859
Chrome IBM DB2 Java
1. Yes

2. Process the pixels in the image and work out which character each group of them represents.

3. Yes

You're better off using an off-the-shelf or open source software package if you need this in a hurry.

WP
 
Paul Clapham
Sheriff
Posts: 21322
32
Eclipse IDE Firefox Browser MySQL Database
I wouldn't think that the programming would be "advanced". But I would think that the algorithms used to recognize and distinguish letters would be "advanced". Programming the algorithms should be quite straightforward once they have been identified and specified.
 
Ashish S Yadav
Ranch Hand
Posts: 31
I am not in a hurry. I want to develop a simple one myself. Any suggestions for what I should read or try before I can write such code?
 
Campbell Ritchie
Sheriff
Pie
Posts: 49864
71
Ashish S Yadav wrote:I am not in a hurry. I want to develop a simple one myself. Any suggestions . . . ?
Learn how to live for 1000 years. This is a big problem which has taken thousands of man-years to solve. I don’t know whether the algorithms are published.

You would have to work out the relationship between black and white pixels. If, for example, you have a box containing your character and the top left pixel is white, you know it can’t be BDEFHKLMNPRTUVWXYZ, because those all have a black pixel at their top left. If that pixel is black, and going straight down from it you find white pixels halfway down the left edge, your letter could be T, V, W, X, Y or Z, because those capital letters have black at the top left and white at the middle left.
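
The elimination idea above could be sketched roughly like this in Java. This is only an illustration, not how real OCR engines work: the class name PixelProbe, the boolean[][] glyph layout (true = black) and the two letter lists are assumptions taken from the description in this post, and they only hold for a blocky, sans-serif capital font that fills its box.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Sketch: narrow down candidate capital letters by probing two pixels. */
class PixelProbe {

    // Assumption from the post: letters with ink at the top-left corner.
    static final String BLACK_TOP_LEFT = "BDEFHKLMNPRTUVWXYZ";
    // Assumption from the post: of those, letters with no ink halfway down the left edge.
    static final String WHITE_MIDDLE_LEFT = "TVWXYZ";

    /**
     * @param glyph binarized character box, glyph[row][col], true = black
     * @return capital letters still possible after the two probes
     */
    static Set<Character> candidates(boolean[][] glyph) {
        boolean topLeft = glyph[0][0];
        boolean middleLeft = glyph[glyph.length / 2][0];
        Set<Character> result = new LinkedHashSet<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            boolean expectsBlackTop = BLACK_TOP_LEFT.indexOf(c) >= 0;
            if (expectsBlackTop != topLeft) {
                continue; // top-left pixel rules this letter out
            }
            if (topLeft) {
                boolean expectsWhiteMiddle = WHITE_MIDDLE_LEFT.indexOf(c) >= 0;
                if (expectsWhiteMiddle == middleLeft) {
                    continue; // middle-left pixel rules this letter out
                }
            }
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        // A crude 5x5 capital T: full top bar, single vertical stem.
        boolean[][] t = new boolean[5][5];
        for (int col = 0; col < 5; col++) t[0][col] = true;
        for (int row = 1; row < 5; row++) t[row][2] = true;
        System.out.println(candidates(t)); // prints [T, V, W, X, Y, Z]
    }
}
```

Two probes still leave six candidates, which gives a feel for how many more tests (and how much tolerance for noise and different fonts) a usable recognizer would need.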
 
Nikolay Shi
Greenhorn
Posts: 2
  • Likes 2
If you need to use OCR in your project, creating your own engine is not the best idea, if you ask me :/ However, there aren't many existing developer tools for OCR in Java.

As far as I know, there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native interfaces. For example, Tesseract (http://groups.google.com/group/tesseract-ocr/), one of the most popular open-source OCR engines, has Java wrappers such as tesjeract (http://code.google.com/p/tesjeract/) and Tess4J (http://tess4j.sf.net/). That could work for you, but it is rather hard to set up and will require you to develop the image preprocessing and do the font training yourself.
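
To give an idea of what the image preprocessing might involve, here is a minimal sketch of one common first step, binarization (turning a color image into black/white pixels). It uses only the standard java.awt.image.BufferedImage class. The class name Binarizer, the fixed threshold and the luminance weights are assumptions chosen for illustration; they are not something Tesseract or its wrappers prescribe, and real preprocessing usually also needs deskewing, noise removal and an adaptive threshold.

```java
import java.awt.image.BufferedImage;

/** Sketch: convert an image to a black/white pixel matrix with a fixed threshold. */
class Binarizer {

    /**
     * @param image     source image in any color model readable via getRGB
     * @param threshold luminance cut-off in 0..255; darker pixels count as black
     * @return matrix indexed [y][x], true = black
     */
    static boolean[][] binarize(BufferedImage image, int threshold) {
        int height = image.getHeight();
        int width = image.getWidth();
        boolean[][] black = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted grey value; the weights are a common approximation.
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                black[y][x] = luminance < threshold;
            }
        }
        return black;
    }

    public static void main(String[] args) {
        // Tiny in-memory image: one black pixel next to one white pixel.
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0x000000);
        img.setRGB(1, 0, 0xFFFFFF);
        boolean[][] result = binarize(img, 128);
        System.out.println(result[0][0] + " " + result[0][1]); // prints true false
    }
}
```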

Another option is a cloud service. It requires the end-user application to have an internet connection, but it is independent of your choice of programming language and of resource limitations (which matters on mobile devices, since the OCR process consumes a lot of resources). Have a look at http://www.ocrsdk.com, a cloud-based OCR SDK that lets you upload an image through a web API and returns the recognized data.

This web-API-based OCR SDK is not free, which may not suit you, but I still recommend trying it out (it has a free 90-day trial with no upfront charges): its pricing is affordable compared with enterprise solutions, while it provides enterprise-level OCR accuracy, which is much better than open source. You may also find its code sample repository on GitHub useful: https://github.com/abbyysdk/ocrsdk.com

Hope it helps!
 
Campbell Ritchie
Sheriff
Pie
Posts: 49864
71
Welcome to the Ranch

I have used ABBYY myself, a long time ago, and it was good. I didn’t try any programming with it, however.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Likes 1
Ashish S Yadav wrote:Any suggestions for what kind of things i should read or try before i can make such code ?


At the risk of stating the obvious, I would suggest doing some research on OCR and its algorithms in general. As already pointed out, you need to understand how OCR works before you can figure out how to put it into Java. Regardless of what language you use, it starts with something like: a matrix of pixels (black/white, greyscale, or color); some math to work out which characters those pixels might represent; the confidence level or probability of it being a given one of those candidate characters. You'll need a decent grasp of those concepts before you can even think about how to write the code.
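
As a rough illustration of those three ingredients (a pixel matrix, some math, and a confidence value), here is a toy template matcher in Java. It is a sketch under heavy assumptions, not a real OCR algorithm: the class name TemplateMatcher, the helper parse, the 3x3 templates and the use of "fraction of agreeing pixels" as the confidence are all made up for this example, and the glyph is assumed to be already cut out and scaled to the template size.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: score a glyph against stored templates and report a confidence. */
class TemplateMatcher {

    /** Result of a match: the best candidate and how well it agreed. */
    static final class Match {
        final char character;
        final double confidence; // fraction of agreeing pixels, 0.0 to 1.0

        Match(char character, double confidence) {
            this.character = character;
            this.confidence = confidence;
        }
    }

    /** Turns rows such as ".#." into a pixel matrix; '#' means black. */
    static boolean[][] parse(String... rows) {
        boolean[][] pixels = new boolean[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            pixels[r] = new boolean[rows[r].length()];
            for (int c = 0; c < rows[r].length(); c++) {
                pixels[r][c] = rows[r].charAt(c) == '#';
            }
        }
        return pixels;
    }

    /** Fraction of pixels on which glyph and template agree (same size assumed). */
    static double similarity(boolean[][] glyph, boolean[][] template) {
        int same = 0;
        int total = 0;
        for (int r = 0; r < glyph.length; r++) {
            for (int c = 0; c < glyph[r].length; c++) {
                if (glyph[r][c] == template[r][c]) same++;
                total++;
            }
        }
        return (double) same / total;
    }

    /** Returns the best-scoring template, or null if there are no templates. */
    static Match bestMatch(boolean[][] glyph, Map<Character, boolean[][]> templates) {
        Match best = null;
        for (Map.Entry<Character, boolean[][]> entry : templates.entrySet()) {
            double score = similarity(glyph, entry.getValue());
            if (best == null || score > best.confidence) {
                best = new Match(entry.getKey(), score);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Character, boolean[][]> templates = new LinkedHashMap<>();
        templates.put('I', parse(".#.", ".#.", ".#."));
        templates.put('L', parse("#..", "#..", "###"));

        // An 'I' with one stray black pixel in the bottom-right corner.
        boolean[][] noisyI = parse(".#.", ".#.", ".##");
        Match match = bestMatch(noisyI, templates);
        System.out.println(match.character + " " + match.confidence);
    }
}
```

Here the noisy glyph agrees with the 'I' template on 8 of 9 pixels and with 'L' on 4 of 9, so 'I' wins. Real engines use far more robust features than raw pixel agreement, but the shape of the problem (candidates plus a score for each) is the same.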
 
Ethan Paul
Greenhorn
Posts: 1
Hello everyone,

I want to implement OCR in Java for my project, but unfortunately I don't know how to begin. It would be great if someone could help me with this. Please tell me where I can study this so that I can code it in Java.
Thanks

regards
ethan
 