
How can I read text from an image file?

 
Gautam Ry
Ranch Hand
Posts: 41
I need to read text (an account number) from an image file (.tif).
I tried the following approach:

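try {
    File newFile = new File("C:\\Image\\9R6-CCI\\09082010\\K08E091209FT_8021.tif");
    // ImageIO.read returns null when no registered reader can decode the file
    BufferedImage buffImage = ImageIO.read(newFile);
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    // IMAGE_TYPE: format-name constant, defined elsewhere (not shown)
    ImageIO.write(buffImage, IMAGE_TYPE, os);
    byte[] data = os.toByteArray();
    String imageString = new BASE64Encoder().encode(data);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of silently swallowing it
}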


But this threw an error. After googling, I found that ImageIO has some limitations when reading an editable image.
Then I tried the following approach:

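try {
    File newFile = new File("C:\\Image\\9R6-CCI\\09082010\\K08E091209FT_8033.tif");
    byte[] fileData = new byte[(int) newFile.length()];
    InputStream inStream = new FileInputStream(newFile);
    inStream.read(fileData); // note: a single read() is not guaranteed to fill the array
    inStream.close();
    // Decodes the raw image bytes as characters using the platform charset
    String tempFileData = new String(fileData);
    String imageString = new BASE64Encoder().encode(fileData);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of silently swallowing it
}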


But I didn't get the desired output. The output is shown below:
xTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUx

Please help me address the issue.

Thanks and Regards
Gautam

 
Lester Burnham
Rancher
Posts: 1337
Image files contain binary data - they can't be treated like character data. What's more, bitmap image formats (such as TIFF, JPEG, GIF and PNG) store pixel data, so any text visible in the picture is not stored as characters and cannot be read out directly. Your best bet is to use an OCR package like Tesseract.
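
To make the "binary data" point concrete, here is a minimal sketch in Java. The class and method names (TiffHeaderCheck, looksLikeTiff, countNonText) are made up for illustration, and the sample bytes are hand-written rather than taken from the poster's file. It checks for the 4-byte TIFF signature and counts how many bytes are not printable text, which is why new String(fileData) or Base64 encoding never yields the account number.

```java
public class TiffHeaderCheck {

    // A TIFF file starts with "II" followed by 42, 0 (little-endian)
    // or "MM" followed by 0, 42 (big-endian).
    public static boolean looksLikeTiff(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        boolean littleEndian = data[0] == 'I' && data[1] == 'I' && data[2] == 42 && data[3] == 0;
        boolean bigEndian = data[0] == 'M' && data[1] == 'M' && data[2] == 0 && data[3] == 42;
        return littleEndian || bigEndian;
    }

    // Counts bytes that are neither printable ASCII nor tab, CR or LF.
    public static int countNonText(byte[] data) {
        int count = 0;
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean whitespace = v == '\t' || v == '\n' || v == '\r';
            boolean printable = v >= 0x20 && v <= 0x7E;
            if (!whitespace && !printable) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hand-written sample: a little-endian TIFF signature plus a few arbitrary bytes.
        byte[] sample = {'I', 'I', 42, 0, 8, 0, 0, 0, (byte) 0xFF, (byte) 0x00, (byte) 0x80};
        System.out.println("Looks like TIFF: " + looksLikeTiff(sample));
        System.out.println("Non-text bytes: " + countNonText(sample) + " of " + sample.length);
    }
}
```

The same two checks can be run on the bytes read from a real .tif file; a high share of non-text bytes is the sign that the content needs an image decoder or OCR, not a String constructor.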
 
Gautam Ry
Ranch Hand
Posts: 41
Hi Lester,

Many thanks for the useful response.
I need some more details on your post to go ahead.

a) What is an OCR package? Can I download it from the internet?
b) What is Tesseract?

Could you give me some examples?

Thanks again for the reply.

Regards
Gautam
 
Lester Burnham
Rancher
Posts: 1337
OCR = Optical Character Recognition. It has a Wikipedia page that should get you started.

Googling for Tesseract should find its home page pretty quickly; it's not like that's a common word.
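
As a rough idea of how this could look from Java, here is a sketch that shells out to the Tesseract command-line tool and then searches the recognized text for a number. Several things here are assumptions: that the tesseract executable is installed and on the PATH, that Java 9 or later is used, and that an account number is a run of six or more digits (the thread does not say what the number looks like). The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TesseractCli {

    // Assumption: an account number is a run of at least six digits.
    private static final Pattern ACCOUNT = Pattern.compile("\\d{6,}");

    // "tesseract <image> stdout" prints the recognized text to standard output.
    public static List<String> buildCommand(String imagePath) {
        return List.of("tesseract", imagePath, "stdout");
    }

    // Runs the external tool and returns whatever text it recognized.
    public static String runOcr(String imagePath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore progress messages
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    // Returns the first run of digits that matches the assumed account-number pattern.
    public static Optional<String> findAccountNumber(String ocrText) {
        Matcher matcher = ACCOUNT.matcher(ocrText);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TesseractCli <image-file>");
            return;
        }
        String text = runOcr(args[0]);
        System.out.println(findAccountNumber(text).orElse("no account number found"));
    }
}
```

OCR output is rarely perfect, so the digit pattern would need adjusting to the real account-number format, and the result should be checked before it is trusted.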
 