How can I read text from an image file?

 
Ranch Hand
Posts: 41
I need to read text (an account number) from an image file (.tif).
I tried the following approach:

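// Needs: java.io.*, java.awt.image.BufferedImage, javax.imageio.ImageIO, sun.misc.BASE64Encoder
try {
    File newFile = new File("C:\\Image\\9R6-CCI\\09082010\\K08E091209FT_8021.tif");
    // ImageIO.read returns null when no registered reader can decode the file
    BufferedImage buffImage = ImageIO.read(newFile);
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIO.write(buffImage, IMAGE_TYPE, os); // IMAGE_TYPE: format-name constant, not shown in this post
    byte[] data = os.toByteArray();
    // sun.misc.BASE64Encoder is an internal JDK class; java.util.Base64 replaces it on Java 8+
    String imageString = new BASE64Encoder().encode(data);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of hiding it
}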


But that caused a problem. After googling, I found that ImageIO has some limitations when reading an editable image.
Then I tried the following approach:

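// Needs: java.io.*, sun.misc.BASE64Encoder
try {
    File newFile = new File("C:\\Image\\9R6-CCI\\09082010\\K08E091209FT_8033.tif");
    byte[] fileData = new byte[(int) newFile.length()];
    DataInputStream inStream = new DataInputStream(new FileInputStream(newFile));
    try {
        inStream.readFully(fileData); // a plain read() may return before the array is full
    } finally {
        inStream.close();
    }
    // These bytes are raw TIFF data, not characters, so this string is not readable text
    String tempFileData = new String(fileData);
    String imageString = new BASE64Encoder().encode(fileData);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of hiding it
}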


But I didn't get the desired output. The output is shown below.
xTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUxTFMUx

Please help me address the issue.

Thanks and Regards
Gautam

 
Rancher
Posts: 1337
Image files contain binary data; they can't be treated like character data. What's more, bitmap image formats (like TIFF, JPEG, GIF, PNG, etc.) do not store the text they show in any easily extractable form. Your best bet is to use an OCR package like Tesseract.
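
To illustrate the point about binary data, here is a minimal Java sketch. It assumes a made-up byte array: the first four bytes are the header of a little-endian TIFF file ("II" followed by 42), and the remaining bytes are arbitrary stand-ins for pixel data, not taken from the file in the question. The class and method names are hypothetical. The sketch shows that Base64 only re-spells the same bytes with printable characters, so it cannot reveal any text drawn in the picture.

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryVsText {
    // Little-endian TIFF header ("II", 42, 0) followed by arbitrary bytes.
    // Hypothetical sample data, used only for illustration.
    static final byte[] SAMPLE = {0x49, 0x49, 0x2A, 0x00, (byte) 0xC5, 0x31, 0x4C, 0x53};

    // Base64 maps every 3 bytes onto 4 printable characters; it does not
    // interpret the bytes in any way.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decoding gives back exactly the bytes that went in.
    static boolean roundTrips(byte[] data) {
        byte[] decoded = Base64.getDecoder().decode(toBase64(data));
        return Arrays.equals(data, decoded);
    }

    public static void main(String[] args) {
        System.out.println(toBase64(SAMPLE));   // prints SUkqAMUxTFM=
        System.out.println(roundTrips(SAMPLE)); // prints true
    }
}
```

The printed string looks like text, but it is only another notation for the pixel bytes. Getting the account number out requires recognizing the shapes of the characters in the image, which is what OCR does.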
 
Gautam Ry
Ranch Hand
Posts: 41
Hi Lester,

Many thanks for the useful response.
I need some more details on your post to go ahead.

a) What is an OCR package? Can I download it from the internet?
b) What is Tesseract?

Could you give me some examples?

Thanks again for the reply.

Regards
Gautam
 
Lester Burnham
Rancher
Posts: 1337
OCR = Optical Character Recognition. It has a Wikipedia page that should get you started.

Googling for Tesseract should find its home page pretty quickly; it's not like that's a common word.
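
As a rough sketch of what using Tesseract from Java could look like, the code below starts the tesseract command-line program as a separate process and collects what it prints. It assumes that Tesseract is already installed and that the tesseract executable is on the PATH. The class and method names are hypothetical. Passing stdout as the output name asks the program to print the recognized text instead of writing it to a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class TesseractCli {
    // Builds the command line: tesseract <image> stdout
    static List<String> command(String imagePath) {
        return Arrays.asList("tesseract", imagePath, "stdout");
    }

    // Runs tesseract (assumed to be installed and on the PATH) and returns
    // the text it recognized in the image.
    static String recognize(String imagePath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(imagePath));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // pass tesseract's messages to the console
        Process process = pb.start();
        byte[] output = readAll(process.getInputStream());
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    // Copies a stream into a byte array until end of stream.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TesseractCli path-to-image
        System.out.println(recognize(args[0]));
    }
}
```

The returned string is ordinary text, so the account number can then be located with normal string handling. Recognition quality depends on the scan, so the result should be checked before it is relied on.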
 