converting png to tiff and character recognition with tesseract

 
Ranch Hand
Posts: 33
Hi,

I'm trying to get Tesseract (http://code.google.com/p/tesseract-ocr/) to read text from a TIFF image, converted from a PNG source either with ImageIO on Linux or with Image Converter .EXE on Windows. The output text is either empty or looks like \\\\\\\\\\\\\\\\\\\\\HHHHHHHHHHHH\\\\\\\\\\\\\\\\\UU\\\\\\\\\\\\\\\H\W

Does anyone have an idea what could be causing the problem? I imagine it is related to the low contrast between the yellow background and the text, or to some attribute that needs to be set when converting from PNG.

original.png
 
Denis Wen
Ranch Hand
Posts: 33
next image
Filename: converted-with-java-imageio.tif
File size: 2 Kbytes
 
Denis Wen
Ranch Hand
Posts: 33

Denis Wen wrote:next image

Filename: converted-with-image-converter.tif
File size: 12 Kbytes
 
Rancher
Posts: 43081
Having a bit of experience with image processing (though not much with OCR), I would imagine that it's easier to perform OCR on a black-and-white image than on a colored image. If the images you need to work with are essentially bi-colored like the one shown above, converting it to B/W should not be hard to do, and may yield better results.
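
As a rough illustration of the black-and-white idea, here is a minimal sketch that thresholds each pixel on its luminance and writes the result into a 1-bit image. The class name `Binarizer`, the method `toBlackAndWhite`, and the threshold of 128 used below are assumptions made up for this example; they are not part of Tesseract or ImageIO, and the right threshold depends on the image.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only.
class Binarizer {

    /**
     * Returns a 1-bit copy of src: pixels whose luma is below the
     * threshold become black, all others become white.
     */
    static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual 0.299/0.587/0.114 luma weights.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

With these weights, pure yellow has a luma of about 225, so a yellow background would end up white at a threshold of 128, and text that is sufficiently dark would end up black. Whether that actually improves the OCR result is something to try out, not a given.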
 
Denis Wen
Ranch Hand
Posts: 33
OK, I'll try that. What would you suggest as the best way to convert an image to grayscale? Can it be done with ImageIO somehow?

Ulf Dittmer wrote:Having a bit of experience with image processing (though not much with OCR), I would imagine that it's easier to perform OCR on a black-and-white image than on a colored image. If the images you need to work with are essentially bi-colored like the one shown above, converting it to B/W should not be hard to do, and may yield better results.

 
Ulf Dittmer
Rancher
Posts: 43081
This may give you some ideas: http://blog.codebeach.com/2008/03/convert-color-image-to-gray-scale-image.html
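
Independently of what the linked article does, one common way to do this with the standard library is to draw the source into a `BufferedImage` of type `TYPE_BYTE_GRAY`; ImageIO itself only handles the reading and writing. The sketch below assumes that approach. The class name `GrayscaleConverter` and its method names are hypothetical, and writing the result as PNG is an arbitrary choice for the example.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only.
class GrayscaleConverter {

    /** Returns an 8-bit grayscale copy of src. */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Drawing into a gray image lets Java 2D do the color conversion.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    /** Reads an image file, converts it to grayscale, and writes it as PNG. */
    static void convertFile(File in, File out) throws IOException {
        BufferedImage src = ImageIO.read(in);
        if (src == null) {
            throw new IOException("No ImageIO reader could decode " + in);
        }
        if (!ImageIO.write(toGray(src), "png", out)) {
            throw new IOException("No ImageIO writer available for PNG");
        }
    }
}
```

A call such as `GrayscaleConverter.convertFile(new File("original.png"), new File("gray.png"))` would produce the grayscale file; the output name here is made up.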