
Tesseract OCR Library Issue


I've installed the Tesseract 4 library on the Mac via brew.

I'm also using the Tess4J library, which seems to have some issues with Tesseract.

In particular, when I try to OCR a file, I get an error that many others have reported. I've yet to find a working solution.

!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
# A fatal error has been detected by the Java Runtime Environment:
#  SIGILL (0x4) at pc=0x000000012db6626e, pid=1762, tid=0x0000000000005e03



The issue is that you need to set the environment variable for locale. On the Mac, if you type "locale" in a terminal window, you may find that "LC_ALL" has no value on its right-hand side. Judging by the assertion message, Tesseract expects the locale to be "C", so setting LC_ALL to "C" (for example, "export LC_ALL=C" before starting the JVM) should satisfy the check. The authors of the API apparently didn't handle the missing-value case before doing the "strcmp", so the assertion fails and crashes the JVM.

On Windows Server, I did not have the locale issue I had on the Mac. I installed Tesseract from this site: Tesseract Download Windows, and it worked right away.
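To catch the locale problem before it takes down the JVM, a small pre-flight check can help. This is only a sketch: the class name LocaleCheck and the method hasCLocale are made up for illustration, and it assumes that launching the JVM with LC_ALL=C is what satisfies the assertion in baseapi.cpp.

```java
import java.util.Map;

public class LocaleCheck {

    // Hypothetical helper: true only when LC_ALL is explicitly set to "C"
    // in the supplied environment map.
    static boolean hasCLocale(Map<String, String> env) {
        return "C".equals(env.get("LC_ALL"));
    }

    public static void main(String[] args) {
        if (hasCLocale(System.getenv())) {
            System.out.println("LC_ALL is \"C\"; the locale assertion should not trip.");
        } else {
            // Assumption: exporting LC_ALL=C in the shell that starts the JVM
            // is enough; the variable cannot be changed from inside the
            // running JVM using the standard library alone.
            System.err.println("LC_ALL is not \"C\". Try: export LC_ALL=C, then restart the JVM.");
        }
    }
}
```

Run it from the same shell (or IDE run configuration) that starts your OCR program, since that is the environment the native library will see.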

If you want languages other than the default English, you can find the language training files here: Tesseract Training Files.

The other thing to watch is that you must tell Tesseract where the training data directory for languages is. On the Mac, that's (for me): /usr/local/share/tessdata.  On Windows: C:\Program Files (x86)\Tesseract-OCR\tessdata
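Here is a rough sketch of picking the training data directory per platform. The class and method names (TessdataLocator, defaultTessdataDir) are hypothetical, and the two paths are just the ones from my machines above, so yours may differ. The result is what you would hand to Tess4J, e.g. via setDatapath on its Tesseract class; that call is only shown in a comment so the snippet runs without the Tess4J jar.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

public class TessdataLocator {

    // Hypothetical helper: maps an os.name value to the tessdata directory
    // mentioned in the post. These are example locations, not guaranteed defaults.
    static String defaultTessdataDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata";
        }
        return "/usr/local/share/tessdata";
    }

    public static void main(String[] args) {
        String dir = defaultTessdataDir(System.getProperty("os.name"));
        System.out.println("tessdata directory: " + dir);
        System.out.println("exists: " + Files.isDirectory(Paths.get(dir)));

        // With Tess4J on the classpath, the path would then be passed along
        // roughly like this:
        //   Tesseract tesseract = new Tesseract();
        //   tesseract.setDatapath(dir);
    }
}
```

If "exists" prints false, Tesseract will not find its language files at that location, so adjust the path before trying to OCR anything.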

Hope this update helps someone.

-- mike
I hope so too. Thanks for posting the resolution!