Tesseract OCR Library Issue

 
Hello,

I've installed the Tesseract 4 library on my Mac via Homebrew.

I'm also using the Tess4J library, which seems to have some issues with Tesseract.

In particular, when I try to OCR a file, I get an error that many others have reported, and I've yet to find a working solution:

!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x000000012db6626e, pid=1762, tid=0x0000000000005e03


---

RESOLVED:


The issue is that you need to set the locale environment variable. On the Mac, if you run "locale" in a terminal, you may find that "LC_ALL" has no value on the right-hand side. The authors of the API apparently didn't check for that missing value before doing a "strcmp", and the failed assertion crashes the JVM.

On Windows Server, I had no locale issue like the one on the Mac. I installed Tesseract from this site: Tesseract Download Windows, and it worked right away.
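For the Mac locale fix, here is a rough sketch. The assertion text (!strcmp(locale, "C")) suggests the native library expects the "C" locale, so this assumes LC_ALL=C is the value to set. A running JVM cannot change its own environment, so the sketch only checks the variable and, if needed, prepares a child process with it set. The names LocaleCheck, hasCLocale and withCLocale are made up for illustration and are not part of Tess4J.

```java
import java.util.Map;

public class LocaleCheck {

    /** Returns true when LC_ALL is set to "C" in the given environment map. */
    static boolean hasCLocale(Map<String, String> env) {
        return "C".equals(env.get("LC_ALL"));
    }

    /**
     * Prepares (but does not start) a child process whose environment
     * has LC_ALL forced to "C". The command is whatever you would
     * normally run, e.g. a java command line for the OCR program.
     */
    static ProcessBuilder withCLocale(String... command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("LC_ALL", "C");
        pb.inheritIO(); // show the child's output in this console
        return pb;
    }

    public static void main(String[] args) {
        if (hasCLocale(System.getenv())) {
            System.out.println("LC_ALL is already \"C\".");
        } else {
            // Assumption: "C" is the value the native assertion wants.
            System.err.println("LC_ALL is not \"C\"; set it before starting the JVM "
                    + "(for example in the shell or the IDE run configuration).");
        }
    }
}
```

Setting the variable in the shell or in the IDE's run configuration before launching the JVM should have the same effect as relaunching through withCLocale.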

If you want to install languages other than the default English, you can find the language training files here: Tesseract Training Files.

The other thing to watch is that you have to tell Tesseract where the training data directory for languages is. On the Mac, that's (for me): /usr/local/share/tessdata. On Windows: C:\Program Files (x86)\Tesseract-OCR\tessdata
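A small sketch of how the training data directory could be picked and sanity-checked before handing it to the OCR library. It uses only the two paths quoted above, which are from one machine each and may differ on yours. The names TessdataLocator, defaultTessdataDir and hasLanguage are hypothetical helpers, not Tess4J API. With Tess4J, the resulting directory is what you would pass to setDatapath on the Tesseract instance.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class TessdataLocator {

    /**
     * Maps an os.name value to the tessdata directory mentioned in this
     * thread. Assumption: anything that is not Windows uses the
     * Homebrew-style path; adjust for your own installation.
     */
    static Path defaultTessdataDir(String osName) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return Paths.get("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
        }
        return Paths.get("/usr/local/share/tessdata");
    }

    /** Checks that the training file for a language (e.g. "eng") is present. */
    static boolean hasLanguage(Path tessdataDir, String lang) {
        return Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        Path dir = defaultTessdataDir(System.getProperty("os.name"));
        System.out.println("tessdata directory: " + dir);
        if (!hasLanguage(dir, "eng")) {
            System.err.println("eng.traineddata not found in " + dir);
        }
    }
}
```

Checking for the language file up front gives a readable message instead of a failure inside native code when the directory is wrong.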

Hope this update helps someone.

-- mike
 
I hope so too. Thanks for posting the resolution!
 