Tesseract OCR Library Issue

Hello,

I installed the Tesseract 4 library on my Mac via Homebrew.

I'm also using the Tess4J library (the Java JNA wrapper for Tesseract), which seems to have some issues with Tesseract.

In particular, when I try to OCR a file, I get an error that many others have reported, and I have yet to find a working solution:

!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 209
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x000000012db6626e, pid=1762, tid=0x0000000000005e03


---

RESOLVED:


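The fix is to set the locale environment variable. Tesseract 4 asserts that the current locale is "C" (that is the !strcmp(locale, "C") check in the error above), and when that assertion fails it takes the whole JVM down with it. On the Mac, if you type "locale" in a Terminal window, you may find that "LC_ALL" has no right-hand-side value, in which case the locale comes from other settings such as LANG and is typically not "C". The API apparently doesn't handle that case gracefully: the failed assertion crashes the JVM instead of raising a Java exception. Setting the variable explicitly (for example, export LC_ALL=C before launching Java) avoids the crash.

On Windows Server I didn't run into the locale problem at all. I installed Tesseract from this site: Tesseract Download Windows, and it worked right away.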
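
A minimal Java sketch of the same idea, assuming (as this post reports) that starting the JVM with LC_ALL=C is enough to satisfy Tesseract's locale assertion. It uses only the JDK: one helper checks whether LC_ALL is already "C", the other builds a ProcessBuilder that would relaunch a command with LC_ALL=C. The class name LocaleGuard and the my-ocr-app.jar command are hypothetical placeholders, not part of Tess4J.

```java
import java.util.List;
import java.util.Map;

public class LocaleGuard {

    /** True when the given environment already forces the "C" locale via LC_ALL. */
    static boolean lcAllIsC(Map<String, String> env) {
        return "C".equals(env.get("LC_ALL"));
    }

    /** Builds (but does not start) a process whose environment has LC_ALL=C. */
    static ProcessBuilder withCLocale(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("LC_ALL", "C"); // environment() is a mutable copy of ours
        return pb;
    }

    public static void main(String[] args) {
        String current = System.getenv("LC_ALL");
        if (lcAllIsC(System.getenv())) {
            System.out.println("LC_ALL=C is already set.");
        } else {
            System.out.println("LC_ALL is "
                + (current == null || current.isEmpty() ? "unset or empty" : current));
            // Hypothetical command: relaunch the OCR app with LC_ALL=C (not started here).
            ProcessBuilder pb = withCLocale(List.of("java", "-jar", "my-ocr-app.jar"));
            System.out.println("Would run " + pb.command()
                + " with LC_ALL=" + pb.environment().get("LC_ALL"));
        }
    }
}
```

In practice, exporting LC_ALL=C in the shell (or in the IDE's run configuration) before starting Java is the simpler route; the ProcessBuilder variant is only useful when one Java program launches another.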

If you want to install languages other than the default English, you can find the language training files here: Tesseract Training Files.

The other thing to watch closely is that you must tell Tesseract where the language training data (tessdata) directory is. On the Mac, that's (for me) /usr/local/share/tessdata; on Windows, it's C:\Program Files (x86)\Tesseract-OCR\tessdata.
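
A small sketch of picking the tessdata directory from the paths above and checking that the training file for a language is present. It assumes the usual <lang>.traineddata file naming (e.g. eng.traineddata); the class and method names are hypothetical, and the two paths are simply the ones from this post, so adjust them for your own install.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

public class TessdataLocator {

    // Paths reported in this post; your install may use a different location.
    static final String MAC_TESSDATA = "/usr/local/share/tessdata";
    static final String WINDOWS_TESSDATA = "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata";

    /** Picks a default tessdata directory from an os.name value such as "Mac OS X" or "Windows 10". */
    static Optional<String> defaultTessdataDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return Optional.of(WINDOWS_TESSDATA);
        }
        if (os.startsWith("mac")) {
            return Optional.of(MAC_TESSDATA);
        }
        return Optional.empty(); // no default known for other platforms
    }

    /** Training file for a language code, e.g. "eng" -> <dir>/eng.traineddata. */
    static Path trainedDataFile(String tessdataDir, String lang) {
        return Paths.get(tessdataDir, lang + ".traineddata");
    }

    public static void main(String[] args) {
        String lang = args.length > 0 ? args[0] : "eng";
        Optional<String> dir = defaultTessdataDir(System.getProperty("os.name"));
        if (dir.isEmpty()) {
            System.out.println("No default tessdata path for this OS; set it explicitly.");
            return;
        }
        Path file = trainedDataFile(dir.get(), lang);
        System.out.println(file + (Files.isRegularFile(file) ? " found" : " NOT found"));
    }
}
```

The directory this returns is what you would then hand to Tess4J, for example via Tesseract.setDatapath(...), before calling doOCR.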

Hope this update helps someone.

-- mike
---
I hope so too. Thanks for posting the resolution!