
Extract data from PDF files

 
Phoebe Song
Ranch Hand
Posts: 58
Hi,

I have an assignment to extract text from a PDF file. The PDF contains both Arabic and English text. My understanding is that the standard Java library can't do this on its own. Can anyone recommend a free, easy-to-use third-party library? Has anyone worked with Arabic text before?
 
Ulf Dittmer
Rancher
Posts: 42972
Check out JPedal, PDFBox and PDFTextStream (linked in http://faq.javaranch.com/java/AccessingFileFormats). Those work for English; I'm not sure about Arabic.
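Since it's unclear whether these libraries handle Arabic, one quick test is to check what actually comes out. Below is a minimal sketch, assuming you already have the extracted text as a Java `String` (for example from PDFBox's `PDFTextStripper.getText`). It counts letters by Unicode script using only JDK classes, so you can confirm Arabic letters survived extraction. The class name `ScriptCheck` and the sample string are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

public class ScriptCheck {

    /** Counts letters per Unicode script; digits, spaces and punctuation are skipped. */
    public static Map<Character.UnicodeScript, Integer> countScripts(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        return counts;
    }

    /** True if at least one Arabic-script letter is present in the text. */
    public static boolean containsArabic(String text) {
        return countScripts(text).getOrDefault(Character.UnicodeScript.ARABIC, 0) > 0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the String returned by the extraction library.
        String extracted = "Invoice رقم 42";
        System.out.println(countScripts(extracted));
        System.out.println(containsArabic(extracted));
    }
}
```

A nonzero Arabic count only shows that the characters came through. Depending on how the PDF was produced, Arabic may be stored in visual order and come out reversed, so it's worth eyeballing a sample of the output as well.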
 