
Extract data from PDF files

 
Phoebe Song
Ranch Hand
Posts: 58
Hi,

I have an assignment to extract text from a PDF file. The file contains both Arabic and English text. My understanding is that I can't do this with the standard Java library alone. Can anyone recommend free, easy-to-use third-party software? Has anyone worked with Arabic text before?
 
Ulf Dittmer
Rancher
Posts: 42970
Check out JPedal, PDFBox and PDFTextStream (linked in http://faq.javaranch.com/java/AccessingFileFormats). Those work for English text; I'm not sure how well they handle Arabic.
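
Since Arabic support is the open question, one way to test a library is to run it on a sample page and count how many Arabic-script and Latin-script characters come back. Below is a minimal sketch that uses only the standard library. The class name ScriptCheck, the method countScript and the sample string are made up for illustration. In practice the string would be whatever text the PDF library returns, and this says nothing about how any of the three libraries actually behaves.

```java
class ScriptCheck {

    // Counts the code points in text that belong to the given Unicode script.
    static long countScript(String text, Character.UnicodeScript script) {
        return text.codePoints()
                   .filter(cp -> Character.UnicodeScript.of(cp) == script)
                   .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Hello" followed by an Arabic word.
        // Replace this with the text extracted by the library under test.
        String extracted = "Hello \u0645\u0631\u062D\u0628\u0627";

        System.out.println("Arabic characters: "
                + countScript(extracted, Character.UnicodeScript.ARABIC));
        System.out.println("Latin characters: "
                + countScript(extracted, Character.UnicodeScript.LATIN));
    }
}
```

If the Arabic count comes back as zero for a page that clearly contains Arabic, the library is probably dropping or mangling that text, so it is worth trying one of the others.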
 