Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Problem while converting PDF to text convertion in Java

 
knazeer ahmed
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi,
I am facing the below problem in converting PDF to text.
I have a scanned document which is in PDF. I want to extract the data from that PDF. I tried with PDFbox and Fontbox. but it will work only when the content of the PDF is real text (but not text in image).


Can any one help me in this?..


Thanks and Regards,
Nazeer
 
Rob Spoor
Sheriff
Pie
Posts: 20751
68
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You will need to use an OCR library for retrieving text from any kind of image.
 
Ulf Dittmer
Rancher
Posts: 42969
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
... something like Tesseract.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic