Win a copy of Java Mock Exams (software) this week in the Programmer Certification (OCPJP) forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

PDFBox: pdf's markup, how-to extract the pdf markup...

 
Jim Harrison
Ranch Hand
Posts: 30
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

I've read alot on http://pdfbox.apache.org but can't find an example or if the tool actually does this.

The pdf file that I'm reading has superscripts. I wanted to get the text and markup content of a pdf file. So a couple of questions:

1. can PDFBox do this? I see on their website the ExtractText (http://pdfbox.apache.org/commandlineutilities/ExtractText.html) but that just displays the text aspect of the pdf.

2. does any one have an example of doing this?

Thanks...Jim
 
Ulf Dittmer
Rancher
Posts: 42970
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
No, PDFBox has no notion of extracting layout information.

You could check out at the source code of https://pdf-renderer.dev.java.net/, which can display PDFs, so it must have a way of accessing the layout data.
 
What are you doing? You are supposed to be reading this tiny ad!
the new thread boost feature brings a LOT of attention to your favorite threads
https://coderanch.com/t/674455/Thread-Boost-feature
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!