
PDFBox: how to extract a PDF's markup?

Jim Harrison
Ranch Hand
Posts: 30
Hello,

I've read a lot on http://pdfbox.apache.org but can't find an example, or even whether the tool actually does this.

The PDF file I'm reading has superscripts. I want to get both the text and the markup (formatting) of the PDF, so I have a couple of questions:

1. Can PDFBox do this? On their website I see ExtractText (http://pdfbox.apache.org/commandlineutilities/ExtractText.html), but that only outputs the text content of the PDF.

2. Does anyone have an example of doing this?

Thanks, Jim
Ulf Dittmer
Rancher
Posts: 42968
No, PDFBox has no notion of extracting layout information.

You could check out the source code of https://pdf-renderer.dev.java.net/, which can display PDFs, so it must have a way of accessing the layout data.
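Once some library hands you per-glyph layout data (position and font size), spotting superscripts comes down to a geometric heuristic: a glyph that is noticeably smaller than the body text and sits noticeably above the line's baseline is probably a superscript. The sketch below is plain Java and is not tied to PDFBox or PDF Renderer; the `Glyph` record, the `findSuperscripts` method, and the 0.85/0.2 thresholds are all hypothetical. It assumes the glyphs of one text line have already been collected, with y growing upward as in PDF user space.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical superscript heuristic, not part of any PDF library.
 * Assumes the glyphs of ONE text line are given, with y growing upward
 * (PDF user space convention).
 */
public class SuperscriptDetector {

    /** One glyph: its text, baseline y coordinate, and font size in points (assumed inputs). */
    public record Glyph(String text, float baselineY, float fontSize) {}

    /** Returns the glyphs judged to be superscripts, in input order. */
    public static List<Glyph> findSuperscripts(List<Glyph> line) {
        List<Glyph> result = new ArrayList<>();
        if (line.isEmpty()) {
            return result;
        }
        // Treat the first largest glyph as body text; its baseline is the line baseline.
        Glyph body = line.get(0);
        for (Glyph g : line) {
            if (g.fontSize() > body.fontSize()) {
                body = g;
            }
        }
        float bodySize = body.fontSize();
        float baseline = body.baselineY();
        for (Glyph g : line) {
            // Arbitrary thresholds: clearly smaller AND raised by at least 20% of the body size.
            boolean smaller = g.fontSize() < 0.85f * bodySize;
            boolean raised = g.baselineY() > baseline + 0.2f * bodySize;
            if (smaller && raised) {
                result.add(g);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "E = mc2" where the "2" is set smaller and raised above the baseline.
        List<Glyph> line = List.of(
                new Glyph("E", 700f, 12f),
                new Glyph("=", 700f, 12f),
                new Glyph("m", 700f, 12f),
                new Glyph("c", 700f, 12f),
                new Glyph("2", 705f, 7f));
        for (Glyph g : findSuperscripts(line)) {
            System.out.println("superscript: " + g.text());
        }
    }
}
```

The same test with the comparison flipped (baseline lowered instead of raised) would flag subscripts. Real documents will need tuned thresholds, and grouping glyphs into lines is a separate problem this sketch does not attempt.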