Reading contents of PDF in Java

 
Senthil Kumar
Ranch Hand
Posts: 33
Hi all,
Can anyone suggest an API for reading the contents of a PDF file effectively?

Thanks in advance
Senthil Kumar.S
 
Ulf Dittmer
Rancher
Posts: 42970
That depends on what you mean by "effectively". The text that's part of a PDF can be extracted by libraries such as PDFBox, JPedal and PDFTextStream. You can find links to these on the AccessingFileFormats FAQ page (http://faq.javaranch.com/java/AccessingFileFormats).

If you are talking about extracting the layout information as well (tables, columns and the like), then that's not possible.
 
Peter Chase
Ranch Hand
Posts: 1970
The FAQ says "PDF is a hard-to-read format". In fact, it's not outrageously difficult to read. And it is a properly documented standard. The difficulty is that people often imagine that they will be able to "convert" a PDF file to some other format that has a different purpose.

A PDF file contains a document for display and/or printing. It contains instructions (very much like compiled PostScript) to draw lines, shade areas, write text etc, at various places on the page. It does not contain much, if any, metadata about the relationships and purposes of these lines, areas and text. In this respect, it is very different to things like HTML, Word documents, RTF etc.
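To make the "drawing instructions" point concrete, here is a minimal Java sketch that walks a tiny content stream and lists each piece of text with the position it is drawn at. The stream, the class and the method names (ContentStreamDemo, extract, TextRun) are all made up for illustration. It assumes the stream is uncompressed and uses only simple literal strings; real PDFs usually compress their content streams and map string bytes through font encodings, which is why you want a library such as PDFBox for real work. Notice that the output is just "this text at this spot" and the rule under the header is just a filled rectangle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: list the text drawn by a tiny, uncompressed PDF content stream.
// Assumptions: no compression, literal strings without escapes or nested parentheses,
// and only BT, Td and Tj are interpreted; every other operator is skipped.
public class ContentStreamDemo {

    // Made-up page: two header cells, two body cells, then a thin filled rectangle.
    static final String SAMPLE =
            "BT\n"
          + "/F1 12 Tf\n"
          + "72 700 Td (Name) Tj\n"
          + "150 0 Td (Price) Tj\n"
          + "-150 -20 Td (Apple) Tj\n"
          + "150 0 Td (1.20) Tj\n"
          + "ET\n"
          + "72 690 220 0.5 re f\n";

    // One piece of text drawn at a position, in PDF user-space units.
    static final class TextRun {
        final double x, y;
        final String text;

        TextRun(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }

        @Override
        public String toString() {
            return "(" + x + ", " + y + ") " + text;
        }
    }

    static List<TextRun> extract(String stream) {
        List<TextRun> runs = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        double x = 0, y = 0; // start of the current text line
        int i = 0;
        while (i < stream.length()) {
            char c = stream.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (c == '(') { // literal string operand, kept with its parentheses
                int end = stream.indexOf(')', i);
                operands.add(stream.substring(i, end + 1));
                i = end + 1;
                continue;
            }
            int start = i;
            while (i < stream.length() && !Character.isWhitespace(stream.charAt(i))
                    && stream.charAt(i) != '(') {
                i++;
            }
            String token = stream.substring(start, i);
            char first = token.charAt(0);
            if (first == '/' || first == '-' || first == '.' || Character.isDigit(first)) {
                operands.add(token); // name or number: an operand for the next operator
                continue;
            }
            switch (token) { // anything else is an operator
                case "BT": // begin text object: position resets to the origin
                    x = 0;
                    y = 0;
                    break;
                case "Td": // move to the next line, offset from the current line start
                    x += Double.parseDouble(operands.get(0));
                    y += Double.parseDouble(operands.get(1));
                    break;
                case "Tj": { // show a string at the current position
                    String lit = operands.get(0);
                    runs.add(new TextRun(x, y, lit.substring(1, lit.length() - 1)));
                    break;
                }
                default: // Tf, ET, re, f, ... are ignored in this sketch
                    break;
            }
            operands.clear();
        }
        return runs;
    }

    public static void main(String[] args) {
        for (TextRun run : extract(SAMPLE)) {
            System.out.println(run);
        }
    }
}
```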

As an example, say you have a PDF file that, when displayed, shows a table of values. Nothing in the PDF file says it's a table. It's just a load of lines and text in various places. Therefore, it is nearly impossible for a general program to identify the PDF as a table and convert it into, say, an OpenOffice Document containing a table.
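About the best a general program can do is guess. The hypothetical sketch below (TableGuess, guessRows and the sample coordinates are all invented for illustration) groups positioned text runs into "rows" when their baselines lie within a tolerance, then orders each row left to right. It copes with small misalignments, but a cell that wraps onto a second line ("Granny Smith") comes out as an extra row, because nothing in the file says the two fragments belong to the same cell.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: guess table rows from positioned text runs.
// Nothing in a PDF marks a table, so this only looks at coordinates.
public class TableGuess {

    static final class TextRun {
        final double x, y;
        final String text;

        TextRun(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Runs whose baselines lie within `tolerance` of a row's first baseline join that row;
    // each row is then ordered left to right.
    static List<List<String>> guessRows(List<TextRun> runs, double tolerance) {
        List<TextRun> sorted = new ArrayList<>(runs);
        // PDF y grows upward, so the top of the page has the largest y.
        sorted.sort(Comparator.comparingDouble((TextRun r) -> -r.y));

        List<List<TextRun>> groups = new ArrayList<>();
        double rowY = 0;
        for (TextRun r : sorted) {
            if (groups.isEmpty() || Math.abs(r.y - rowY) > tolerance) {
                groups.add(new ArrayList<>());
                rowY = r.y;
            }
            groups.get(groups.size() - 1).add(r);
        }

        List<List<String>> rows = new ArrayList<>();
        for (List<TextRun> group : groups) {
            group.sort(Comparator.comparingDouble((TextRun r) -> r.x));
            List<String> cells = new ArrayList<>();
            for (TextRun r : group) {
                cells.add(r.text);
            }
            rows.add(cells);
        }
        return rows;
    }

    // Made-up coordinates: a header row, one body row whose price sits 1pt higher,
    // and a "Granny Smith" cell that wrapped onto a second line.
    static List<TextRun> sample() {
        List<TextRun> runs = new ArrayList<>();
        runs.add(new TextRun(72, 700, "Name"));
        runs.add(new TextRun(222, 700, "Price"));
        runs.add(new TextRun(72, 680, "Granny"));
        runs.add(new TextRun(222, 681, "0.90"));
        runs.add(new TextRun(72, 668, "Smith"));
        return runs;
    }

    public static void main(String[] args) {
        for (List<String> row : guessRows(sample(), 2.0)) {
            System.out.println(row);
        }
    }
}
```

Run as is, it prints three rows: [Name, Price], [Granny, 0.90] and [Smith]. The last one is the wrapped half of a cell, which a human reader would never mistake for a row.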

(To bartenders: I've written a number of similar replies recently. Any chance of improving the FAQ entry? I work for a company one of whose main businesses is PDF, so I'm sure I can get a nice concise entry for you.)
 
Ulf Dittmer
Rancher
Posts: 42970
Quoting Peter: "I've written a number of similar replies recently. Any chance of improving the FAQ entry?"

Absolutely. If you don't feel like messing with the wiki yourself, send me whatever you come up with, and I'll put it in there.
 