PDF/CHM Metadata retrieval

 
Ranch Hand
Posts: 218
Hope this is the right forum for my query.

I've built an E-Book application using Eclipse 3.2 that archives e-books with their details (publisher etc.).

Now I need to add a new feature: I want to select a folder containing e-books (PDF/CHM format) and have the application crawl through each book in that folder, populating the application database with the book details (name, author, and so on).

Does anyone know how I could extract such metadata from a PDF or a CHM file through Java?
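
For the folder-crawling half of the question, a minimal sketch using only the standard library might look like the following. It assumes Java 8 or later (newer than what Eclipse 3.2 shipped with), and the class and method names (`EbookFinder`, `isEbook`, `findEbooks`) are hypothetical, not part of the application described above. It only locates the files; reading their metadata is a separate step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: collects the PDF and CHM files under a folder. */
final class EbookFinder {

    private EbookFinder() {}

    /** Returns true if the file name ends in .pdf or .chm, ignoring case. */
    static boolean isEbook(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".pdf") || name.endsWith(".chm");
    }

    /** Walks the folder recursively and returns the matching regular files, sorted by path. */
    static List<Path> findEbooks(Path folder) throws IOException {
        // Files.walk holds directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths.filter(Files::isRegularFile)
                        .filter(EbookFinder::isEbook)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

Calling `EbookFinder.findEbooks(Paths.get("/path/to/books"))` would then give the list of files to hand to whatever metadata reader is chosen.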
 
Bartender
Posts: 1844
There don't seem to be any open-source projects that will read PDF files (though there are a few that will write them).

Here's someone's index of PDF tools for Java:

http://schmidt.devlib.org/java/libraries-pdf.html

Hope that this helps,
 
Rancher
Posts: 43081
The document properties (including title, author, creation date, etc.) can be read with iText's PdfReader class. Its getInfo method returns a Map containing a variety of metadata about the document.
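
As a rough sketch of what to do with that Map: the class below turns it into a simple record for the database. `BookDetails` and `fromInfo` are hypothetical names, and the keys `Title`, `Author`, and `CreationDate` are the standard PDF document-information entries; whether a given file actually fills them in is up to whoever produced it, so the code falls back to the file name when no title is present. The block itself uses only the standard library so it can be tried without iText on the classpath.

```java
import java.util.Map;

/** Hypothetical value class for the details the archive stores per book. */
final class BookDetails {
    final String title;
    final String author;
    final String creationDate;

    BookDetails(String title, String author, String creationDate) {
        this.title = title;
        this.author = author;
        this.creationDate = creationDate;
    }

    /**
     * Builds a record from a PDF info map (assumed to be the kind of map
     * PdfReader.getInfo() returns). A missing or blank title falls back to
     * the supplied file name; other missing entries become empty strings.
     */
    static BookDetails fromInfo(Map<String, String> info, String fileName) {
        String title = clean(info.get("Title"));
        if (title.isEmpty()) {
            title = fileName;
        }
        return new BookDetails(title,
                               clean(info.get("Author")),
                               clean(info.get("CreationDate")));
    }

    /** Null-safe trim: absent entries are treated as empty. */
    private static String clean(String value) {
        return value == null ? "" : value.trim();
    }
}
```

With iText available, the map would come from something like `new PdfReader(path).getInfo()`; depending on the iText version the returned map may be untyped and need a cast.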

If you want to extract text from a PDF, then either JPedal or PDFBox can do that. All of these libraries are linked from the AccessingFileFormats page.
 
Ranch Hand
Posts: 686
Microsoft formats: http://poi.apache.org/

PDF: iText
 
Ulf Dittmer
Rancher
Posts: 43081
POI does not handle CHM files.
 