
How to run PDFBox

 
Greenhorn
Posts: 2
How do I index PDF files? I am using PDFBox-0.7.3.jar and wrote the following code:

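import java.io.File;
import org.pdfbox.searchengine.lucene.IndexFiles;

public class PDFBoxIndexFiles {
    public static void main(String[] args) throws Exception {
        // PDFBox's ready-made Lucene indexer (needs a compatible Lucene JAR on the classpath)
        IndexFiles indexFiles = new IndexFiles();
        // Index the PDFs under d:\testpdf and write the Lucene index to D:\pdfindex
        indexFiles.index(new File("d:\\testpdf"), true, "D:\\pdfindex");
    }
}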
When I run the program, the following error appears:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/document/DateTools$Resolution
at org.pdfbox.searchengine.lucene.LucenePDFDocument.<init>(LucenePDFDocument.java:131)
at org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument(LucenePDFDocument.java:376)
at org.pdfbox.searchengine.lucene.IndexFiles.addDocument(IndexFiles.java:295)
at org.pdfbox.searchengine.lucene.IndexFiles.indexDocs(IndexFiles.java:269)
at org.pdfbox.searchengine.lucene.IndexFiles.indexDocs(IndexFiles.java:236)
at org.pdfbox.searchengine.lucene.IndexFiles.indexDocs(IndexFiles.java:223)
at org.pdfbox.searchengine.lucene.IndexFiles.index(IndexFiles.java:165)
at PDFBoxIndexFiles.main(PDFBoxIndexFiles.java:10)
Indexing PDF document: d:\testpdf\EZCMAdminGuideV301.pdf
[ June 28, 2007: Message edited by: Ben Souther ]
 
Rancher
Posts: 43026
Welcome to JavaRanch.

Is the Lucene library on your classpath when you run this program? If so, could it be that PDFBox requires a different version of Lucene than the one on your classpath?
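One quick way to answer that is to ask the JVM directly. The sketch below uses only the standard library: it tries to load the class named in the stack trace and, if it loads, prints the JAR or directory it came from. `ClasspathCheck` is a hypothetical name; run it with exactly the same classpath you use for `PDFBoxIndexFiles`.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports whether (and from where) a class is visible on the classpath. */
public class ClasspathCheck {

    /** Returns true if the named class can be loaded (without running its static initializers). */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the JAR or directory the class was loaded from, or null if unknown or missing. */
    public static URL whereLoadedFrom(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? null : src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Nested classes use '$' in their binary name, exactly as in the stack trace.
        String[] names = {
            "org.apache.lucene.document.Document",
            "org.apache.lucene.document.DateTools$Resolution"
        };
        for (String name : names) {
            if (isOnClasspath(name)) {
                System.out.println("found   " + name + " in " + whereLoadedFrom(name));
            } else {
                System.out.println("MISSING " + name);
            }
        }
    }
}
```

If `Document` is found but `DateTools$Resolution` is reported missing, the Lucene JAR you have does not contain a class PDFBox expects, which points to a version mismatch rather than a missing library.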
 
Ranch Hand
Posts: 89
You must be using the latest version of Lucene, but your PDFBox version is a 2006 release.
So PDFBox's IndexFiles class does not support the changes in the Lucene API; for example,
it does not support the new way of using the Field constructors.

So the built-in PDF indexing support in IndexFiles and LucenePDFDocument does not work with this latest version of Lucene.
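To compare the Lucene JAR you are using with the one PDFBox expects, you can read the version recorded in each JAR's manifest. This is a minimal sketch using `java.util.jar`; it assumes the JARs record their version in the standard `Implementation-Version` (or `Specification-Version`) manifest attribute, which not every build does. `JarVersion` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: prints the version recorded in each JAR's manifest. */
public class JarVersion {

    /** Returns Implementation-Version, falling back to Specification-Version, or null if absent. */
    public static String versionOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return null; // the JAR has no META-INF/MANIFEST.MF
            }
            Attributes main = mf.getMainAttributes();
            String v = main.getValue(Attributes.Name.IMPLEMENTATION_VERSION);
            return v != null ? v : main.getValue(Attributes.Name.SPECIFICATION_VERSION);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the JAR paths to compare, e.g. a Lucene JAR bundled with PDFBox
        // and the Lucene JAR on your own classpath.
        for (String path : args) {
            System.out.println(path + " -> " + versionOf(new File(path)));
        }
    }
}
```

If the manifest carries no version, the JAR's file name is often the next best clue.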

To parse PDF documents, you have to use the org.pdfbox.pdfparser.PDFParser class.
[ July 16, 2007: Message edited by: Neetika Sood ]
 
Greenhorn
Posts: 15
You are trying to use the generic IndexFiles.java as a base template to index PDF files. If you check the lib folder of PDFBox, you will see that it contains the version of Lucene it is compatible with; add that JAR to your classpath as well. As far as indexing PDFs is concerned, all you need to do is add a conditional statement like the following to the IndexHTML.java file:
if (file.getPath().endsWith(".pdf")) {
    System.out.println(file.getPath());
    // Convert the PDF into a Lucene Document using PDFBox's helper
    Document document1 = LucenePDFDocument.getDocument(file);
    System.out.println("adding " + document1.get("path"));
    writer.addDocument(document1);
}
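The check above handles one file at a time. As a rough, standard-library-only sketch of the surrounding directory walk (the part the indexing demo classes do for you), the hypothetical `PdfFinder` below collects every PDF under a folder; each file it returns is what you would pass to `LucenePDFDocument.getDocument(file)`. Unlike the snippet above, it matches the extension case-insensitively, which is a design choice, not a requirement.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: recursively finds PDF files below a root directory. */
public class PdfFinder {

    /** Collects files whose name ends in ".pdf" (any case) under root. */
    public static List<File> findPdfs(File root) {
        List<File> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(File f, List<File> out) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) {
                return; // unreadable directory: skip it
            }
            for (File child : children) {
                collect(child, out);
            }
        } else if (f.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            out.add(f);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File pdf : findPdfs(root)) {
            // In the real indexer, this is where you would call
            // LucenePDFDocument.getDocument(pdf) and writer.addDocument(...).
            System.out.println("would index: " + pdf.getPath());
        }
    }
}
```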

Regards
Rod Manning
 
shinichi kudo
Greenhorn
Posts: 2
Thanks, everybody!