Parsing PDF

 
Ranch Hand
Posts: 178
Does anyone know of any APIs that let you parse PDF files in Java?
 
Greenhorn
Posts: 4
I know a tool that could be suitable: "PJ" from Etymon. Have a look at the following URL:
http://www.etymon.com/pj/index.html
 
Aleksey Matiychenko
Ranch Hand
Posts: 178
I found the tool, but it has no documentation and I am having a hard time figuring out how to parse a document with it. Any ideas?
 
Ranch Hand
Posts: 439
Isn't there a documentation link on the left-hand side? I just browsed through it.
 
Aleksey Matiychenko
Ranch Hand
Posts: 178
Yeah, but it does not have much.
 
H.-Gerd Rosarius
Greenhorn
Posts: 4
Hey guys!
Well, you're right about the documentation: there is not much of it, and what exists is quite poor. Furthermore, "PJ" itself is not what I would call comfortable to use, but it seems we don't have much choice. I couldn't find any real alternatives.
A friend told me that there is a project from the Apache group that could provide the classes I need, but I couldn't find anything. Maybe he fooled me? :)
If you find something, it would be kind of you to post it.
H.-Gerd
 
H.-Gerd Rosarius
Greenhorn
Posts: 4
Hey friends!
Here is some code I wrote. It does not work perfectly, but it is OK for my purposes. I was interested in the String objects within a PDF document.
Maybe you can use it for your purposes:
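//####################################################################
// Needs java.io.FileNotFoundException, java.io.IOException, java.util.Vector
// and the PJ classes: com.etymon.pj.* and com.etymon.pj.exception.*. The object
// and page-mark classes are assumed to live in com.etymon.pj.object.* and
// com.etymon.pj.object.pagemark.*; check your copy of the library.
public void analyze() {

    // Open the document; each failure mode is reported separately.
    Pdf myPdfDoc = null;
    try {
        myPdfDoc = new Pdf( this.pathToPdf );
    }
    catch ( FileNotFoundException fnf ) {
        fnf.printStackTrace();
    }
    catch ( IOException ioe ) {
        ioe.printStackTrace();
    }
    catch ( PjException pje ) {
        pje.printStackTrace();
    }

    // Stop here if the file could not be opened; otherwise the calls
    // below would throw a NullPointerException.
    if ( myPdfDoc == null ) {
        return;
    }

    try {
        if ( myPdfDoc.getEncryptDictionary() != null ) {
            System.out.println( "File appears to be encrypted." );
        }
        else {
            // Walk every object number and look only at stream objects.
            int objectNum = myPdfDoc.getMaxObjectNumber();
            for ( int i = 1; i <= objectNum; i++ ) {
                PjObject myPdfObject = myPdfDoc.getObject( i );
                if ( myPdfObject instanceof PjStream ) {
                    StreamParser sp = new StreamParser();
                    PjStream myPjStream = ( ( PjStream ) myPdfObject ).flateDecompress();

                    // Print the text operand of every Tj (show text) operator.
                    Vector myvec = sp.parse( myPjStream );
                    for ( int j = 0; j < myvec.size(); j++ ) {
                        if ( myvec.get( j ) instanceof XTj ) {
                            PjString myPjString = ( ( XTj ) myvec.get( j ) ).getText();
                            System.out.println( myPjString.getString() );
                        }
                    }
                }
            }
        }
    }
    catch ( PdfFormatException pfe ) {
        pfe.printStackTrace();
    }
    catch ( PjException pje ) {
        pje.printStackTrace();
    }
}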
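
For readers without the PJ jar at hand, here is a rough plain-JDK sketch of the two steps the code above relies on: inflating a Flate-compressed stream and collecting the string operands of the Tj (show text) operator. It is not PJ's implementation. The class name TjSketch and its methods are hypothetical, and the scanner is deliberately simplified: it handles only literal "(...)" strings and keeps escaped characters as-is.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of PJ: inflate a stream body and pull out Tj operands.
class TjSketch {

    /** Compresses bytes with zlib; only used here to build sample input. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates a FlateDecode (zlib) stream body into raw bytes. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported zlib data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Returns the operand of every "(...) Tj" found in a decoded content stream. */
    static List<String> extractTjStrings(String content) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            if (content.charAt(i) != '(') {
                i++;
                continue;
            }
            i++; // step past the opening parenthesis
            StringBuilder text = new StringBuilder();
            int depth = 1;
            while (i < content.length() && depth > 0) {
                char c = content.charAt(i++);
                if (c == '\\' && i < content.length()) {
                    text.append(content.charAt(i++)); // simplification: keep the escaped char literally
                } else if (c == '(') {
                    depth++;
                    text.append(c);
                } else if (c == ')') {
                    depth--;
                    if (depth > 0) text.append(c);
                } else {
                    text.append(c);
                }
            }
            // Skip whitespace after the string, then require the Tj operator.
            int j = i;
            while (j < content.length() && Character.isWhitespace(content.charAt(j))) j++;
            if (depth == 0 && content.startsWith("Tj", j)) {
                result.add(text.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "BT /F1 12 Tf 72 720 Td (Hello \\(PDF\\) world) Tj 0 -14 Td (second line) Tj ET";
        byte[] compressed = deflate(page.getBytes(StandardCharsets.ISO_8859_1));
        String decoded = new String(inflate(compressed), StandardCharsets.ISO_8859_1);
        for (String s : extractTjStrings(decoded)) {
            System.out.println(s);
        }
    }
}
```

The sketch looks only at Tj. PDF content streams can also show text with the TJ, ' and " operators and with hex strings, none of which are handled here; that may be one reason an extractor built on Tj alone misses text in some documents.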
 
Aleksey Matiychenko
Ranch Hand
Posts: 178
Thank you.
This works like a charm.
 
Greenhorn
Posts: 13
Hi Nassar,
I am using the pjx.jar API to read PDF documents. My aim is to get the "object structure" of the PDF document.
When I try to read PDF files, I get exceptions like these:
While reading PDF file 1:
com.etymon.pj.exception.PdfFormatException: Token " " not recognized.
    at com.etymon.pj.StreamParser.processToken(StreamParser.java:958)
    at com.etymon.pj.StreamParser.parse(StreamParser.java:22)
    at GetPDFInfo.main(GetPDFInfo.java:35)
-------------------------------------------------------------
While reading PDF file 2:
com.etymon.pj.exception.PdfFormatException: Token "%!PS-AdobeFont-1.1:" not recognized.
    at com.etymon.pj.StreamParser.processToken(StreamParser.java:958)
    at com.etymon.pj.StreamParser.parse(StreamParser.java:22)
    at GetPDFInfo.main(GetPDFInfo.java:35)
-------------------------------------------------------------
While reading PDF file 3:
com.etymon.pj.exception.PdfFormatException: Token "??? ?Adobe d? ?? C " not recognized.
    at com.etymon.pj.StreamParser.processToken(StreamParser.java:958)
    at com.etymon.pj.StreamParser.parse(StreamParser.java:22)
    at GetPDFInfo.main(GetPDFInfo.java:101)

Can you please tell me the reason for these errors?
The exception is thrown by the following line:
Vector myvec = sp.parse( myPjStream );
This line appears in this block of code:
StreamParser sp = new StreamParser();
PjStream myPjStream = ( ( PjStream ) obj ).flateDecompress();
Vector myvec = sp.parse( myPjStream );
for ( int j = 0; j < myvec.size(); j++ ) {
    if ( myvec.get( j ) instanceof XTj ) {
        PjString myPjString = ( ( XTj ) myvec.get( j ) ).getText();
        System.out.println( myPjString.getString() );
    }
}

Regards,
Balbhadra
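
The tokens in these traces hint at a likely cause, though nobody in the thread confirms it. "%!PS-AdobeFont-1.1:" is how an embedded Type 1 font program begins, and "Adobe d" surrounded by unprintable bytes looks like the Adobe marker segment of a JPEG image. The loop hands every PjStream to the StreamParser, but only page content streams hold operators it can understand. One option is to catch PdfFormatException per stream and move on to the next object. Another is to look at the decoded bytes before parsing, as in the plain-JDK sketch below. The class StreamSniffer, its Kind enum, and the 10% threshold are hypothetical choices for illustration, not part of PJ, and the check is a heuristic rather than a reliable classifier.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PJ: guess what a decoded PDF stream contains.
class StreamSniffer {

    enum Kind { TYPE1_FONT, JPEG_IMAGE, MOSTLY_BINARY, MAYBE_CONTENT }

    /** Classifies decoded stream bytes by their leading bytes and by how much of the start is plain ASCII. */
    static Kind sniff(byte[] data) {
        byte[] psHeader = "%!PS".getBytes(StandardCharsets.US_ASCII);
        if (startsWith(data, psHeader)) {
            return Kind.TYPE1_FONT; // Type 1 font programs start with "%!PS-AdobeFont-..."
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8) {
            return Kind.JPEG_IMAGE; // JPEG start-of-image marker
        }
        // Count bytes in the first 512 that are neither printable ASCII nor whitespace.
        int limit = Math.min(data.length, 512);
        int odd = 0;
        for (int i = 0; i < limit; i++) {
            int b = data[i] & 0xFF;
            boolean texty = b == '\t' || b == '\n' || b == '\r' || b == '\f'
                    || (b >= 0x20 && b < 0x7F);
            if (!texty) odd++;
        }
        // Assumed threshold: more than 10% non-text bytes means "probably not operators".
        if (limit > 0 && odd * 10 > limit) {
            return Kind.MOSTLY_BINARY;
        }
        return Kind.MAYBE_CONTENT;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] font = "%!PS-AdobeFont-1.1: Sample 001.000".getBytes(StandardCharsets.US_ASCII);
        byte[] jpeg = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xEE };
        byte[] page = "BT /F1 12 Tf (Hi) Tj ET".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(font));
        System.out.println(sniff(jpeg));
        System.out.println(sniff(page));
    }
}
```

In the loop from the earlier post, only streams classified as MAYBE_CONTENT would be passed to sp.parse(). A stricter approach would read the stream's own dictionary or follow each page's content reference instead of scanning all objects, but that depends on PJ calls not shown in this thread.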
 
Ranch Hand
Posts: 427
http://www.pdfbox.org/
 
Ranch Hand
Posts: 299
Take a look at iText: http://www.lowagie.com/iText/
It may do what you need, and it is well documented.
brian
 