• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Ron McLeod
  • Paul Clapham
  • Tim Cooke
  • Devaka Cooray
Sheriffs:
  • Liutauras Vilda
  • paul wheaton
  • Rob Spoor
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • Piet Souris
  • Mikalai Zaikin
Bartenders:
  • Carey Brown
  • Roland Mueller

PDFBox throws OutOfMemory error

 
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi guys,
I am using PDFBox to parse sequentially a large number of PDF files, get their text, and write in another document. What I have done is creating a method


that accepts as arguments the name of the file and the Writer to the new file that contains the concatenation of the textual data.

The error I get, when the parsing reaches one specific file, is:





Any ideas what is going wrong?
 
Konstantinos Vasileiou
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
To add some more details to my previous post:
Increasing the heap size does not help at all - I tried up to 512Mb getting the same error at this particular pdf.
I made a simple prototype application like this:



and the system throws exactly the same error at this damned file.
 
Ranch Hand
Posts: 37
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
How big is the PDF?

I've never used the PDFBox, but I have used itext (http://www.lowagie.com/iText/) and that worked pretty well for me but I was generating PDFs and not reading them.
 
Konstantinos Vasileiou
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The PDF is not very big (1.8Mb) and PDFBox works fine with much larger files.
 
Jarred Olson
Ranch Hand
Posts: 37
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Again, I've never used PDFBox so I'm not sure if you can do this or not (I know you can do it with java.io.*) but you might want to try reading it in line by line to try and keep your heap size down.
 
Konstantinos Vasileiou
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jarred Olson wrote:Again, I've never used PDFBox so I'm not sure if you can do this or not (I know you can do it with java.io.*) but you might want to try reading it in line by line to try and keep your heap size down.


I am sure you can do this... at least I do not know such a method of PDFBox. The getText() method extracts all the text at once, but as I can guess from the description of the error message, PDFBox also uses the structure of the pdf document, so I do not know if parsing line by line can exist, similarly to a "flat" I/O stream.
 
I was her plaything! And so was this tiny ad:
We need your help - Coderanch server fundraiser
https://coderanch.com/wiki/782867/Coderanch-server-fundraiser
reply
    Bookmark Topic Watch Topic
  • New Topic