Jarred Olson wrote: Again, I've never used PDFBox so I'm not sure if you can do this or not (I know you can do it with java.io.*) but you might want to try reading it in line by line to try and keep your heap size down.
I'm not sure you can do this; at least I don't know of any such method in PDFBox. The getText() method extracts all the text at once. Also, as far as I can guess from the error message, PDFBox works with the structure of the PDF document, so I doubt that line-by-line parsing, as with a "flat" I/O stream, is possible.
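For comparison, this is roughly what the java.io line-by-line approach from the quote looks like for a plain text stream. It is a minimal sketch: the class `LineByLine` and the method `countMatchingLines` are hypothetical names used only for illustration. Only the current line is held in memory, which is what keeps the heap small. As noted above, a PDF is not a flat text stream, so this pattern does not carry over directly to PDFBox.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineByLine {

    // Hypothetical helper: counts lines containing `keyword`, reading the
    // source one line at a time instead of loading it all into one String.
    static int countMatchingLines(Reader source, String keyword) {
        try (BufferedReader reader = new BufferedReader(source)) {
            int count = 0;
            String line;
            // readLine() returns null at end of stream; each line can be
            // garbage-collected once the next one is read.
            while ((line = reader.readLine()) != null) {
                if (line.contains(keyword)) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            // Wrap the checked exception so callers don't have to handle it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A StringReader keeps the demo self-contained; for a real file you
        // would pass something like java.nio.file.Files.newBufferedReader(path).
        String text = "alpha\nbeta error\ngamma\nerror again\n";
        System.out.println(countMatchingLines(new StringReader(text), "error"));
    }
}
```

With a real file, only the reader construction changes; the loop stays the same.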