Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

OutOfMemoryError in processing large xlsx file (167 MB) using Apache POI

 
Harinath Br
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

I am using Apache POI version3.8 to process xlsx file (size-167 MB with around 7 lakh records). I want to upload this data to database.

I am getting java.lang.OutOfMemoryError at the below line.

Workbook workbook = WorkbookFactory.create(new File(fileDirectory, fileName));

Even if I use XSSFWorkbook also same issue...

XSSFWorkbook xssfWb = new XSSFWorkbook(opcPackage);

any one please help me in resolving this issue.

Thanks in advance.
Hari...
 
Paul Clapham
Sheriff
Posts: 21322
32
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The XLSX file takes up 167 megabytes of disk space? Then that's not surprising. Assign more memory to the application, as much as possible.
 
Harinath Br
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Paul for your quick reply.
I tried that but it didn't help. Currently I assigned 1024M, even I doubled it but no luck.
 
Ulf Dittmer
Rancher
Posts: 42968
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
POI is notorious for needing lots of memory; it's quite possible that a 167MB file needs more than 2 GB.
 
Harinath Br
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dittmer,

I tried that but JDeveloper is not at all responding even for just java standalone program with System.out.println("");

I have 4 GB RAM.

Please suggest is there any way to stream the data partly instead of loading entire file using Workbook workbook = WorkbookFactory.create(new File(fileDirectory, fileName)); or XSSFWorkbook xssfWb = new XSSFWorkbook(opcPackage); bcz I am getting error at this line only.
 
Ulf Dittmer
Rancher
Posts: 42968
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm fairly certain that POI can only deal with complete files; there's no way to make it read partial files (and thus lower the memory consumption). That line is probably where POI tries to read the entire file into memory.

(My first name is Ulf, by the way. It's fine to address me with my last name if you prefer to do so, but then the polite way to do so is by prefixing it with "Mr.".)
 
Paul Clapham
Sheriff
Posts: 21322
32
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Harinath Br wrote:Please suggest is there any way to stream the data partly instead of loading entire file using Workbook workbook = WorkbookFactory.create(new File(fileDirectory, fileName)); or XSSFWorkbook xssfWb = new XSSFWorkbook(opcPackage); bcz I am getting error at this line only.


No, that's not how the WorkbookFactory class works.

It might be possible to use other features of the POI project to provide a "streamed" version of the data, but I don't know that for sure. Somehow I doubt it, because I'm not sure what exactly would be streamed. It's easy for somebody who just thinks of Excel as a grid with cells which might contain data to ask for that, but for somebody who has to write code which can deal with all of the thousands of different kinds of objects which might be present in an Excel document, it isn't really all that simple.
 
Harinath Br
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Thanks everyone for the reply.

I have done in this way... I have split the file with 20k records per file (around 5MB) as workaround in this case.
 
Meer Nasirudeen
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello everyone... I too faced the same issue of OOM while parsing xlsx file... I somehow feel that below code helped me in a simpler way to handle..

This code is based on sjxlsx. It reads the xlsx and stores in a HSSF sheet.



 
Olivier Rihoux
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just try replacing :




by :




SXSSFWorkbook (Streaming version of XSSFWorkbook) will act the same way as XSSFWorkbook but keeps only 100 (in this case) rows in memory

 
Meer Nasirudeen
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Olivier Rihoux wrote:Just try replacing :




by :




SXSSFWorkbook (Streaming version of XSSFWorkbook) will act the same way as XSSFWorkbook but keeps only 100 (in this case) rows in memory



Yup. You are absolutely correct. However, the solution I have suggested is to get the entire worksheet from the high-size file. I already tried using SXSSF, but as you said, beyond last 100 records are getting retrieved while reading back the sheet
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic