Win a copy of Reactive Streams in Java: Concurrency with RxJava, Reactor, and Akka Streams this week in the Reactive Progamming forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Liutauras Vilda
  • Junilu Lacar
  • Jeanne Boyarsky
  • Bear Bibeault
Sheriffs:
  • Knute Snortum
  • Tim Cooke
  • Devaka Cooray
Saloon Keepers:
  • Ron McLeod
  • Stephan van Hulst
  • Tim Moores
  • Tim Holloway
  • Carey Brown
Bartenders:
  • Piet Souris
  • Frits Walraven
  • Ganesh Patekar

Need to read large xls / xlsx file say greater than 1.5 mb

 
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,
I am using apache poi to read xls/xlsx file.
the code works well if the file size is say 1.20 mb containing 5000 rows and almost 50 columns.
but if the size is increased then i get heap memory error.

is there any way to read huge files say 2,3,5,10 mb etc in java?

also i explored another option available in apache poi which is using OPCPackage , using sax parser etc but did not fully understand the example available at below link

http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/examples/FromHowTo.java

it would be of great help if somebody guides me in right direction.
This is really urgent for me and i did an honest R&D on web but did not get any appealing solution.

Thanks in advance!
 
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You might start by giving the JVM more memory to work with. The default JVM memory allocation is rather small compared to the amount of RAM computers tend to have these days.

You're on he right track using the eventmodel API, both for XLS and XLSX, according to the table on http://poi.apache.org/spreadsheet/index.html. What specifically did or did you not understand about that example code? How have you tried to adapt it to your purposes, and what prevented you from doing so?
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I understood following method :


but what i did not understand is, how should i now read the data cell wise?
 
Ulf Dittmer
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The SAX handler methods of the SheetHandler class will be called. When you ran this code, did you not see the output generated by the System.out.print/println calls?
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
No i did not see any sysouts.
actually , one question that i have is, do i need to call the startelement and endelement methods explicitly??
posting the class that i am using as a test class.
Please let me know if i am missing anything here


 
Ulf Dittmer
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

do i need to call the startelement and endelement methods explicitly?


No, that happens automatically - those methods are callbacks. You should familiarize yourself with SAX XML processing to understand how that works.

I'd try the test code myself, but it's missing the import statements, so it doesn't compile.
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
ok.

posting the sample class with import statements :

 
Ulf Dittmer
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Your spreadsheet may not have a sheet "rId1" - the sheet ID can apparently change through editing, so assuming any particular ID will be present is problematic. Try processAllSheets instead.

You may also need to replace XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser") by XMLReaderFactory.createXMLReader() - my JDK didn't have that particular parser.

And lastly, I had to put an empty try/catch block around lines 98 and 99. But then, it worked as expected, and I could see the cell contents as they were being parsed.
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
thank you for the details
even i tried the processallsheets and it worked well by putting try catch at the suggested lines.

now , my next query is, this works fine for string values but what if the value is of type date.
how can i handle the date value.
when i check values for type as a mentioned below , i got it only for string type, which is s , for other column types such as date , number , i am getting NULL



here, i am getting s for celltype of type string and for other types, NULL
 
Ulf Dittmer
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The easiest is probably to look at the XML file that's inside the XLSX file, and to see what the XML for those other cell types is. (XLSX is a ZIP file, so you would unzip it to get at its contents.)

Or you could check the XLSX spreadsheet file format specification, it should talk about all this in detail.
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
ok, i checked the xml also, but it is not showing any datatype for date type values.
only for string values, it is showing attribute as

so,now, i am having the large file xls reader code, but not able to get cell value which is of type Date or number
 
Ulf Dittmer
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hm. The file format specification would the ultimate arbiter in how to interpret that data: http://www.ecma-international.org/publications/standards/Ecma-376.htm
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Ulf,
Thank you for the link, but what exactly should i check in the posted url.
I mean , i checked one of the zip files that are available on the url that you have posted, opened pdf file contained in the zip, but did not understand much like my question : how to identify whether the cell contains a date value or not : is still open
 
Ulf Dittmer
Rancher
Posts: 43011
76
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I can't help with that, I've never looked at the file spec in detail. But somewhere in there is the answer to your question.
 
Indrayani Godbole
Ranch Hand
Posts: 33
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
oh okk.
Thank you,i will check.
 
Does this tiny ad smell okay to you?
Java file APIs (DOC, XLS, PDF, and many more)
https://products.aspose.com/total/java
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!