Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

How to manage a very very large list in memory?

 
S Dan
Greenhorn
Posts: 25
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Let consider the following line

List<DataType> result = Compute();

result is a list of type DataType and could be very large.
I need to keep this in memory as I'm using this list in the
other parts of the code and iterating over it. My application
could create a very large list at times and causes out of
memory exception. Setting max heap size using -Xmx will not
help in extreme cases. Basically I need to come up
with a smart scheme to manage the list that causes minimum
refactoring in my code. Any suggestions? Thanks.
-Dan
 
Abhinav Srivastava
Ranch Hand
Posts: 354
Eclipse IDE Java Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you can provide your own implementation of List, you can implement lazy-loading. Also, if this list is coming from a persistent store, you can use a scrollable cursor.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
[S Dan]: I need to keep this in memory as I'm using this list in the
other parts of the code and iterating over it.


Yet later you establish that you really can't keep the whole thing in memory.

It sounds like lazy loading isn't quite enough - the list also needs to be able to unload somehow, too. Otherwise the first time you iterate through the list, you'll fill the list and run out of memory. You could use a List<WeakReference<DataType>> internally I suppose, and reload as necessary.

I would probably start by implementing just an Iterator that loads from the database. If you're using JDK 5+, make an Iterable<DataType> for ease of use. Otherwise you can make a List but try making many of the oterations throw UnsupportedOperation - you may find that most of the clients who need the List just need it for iteration. Here I'm assuming that you can test the whole application at once before releasing your solution publicly. (Or use a usage search, available from any decent IDE.) If your clients are far away and you don't know just what they're doing with your List, you may upset people if you start throwing exceptions from previously-working methods.

If the database is slow, as is often the case, you may get better results by using seralization and writing the records to local files, then reading them. This is especially true if the database is across a slow network. Of course if users modify the list or its contents, that probably means you need to modify the database as well, along with any local files. This can get messy quickly. In that case using a scrollable cursor in the database may be your best bet, forgetting about the local files.

A more complex List implementation might initially just use an ArrayList to keep everything in memory. But when the size exceeds some limit, it would switch to a DB-backed or file-backed solution along the lines above. Then you can still get fast performance when the list is small enough, but you can handle larger data sets when necessary.
[ January 16, 2008: Message edited by: Jim Yingst ]
 
luc peuvrier
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
To solve the problem to manage huge collection without using database I use joafip
 
Charles Lyons
Author
Ranch Hand
Posts: 836
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You could try a RandomAccessFile if you develop a sensible storage format for your objects. Then you just need to seek to the correct place and decode the information into a temporary array. All this could be done in a List implementation to preserve compatibility with existing code - you might want to be clever too and cache a sensible number of entries into a local array for fast access (since disk access is slow). This basically is creating your own swap space on disk - you'll easily be able to store GBs of data in the file (on 64bit TBs in fact).
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24212
35
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by luc peuvrier:
To solve the problem to manage huge collection without using database I use joafip


Well, yes, that's not surprising, considering you're the project lead.
 
Darius Cooper
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by S Dan:
List<DataType> result = Compute();

Do you have control on the design of the "DataType" class? If so, is each object of that class potentially large? If so, are there some things you can do to make each DataType object small (at least on average)? If not, do you only need a certain subset of DataType in the list? If so, perhaps you can use a different class-design.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13071
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Does this list get modified after the initial creation?

Bill
 
Charles Lyons
Author
Ranch Hand
Posts: 836
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Replying to myself
You could try a RandomAccessFile if you develop a sensible storage format for your objects.
This would be best if DataType is Externalizable and the objects are stored back-to-back - then you know how large each one is since you write the storage format (making each serialization the same size would be ideal for seek purposes). That is fine if you can afford the overheads of deserializing and slow disk access times. Though I can't really see a way round this - your storage is either in memory or on disk if you really can't optimise your code.
 
luc peuvrier
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ernest Friedman-Hill:


Well, yes, that's not surprising, considering you're the project lead.


I spoke about joafip in this forum since I had to solve the problems of hug data model that can not be store in memory. It is based on RandomAccessFile with a Heap in file management layer, this may be enought to store variables size object in file. Joafip use an other upper layer that visit object graph for saving and proxy for lazy load.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic