Problem with handling huge data structures

 
Ranch Hand
Posts: 368
Hi

I have a requirement where my Java program has to deal with huge data structures which sometimes exceed the memory available to the Java process. I have set the heap size to the maximum the OS permits, but it is still not enough.
To solve this I am thinking along the lines of having the data structure overflow to the file system. But considering the time we have to get this done, is there an already available solution which tackles such scenarios, something of the sort of a disk-based cache? Please help.

Regards
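A minimal sketch of the overflow-to-disk idea from the question, assuming the map can be partitioned into serialized hash-bucket files so a lookup only loads one small bucket. The class name, bucket count, and temp directory are illustrative, not an existing library:

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical disk-backed map: entries are spread over bucket files by
// hash, so memory use per lookup is one bucket, not the whole map.
public class DiskMap {
    private final Path dir;
    private final int buckets;

    DiskMap(Path dir, int buckets) {
        this.dir = dir;
        this.buckets = buckets;
    }

    private Path bucketFile(String key) {
        int bucket = (key.hashCode() & 0x7fffffff) % buckets; // non-negative index
        return dir.resolve("bucket-" + bucket + ".ser");
    }

    @SuppressWarnings("unchecked")
    private Map<String, String> loadBucket(Path file) throws Exception {
        if (!Files.exists(file)) return new HashMap<>();
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, String>) in.readObject();
        }
    }

    void put(String key, String value) throws Exception {
        Path file = bucketFile(key);
        Map<String, String> bucket = loadBucket(file);   // read-modify-write one bucket
        bucket.put(key, value);
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(bucket);
        }
    }

    String get(String key) throws Exception {
        return loadBucket(bucketFile(key)).get(key);
    }

    public static void main(String[] args) throws Exception {
        DiskMap map = new DiskMap(Files.createTempDirectory("diskmap"), 16);
        map.put("1001", "2001");
        System.out.println(map.get("1001")); // 2001
    }
}
```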
 
Ranch Hand
Posts: 32
Eclipse IDE Objective C Java
Consider using an in-memory database; I remember HSQL being used on one of my projects.
 
Author and all-around good cowpoke
Posts: 13078
6
A lot would depend on exactly how your program uses these "huge data structures" - random access? sequential access? - does your program have to modify and save these structures?

Explain in more detail please if you want better advice.

Are we talking about Java objects here? If so can you create a more compact representation based on the real characteristics of the data items?

Are you on a network where some work can be farmed out to another machine?

Bill
 
Satya Maheshwari
Ranch Hand
Posts: 368

William Brogden wrote:A lot would depend on exactly how your program uses these "huge data structures" - random access? sequential access? - does your program have to modify and save these structures?

Explain in more detail please if you want better advice.

Are we talking about Java objects here? If so can you create a more compact representation based on the real characteristics of the data items?

Are you on a network where some work can be farmed out to another machine?

Bill



Ok, let me explain the problem more specifically.
Basically I have a huge hash map with string-to-string mappings. This hash map is used for making string-to-string replacements in several files, so access is random. Once this map is built up, it is only read, never modified.

Yes, putting this on another machine on the network is an option (similar to putting it in a file), though it may reduce performance. Is using a LinkedHashMap as an LRU cache a good idea?
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
Obviously the very best performance would be an all in memory HashMap.

Assuming that is impossible, let's consider some more questions.

Can lookups be batched? If you are going to have to go to some sort of file or network lookup, the bigger the batch you can create the better.

More questions:

Do these strings have to be UNICODE or are the characters in the ASCII set?

Do you have to check every single word for a possible substitution?

What is the likely frequency of having to replace a string? It may make a big difference if there are only a few per document versus wholesale replacement.

I can't see how a linked hash map by itself would help, but a cache is a good idea since words tend to repeat in documents. Note that I am assuming typical text documents because that's the kind of thing I used to work with.

Bill

 
Rancher
Posts: 13459
Android Eclipse IDE Ubuntu
Are you using substring on file contents? That method can cause memory leaks: you think you're keeping only part of the file, but because substring can share the original String's backing array, you retain the whole file in memory. Maybe not, but then again maybe you're not using as much memory as you think.
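A small illustration of the substring pitfall described above. Whether sharing actually happens depends on the JVM (it did before Java 7u6, where substring reused the parent's char array), so treat this as a sketch:

```java
// In older JVMs (before Java 7u6), String.substring shared the parent
// String's backing char[], so a tiny substring could pin an entire file's
// contents in memory. Copying the characters breaks that link.
public class SubstringLeak {
    public static void main(String[] args) {
        String wholeFile = "...imagine megabytes of file content... key=42 ...";
        int start = wholeFile.indexOf("42");

        // May share wholeFile's backing array on old JVMs:
        String leaky = wholeFile.substring(start, start + 2);

        // Independent copy; wholeFile can now be garbage collected:
        String safe = new String(wholeFile.substring(start, start + 2));

        System.out.println(leaky + " " + safe); // 42 42
    }
}
```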
 
Satya Maheshwari
Ranch Hand
Posts: 368

William Brogden wrote:Obviously the very best performance would be an all in memory HashMap.

Assuming that is impossible, let's consider some more questions.

Can lookups be batched? If you are going to have to go to some sort of file or network lookup, the bigger the batch you can create the better.



Yes, I think batching them would save on the network lookups. Actually I have been thinking along parallel lines: storing this in the file system, as you suggested for the network, and then maintaining an LRU cache which keeps the most recently used entries in memory in a LinkedHashMap. LinkedHashMap because it has inherent support for acting as an LRU cache.
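The LinkedHashMap-as-LRU-cache idea above can be sketched as follows; the capacity and the sample keys are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap's access-order mode:
// each get() moves the entry to the end, and removeEldestEntry
// evicts the least recently used entry once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true -> LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict least recently used entry
    }
}

public class LruDemo {
    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(3);
        cache.put("1001", "2001");
        cache.put("1002", "2002");
        cache.put("1003", "2003");
        cache.get("1001");         // touch "1001" so it is most recent
        cache.put("1004", "2004"); // evicts "1002", the LRU entry
        System.out.println(cache.keySet()); // [1003, 1001, 1004]
    }
}
```

On a cache miss, the value would then be fetched from the file system (or network) and put into the cache.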

William Brogden wrote:
More questions:

Do these strings have to be UNICODE or are the characters in the ASCII set?


ASCII. In fact all these strings are numbers.

William Brogden wrote:
Do you have to check every single word for a possible substitution?


I think so. I do pattern matching to find a number and then look it up in the map. Is there any other way?
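The pattern-match-then-look-up step described above might look something like this sketch, assuming plain runs of digits are the tokens to replace and unmapped numbers are left untouched:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the replace step: scan for digit runs and
// substitute via the map; numbers with no mapping pass through unchanged.
public class NumberReplacer {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static String replaceAll(String text, Map<String, String> mappings) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = mappings.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("1001", "2001");
        System.out.println(replaceAll("<id>1001</id> <id>1002</id>", map));
        // -> <id>2001</id> <id>1002</id>
    }
}
```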

William Brogden wrote:
What is the likely frequency of having to replace a string? It may make a big difference if there are only a few per document versus wholesale replacement.


The frequency varies, but I can predict it beforehand as it depends on the type of document.

William Brogden wrote:
I can't see how a linked hash map by itself would help, but a cache is a good idea since words tend to repeat in documents. Note that I am assuming typical text documents because that's the kind of thing I used to work with.



LinkedHashMap because it can easily be extended to an LRU cache. These are not typical text documents but mostly XMLs and a few FLVs.


 
Satya Maheshwari
Ranch Hand
Posts: 368

David O'Meara wrote:Are you using substring on file contents? That method can cause memory leaks since you think you're using part of the file but the immutability of Strings means you retain the whole file in memory. Maybe not, but then again maybe you're not using as much memory as you think.



No, I am not using substring on the file contents. But now that you have mentioned memory leaks due to String immutability, I am thinking of reviewing the code to find any such culprits.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
6
With all-ASCII replacement text you could save considerable memory in a HashMap by storing the replacement text as byte[] rather than String, since String uses 16-bit Unicode characters.

IF the input numbers and replacement numbers fit in a Java Integer or Long, you could save even more and maybe keep the whole HashMap in memory. So: what is the range of these numbers?

Bill
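The Long-key suggestion above could be sketched like this; the sample values are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the compact-key idea: since keys and replacements are all
// numeric and fit in a long, store them as Long instead of String.
// A boxed Long is a small fixed-size object, while even a short String
// carries object headers, a length field, and a separate char array.
public class CompactMap {
    public static void main(String[] args) {
        Map<Long, Long> mappings = new HashMap<>();
        mappings.put(1001L, 2001L);

        String token = "1001"; // numeric token found in the document
        Long replacement = mappings.get(Long.parseLong(token));
        System.out.println(replacement); // 2001
    }
}
```

Going further, a primitive-long map (from a third-party collections library) would avoid the boxing overhead entirely.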

 
Satya Maheshwari
Ranch Hand
Posts: 368

William Brogden wrote:With all-ASCII replacement text you could save considerable memory in a HashMap by storing the replacement text as byte[] rather than String, since String uses 16-bit Unicode characters.

IF the input numbers and replacement numbers fit in a Java Integer or Long, you could save even more and maybe keep the whole HashMap in memory. So: what is the range of these numbers?

Bill



Thanks for the information. Yes, the numbers would fit in a Long, though not in an Integer.
 