how to reduce runtime size of HashMap

 
Greenhorn
Posts: 5
Hi

I have a use case where I am pushing tens of thousands of entries into a HashMap.

The structure of the HashMap is:
HashMap<String1, Map<String2, Map<String3, String4>>>

Here String1 is always unique, but String3 and String4 are very frequently repeated strings.
They represent the Status and Priority values in each "calendar week" (String2).

The String3 (Status) and String4 (Priority) values can only come from a predefined set of ten to fifteen strings.

Now, when I use the code below to serialize the HashMap,
----------------------------------------------------------------------------------------------------------------
File file = new File("C:\\testData1.ser");
FileOutputStream fos = new FileOutputStream(file);
ObjectOutputStream oos = new ObjectOutputStream(new DeflaterOutputStream(fos));
oos.writeObject(masterChartingData);
oos.flush();
oos.close();
fos.close();

----------------------------------------------------------------------------------------------------------------
and the code below to deserialize it,
----------------------------------------------------------------------------------------------------------------
File file = new File("C:\\testData1.ser");
FileInputStream fis = new FileInputStream(file);
ObjectInputStream ois = new ObjectInputStream(new InflaterInputStream(fis));
HashMap<String, Map<String, Map<String, String>>> deserializedMasterChartingData =
        (HashMap<String, Map<String, Map<String, String>>>) ois.readObject();
ois.close();
fis.close();

----------------------------------------------------------------------------------------------------------------

The repeated strings are identified and compressed away, and the storage file shrinks to ~600 KB, whereas plain serialization, without the deflater/inflater, creates a file of ~6 MB.

But the problem is that as the number of entries grows, the Java runtime can no longer handle the size of the HashMap in memory.
Is there an efficient way, right at the time of constructing the HashMap, to identify the repeated data and avoid storing duplicate copies, so that the HashMap is built memory-efficiently?

regards
mad
 
Master Rancher
Posts: 4806
72
After you obtain a String3 or String4 value from... wherever you get them from, try calling intern() on the string, and use the value returned by intern() instead of the original value:
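A minimal, runnable illustration of what interning does (the strings here are just the thread's example values):

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two equal strings built at runtime are distinct objects by default
        String a = new String("Open");
        String b = new String("Open");
        System.out.println(a == b);                   // false: two separate copies in memory
        System.out.println(a.intern() == b.intern()); // true: both resolve to one pooled copy
    }
}
```

In the map-building code, this means storing `status.intern()` rather than `status`, so every repeated occurrence shares a single String instance.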

Don't do this for any string that you expect may have many, many different values, as it could cause problems: interned strings are hard to garbage-collect. But for strings that you know will be confined to a small, finite set of values that you can afford to keep in memory for the life of the program, it's fine, and it should accomplish exactly what you need.

You could also encode the String3 and String4 data in various other ways. Perhaps each could be represented by an enum, and you use the enum's valueOf() method to look up the enum value for a given string. But I expect that will give you almost exactly the same memory usage as using intern() will.
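For the enum route, a tiny sketch (constant names are assumed; `valueOf()` needs the exact constant name, so the incoming string may need normalizing first):

```java
public class EnumDemo {
    enum Status { OPEN, RESOLVED, CLOSED }

    public static void main(String[] args) {
        // Map an incoming string onto the single shared enum constant
        Status s = Status.valueOf("Open".toUpperCase());
        System.out.println(s == Status.OPEN); // true: each enum constant is a singleton
    }
}
```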

There may be more compact ways to encode the hashmaps, especially the last one. But I doubt the saving will be worth the complexity. The intern() method will save you much more memory than any subsequent encoding tricks. Probably.
 
Bartender
Posts: 6663
5
Madan, please avoid creating duplicate posts. You can find my reply to your query here -> https://coderanch.com/forums/posts/list/531720#2411074

It's pretty much what Mike suggested in the latter part of his post.
 
Madhan B Babu
Greenhorn
Posts: 5
Hi

HashMap<String[1], Map<String[2], Map<String[3], String[4]>>>

It is to keep track of the Status and Priority values associated with an object, represented by String[1], on each day.
Hence String[1] is unique and the other strings are repeating values.

String[2] will have values related to the day of the year.
String[3] will be either "Status" or "Priority".
String[4] will have one of the values from {Major, Critical, Minor} or {Open, Resolved, Closed}.

regards
mad
 
Madhan B Babu
Greenhorn
Posts: 5
Hi

Thanks for the response.

I am now able to build a HashMap with more than 100,000 key-value pairs, and also to serialize it successfully using a DeflaterOutputStream; the file size is ~2.5 MB.

But I am getting an OutOfMemoryError when I deserialize the map back into the JRE using an InflaterInputStream.

Since I am storing it as one whole object, ois.readObject() rebuilds the entire object graph at runtime, which obviously doesn't use intern() when reconstructing the HashMap.

regards
mad
 
Mike Simmons
Master Rancher
Posts: 4806
72
Well, from what you've now told us, all the data from String3 and String4 could be easily compressed into about 4 bits for each calendar week. But we probably don't need to be that extreme. Just use something like this:
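A hedged reconstruction of the value class being described (the original code block was not preserved; all names are guesses consistent with the thread):

```java
import java.io.Serializable;

// Replaces the two inner maps with one small immutable object per calendar week.
public class CalendarWeekData implements Serializable {
    public enum Status   { OPEN, RESOLVED, CLOSED }
    public enum Priority { MAJOR, CRITICAL, MINOR }

    private final Status status;
    private final Priority priority;

    public CalendarWeekData(Status status, Priority priority) {
        this.status = status;
        this.priority = priority;
    }

    public Status getStatus()     { return status; }
    public Priority getPriority() { return priority; }
}
```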

Then your original

Map<String[1], Map<String[2], Map<String[3], String[4]>>>

becomes

Map<String[1], Map<String[2], CalendarWeekData>>
 
Madhan B Babu
Greenhorn
Posts: 5
Hi Mike

I had tried using enums and building the Map with those objects, but it resulted in an OutOfMemoryError; that is when I switched over to the multi-level Map with only Strings as keys and values, and with the use of .intern() I am able to put more than 1 million entries into the Map.
To be very specific, it was a "PermGen space" OutOfMemoryError, but only after 1.2 million entries...

Now the problem is with deserialization, where it tries to inflate the whole Map at once and runs into an OutOfMemoryError.

 
Mike Simmons
Master Rancher
Posts: 4806
72
Well, I'd be interested in seeing exactly what you tried with the enums, because it really seems like that should work.

There are still many things to try, but it's hard to predict which will work best.

One possibility is, instead of writing the entire base HashMap to the file at once, write individual Map.Entry objects, one at a time. When you deserialize, create a new HashMap from scratch, and then read one Map.Entry at a time, and put its key and value into the HashMap. Perhaps breaking the process up this way will allow garbage collection to work more effectively.

If that doesn't work, you could break things up further by putting, say, 10000 Map.Entry objects in one file (using one ObjectOutputStream). Then do the same for the next 10000 entries, using a new OOS, and a new file. Repeat until all entries have been written. To read, reverse the process.
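A sketch of the entry-at-a-time idea, under the assumption that all values are Serializable. One caveat: HashMap's internal entry objects are not themselves Serializable, so the key and value of each entry are written separately here. Calling reset() between entries keeps the stream's back-reference table from growing without bound (at the cost of re-writing shared strings, so re-interning on the read side is useful):

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ChunkedMapIo {

    // Write the map one entry at a time instead of as a single object graph.
    static void writeMap(Map<String, ?> map, File file) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new DeflaterOutputStream(new FileOutputStream(file)))) {
            oos.writeInt(map.size());
            for (Map.Entry<String, ?> e : map.entrySet()) {
                oos.writeObject(e.getKey());
                oos.writeObject(e.getValue());
                oos.reset(); // clear the stream's handle table between entries
            }
        }
    }

    // Rebuild a fresh HashMap from scratch, one entry at a time.
    @SuppressWarnings("unchecked")
    static <V> Map<String, V> readMap(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new InflaterInputStream(new FileInputStream(file)))) {
            int size = ois.readInt();
            Map<String, V> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                String key = (String) ois.readObject();
                V value = (V) ois.readObject();
                map.put(key, value);
            }
            return map;
        }
    }
}
```

Splitting the file every N entries, as suggested above, is the same loop with a new stream opened per chunk.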

You also might add a readResolve() method (described in the Serializable API). To do this, you need a custom class to hold things in, like my CalendarWeekData. Maybe something like this (modified to use Strings rather than enums, since those were more successful so far). There are many ways to do this, but as long as we're doing it, we might as well limit the number of CalendarWeekData objects too. After all, there are only 9 different combinations of 3 different status strings with 3 different priority strings - so 9 different CalendarWeekData objects should be sufficient. (Here the CalendarWeekData objects should be immutable, to ensure they can be safely re-used.)
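A hedged sketch of such a readResolve()-based flyweight (the post's actual code was not preserved; the factory method and caching scheme are illustrative):

```java
import java.io.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class CalendarWeekData implements Serializable {
    private static final long serialVersionUID = 1L;

    // At most 9 combinations of status x priority ever get cached.
    private static final Map<String, CalendarWeekData> CACHE = new ConcurrentHashMap<>();

    private final String status;   // one of: Open, Resolved, Closed
    private final String priority; // one of: Major, Critical, Minor

    private CalendarWeekData(String status, String priority) {
        this.status = status;
        this.priority = priority;
    }

    public static CalendarWeekData of(String status, String priority) {
        return CACHE.computeIfAbsent(status + "|" + priority,
                k -> new CalendarWeekData(status, priority));
    }

    // Called by ObjectInputStream after reading: replace the freshly
    // deserialized copy with the shared canonical instance.
    private Object readResolve() {
        return of(status, priority);
    }

    public String getStatus()   { return status; }
    public String getPriority() { return priority; }
}
```

Because readResolve() routes every deserialized object back through the cache, reading a million entries still leaves at most nine CalendarWeekData instances alive.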