• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Most efficient Java collections

 
Ranch Hand
Posts: 80
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have a program that needs to be make many thousand database calls to see if entries exist. I would like to make one call, populate a key/value collection and was wondering what is the most efficient collection. I am looking at up to 30 - 40 thousand rows of string data. Any suggestions?
 
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You're doing that backwards. Instead of ranking the collections by some undefined "efficient" measure, the first thing you should do is to rank them in order of suitability for what you want to do. It doesn't do any good to be told that Collections.EMPTY_MAP is the most efficient collection (which it is -- low memory footprint, high speed of access) when it doesn't actually satisfy your requirements.

So let's do that. You want a key/value collection, so that's a Map. Looking at the API documentation, there's a long list of classes in the standard API which implement Map. Some of them obviously have special purposes (e.g. PrinterStateReasons) but if you eliminate them there's still about half a dozen to choose from.

At this point I'm out of information. You haven't said how you need to access the Map (at least not in detail), or whether you plan to access it concurrently from multiple threads, so you should clarify your requirements before continuing.
 
James Daniel
Ranch Hand
Posts: 80
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Paul,

Thanks for the reply. I would like to make a single database call and from the resultset put in the the PK as the key and Hudson api build url as value in a map. This will be 30 - 40 k rows. Then instead of going to the database to see if the value exists, I will check the map. If it doesn't exist, I will add it to the database.

Jim
 
Paul Clapham
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Then HashMap should be fine. Or ConcurrentHashMap if you're going to have multiple threads accessing it.
 
Ranch Hand
Posts: 37
Oracle Tomcat Server Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If you are just checking one of those two value types (and I assume you are) you can use a HashSet to reduce your memory footprint while still providing fast lookups and comparisons like HashMap. If memory isn't a problem then it doesn't really matter.
 
James Daniel
Ranch Hand
Posts: 80
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Memory is always a concern. I just wanted to make sure that the map could handle thousands of elements and find out which one would be the fastest lookup. I will look at HashSet. Thanks both of you for your time..
 
Marshal
Posts: 79151
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

James Daniel wrote:Memory is always a concern. . . .

Agree. I once managed to crash an ArrayList by adding 64000000 Strings to it, and I crashed an int[] by trying to declare it with size 2147483647.

Try something like this:On my PC I am getting over 8800000.
A large HashSet and a HashMap with the same number of pairs have a similar structure, and I would expect them to occupy virtually the same memory space.
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Note that making a good guess on the initial capacity will help performance with collections.

For instance the default size for HashMap is 16 - every time the load factor is exceeded, it has to double the size and rehash the existing entries - potentially very time consuming. See the java.util.HashMap javadocs for discussion.

Bill
 
Campbell Ritchie
Marshal
Posts: 79151
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, I should have guessed 9000000 for my Sets. A guess too high, however, would have taken me over my available memory even before the Set was filled.
 
Sheriff
Posts: 3837
66
Netbeans IDE Oracle Firefox Browser
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

William Brogden wrote:Note that making a good guess on the initial capacity will help performance with collections.


On the other hand, a guess too high is much, much worse than a guess too low. Firstly, the allocated capacity will not be freed during normal operations and won't be garbage collected (the other case - several reallocations - puts some strain on the GC, but does not hog the memory indefinitely). And secondly, in case of HashSet and HashMap, too high a capacity severely slows down iterations over these structures.

Some time ago I run a few tests to see how much overhead these reallocations incur. I was not able to reliably measure a difference. Of course, a collection with hundreds of thousands of entries deserves some thought, but for ordinary, day-to-day collections I've given up and now I don't specify the size upon creation (only when I'm making a copy of another collection I get the size of that collection as the capacity, here the guess is always exact). Keeping the guesses up-to-date is very tedious and I'm not convinced of any benefits. Remember, the capacity usually grows exponentially. It'll get pretty big really fast.
 
Martin Vashko
Sheriff
Posts: 3837
66
Netbeans IDE Oracle Firefox Browser
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Yes, I should have guessed 9000000 for my Sets. A guess too high, however, would have taken me over my available memory even before the Set was filled.


If you're dealing with a collection that might need close to half of the available memory or more, you need to get the capacity right at the creation and never let reallocation to occur. Such a collection cannot increase its capacity successfully at this size, since it will (by definition) try to allocate even more than it currently has, and half of available memory is already occupied. All collections reallocate by allocating a new structure, copying data from old structure to the new one and then releasing the old one.
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic