Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Big Collection, integer key

 
Siamak Saarmann
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

I have a very big unordered collection (10,000-50,000) and I need a very fast search method using their int id. Is a hashtable (with Integer key) the best method to use? Or there is a better alternative?

Thanks, Mac
[ November 06, 2008: Message edited by: Siamak Saarmann ]
 
Campbell Ritchie
Sheriff
Posts: 51453
87
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Why not use a linear search? It shouldn't take more than a few milliseconds. 5 digits is by no means a large collection.
 
Dmitri Bichko
Greenhorn
Posts: 15
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ok, the question is "I need a very fast search method" and the first answer is "linear search"? That's... interesting.

Yes, a hash lookup is one of the fastest ways to do this (without knowing more about the specific problem). There are also some faster (and more memory efficient) implementations out there; for example, Colt: http://acs.lbl.gov/~hoschek/colt/api/cern/colt/map/package-summary.html#package_description
 
arulk pillai
Author
Ranch Hand
Posts: 3393
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This is not really a big collection and you can experiment it with


-- int pos = Collections.binarySearch(list, key); check http://java.sun.com/j2se/1.5.0/docs/api/java/util/Collections.html#binarySearch(java.util.List,%20T)

-- HashMap or HashTable (key is id Object, value is an object)

-- Hashset (value is object, but your equals and hashCode method should only use id attribute. It will break if you use additional attributes). This is a hack way and not recommended.
 
Campbell Ritchie
Sheriff
Posts: 51453
87
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
But linear search at O(n) will be faster than sorting at O(nlogn).
Ar Arulk Pilai says, experiment.
 
fred rosenberger
lowercase baba
Bartender
Posts: 12266
36
Chrome Java Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
is the collection going to grow at some point? will it eventually become a million or a billion or more elements? How often are elements added or deleted?

Also, how often do you need to do this? if you are going to search for an element millions of times, you may do better to take the performance hit and sort it once (high initial cost) so that all the later searches are quick (low subsequent cost).

However, if you are only going to search for a dozen or so items, it's not worth sorting.

How is the collection built? is it rebuilt on every run? can you build it once, sort it, and save that for subsequent use?

How you answer each of these questions can make a difference in what you do. There is seldom a 'best' way to do something in software design. You have to weigh all the variables and decide what is most important... and then go from there.
 
Siamak Saarmann
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello again,

Actually the collection is the list of agents in a simulation software. In each time step each agent will perform its actions, however it needs to look at several other agents (at least 5) to be able to decide. So I need 50,000 * 5 searches for each time step.

Considering that I need to simulate hundred thousands of time steps I will need 100,000 * 50,000 * 5 searches in each simulation.

@Dmitri : I already use colt for random numbers, thank you for mentioning the capability of Colt. I never looked at other features of it.

@campbell and arulk: thanks for suggestions. I guess experimenting will give me the best choice.

@fred : Agents come and go (added and removed) and the size of the collection is not fixed (a very dynamic system is being simulated).

Regards, Mac
 
Siamak Saarmann
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just wanted to give a feedback on this.

For the simplicity I used HashTable with String keys (integer agent id's are converted to string and used as key) and even though this is not a professional work (I should rather use Integer keys) I was able to get from 8 to 80 times (depending on the number of agents in my collection) speedup.

It means a simulation which takes 80 seconds with linear search will only need 10 seconds or less now.

I was thinking if I work on the performance (in every section of the 20,000-30,000 lines of code software) I can make huge difference.
 
David O'Meara
Rancher
Posts: 13459
Android Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I wrote this on HashMap behaviour a while back and should contain the data you want. It uses String keys rather than Integers, but given immutable keys with cached hash values, the insert, rehash and read times are pretty darn good.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic