• Post Reply Bookmark Topic Watch Topic
  • New Topic

hashmap vs searching

 
Puneet N Vyas
Ranch Hand
Posts: 61
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
we can do searching via hash map and by some searching api available,can any one tell which one is better in terms of time of execution or any other measure






thanks for support
 
Vilmantas Baranauskas
Ranch Hand
Posts: 89
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
HashMap is very fast for exact-match look-up. Of course, you need to consider size of your data.

To choose between HashMap and "some searching api available" you need to know (1) more about what you really need and (2) more about "some searching api available".
 
karthikeyan Chockalingam
Ranch Hand
Posts: 259
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Can you please mention to what the search is related to ?
 
Puneet N Vyas
Ranch Hand
Posts: 61
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
say simple 100 integer values i have to search,which one will be best?
 
Vilmantas Baranauskas
Ranch Hand
Posts: 89
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you have 100 integer values and define your search as "do I have value X" then following would work:

 
Joe Ess
Bartender
Posts: 9362
11
Linux Mac OS X Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Vilmantas Baranauskas:
. . .then following would work:


"would work", yes, but OP is asking which is better.
My gut reaction is that a hash map would be faster for simple cases like your example. I would bet there are other cases where maintaining a map is impractical due to memory constraints, hash collisions and so on and a good search algorithm would perform better.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 35742
412
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I would expect it to depend on how many times you call contains(), how many hash conflicts there are (for integers this should be low) and how big the dataset is. For 100 items, it is unlikely to matter either way.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you do any measurements with real data, kindly let us know how it turns out.

In addition to the factors others have mentioned, I would consider how often the data changes. For a large set of unchanging strings there are some reallllly fast algorithms as used in spell checkers. Naturally if the dataset changes alot the cost of building the spell checker data structure is not worth the fast individual lookups.

Bill
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
For a limited range of integer values, a boolean array would be another option, and probably even faster. Hard to tell whether that would worth it without knowing more. When in doubt, I would always start with a HashSet and use a profiler to find the bottleneck if that's not fast enough.
 
Vilmantas Baranauskas
Ranch Hand
Posts: 89
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Some numbers.

Pentium M 2GHz, Java 1.6 on linux 2.6.24.

1.000.000 random integers are put into HashSet. Takes 3991ms.

10.000 look ups of random integers takes 24ms.
 
Pat Farrell
Rancher
Posts: 4678
7
Linux Mac OS X VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by fundupuneet vyas:
say simple 100 integer values i have to search,which one will be best?


If you have 100 values, it doesn't matter.
Premature optimization.

I'd use a HashMap just because its easy. Or if I knew more about the domain, I might use a self optimizing table.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If the integers are in a reasonable range, say 0 to 1000000 you could use a boolean array.

boolean[] flag = new boolean[1000000];

then to test if N is in the array, just look at flag[N]

Bill
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by William Brogden:
If the integers are in a reasonable range, say 0 to 1000000 you could use a boolean array.

boolean[] flag = new boolean[1000000];

then to test if N is in the array, just look at flag[N]

Bill


Bill, thanks for making my suggestion more concrete...
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
My experience is that a byte[] array typically uses one full byte for every boolean. Thus, it's eight times as big as it needs to be. This is not specified anywhere that I'm aware of, and it's possible that on some platforms, a boolean[] array uses only one bit per boolean. It's also possible that some platforms use more than one byte per boolean. Anyway though, if you want to use space a little more effectively, a java.util.BitSet may be a good idea. This generally uses one bit per boolean, if you get the size right in the first place and don't need it to resize. It's probably a little slower than a byte[] array - perhaps a lot slower, if you aren't doing anything else in your program. Usually that doesn't matter much though.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!