Forum:

Performance

Hashtable vs Hashmap

town drunk
( and author)

Posts: 4118

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I'm seeing some strange behavior, and I was wondering if anyone had any insights.
As a test, I looping through, inserting, and retrieving 10,000 elements into a Hashtable and through a HashMap. Comparing the speeds for eac,
I'm finding that the Hashtable activities are actually faster then the HashMap activities. Now, that doesn't make sense, because Hashtables are synchronized, and HashMaps are not. Has anyone run into this?
M, author
The Sun Certified Java Developer Exam with J2SE 1.4

Java Regular Expressions

Ron Newman

Ranch Hand

Posts: 1056

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

That is a surprising result. Could you post the code, so others can check it out on their JVMs and report back?

Ron Newman - SCJP 1.2 (100%, 7 August 2002)

Thomas Paul

mister krabs

Posts: 13974

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

That doesn't jive with my own test results as published here:
http://www.javaranch.com/newsletter/Aug2002/newsletteraug2002.jsp#collections

Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog

Max Habibi

town drunk
( and author)

Posts: 4118

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I spoke to John Zukowski about this, and he suggested that it might have something to do with initial size or hotspot.
To that end, I'm running each test in it's own VM: I haven't addressed the initial size theory yet, I just wanted some feedback. I'm also using a TreeMap as a control, and it is functioning as expected.
These tests are on jdk 1.4.1, MS 2000 professional.
HashtableTest

HashMapTest

TreeMapTest

Thanks,
M, author
The Sun Certified Java Developer Exam with J2SE 1.4

Java Regular Expressions

Thomas Paul

mister krabs

Posts: 13974

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Also running on Win 2000 pro...
I'm getting the same results. The Hashtable always performs better than HashMap even at different trial sizes. Even stranger, adding an initial capacity and load factor makes the performamce get worse, not better.

Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog

Max Habibi

town drunk
( and author)

Posts: 4118

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hi Paul,
This is pretty weird: I'm inclined to think it's a 1.4 thing: I'm getting the expected results with 1.2. Are you also?
M, author
The Sun Certified Java Developer Exam with J2SE 1.4

Java Regular Expressions

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

You might try running this program:

public class MapTest{
    static int sizeOfTrial = 100000;
    static String[] keys = new String[sizeOfTrial];
    static String[] vals = new String[sizeOfTrial];
    
    public static void main(String[] args) {
        //init sizeOfTrial key/value pairs
        for (int i=0; i < sizeOfTrial; i++){
          String s1 = "key"+ i;
          String s2 = "val"+ i;
          keys[i] = s1;
          vals[i] = s2;
        }
        test(new TreeMap(), "TreeMap");
        test(new Hashtable(), "Hashtable");
        test(new HashMap(), "HashMap");
        test(new Hashtable(200000), "Hashtable presized");
        test(new HashMap(200000), "HashMap presized");
    }
    
  public static void test(Map tm, String name){
    long t1 = System.currentTimeMillis();
    for (int i=0; i < sizeOfTrial; i++){
      tm.put(keys[i],vals[i]);
    }
    for (int i=0; i < sizeOfTrial; i++){
      tm.get(keys[i]);
    }
    long t2 = System.currentTimeMillis();
    System.out.println("total time for " + name + ": " + (t2-t1));
  }
}

When I run this I get these results:

If you compare the 1.4.1 sourcecode for HashMap and Hashtable, certainly HashMap requires more calls to package accessible functions to do any given action -- but I don't have a copy of 1.2/1.3 source to look at.

Thomas Paul

mister krabs

Posts: 13974

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

1.4.1 on Win 2K pro:
total time for TreeMap: 422
total time for Hashtable: 156
total time for HashMap: 312
total time for Hashtable presized: 109
total time for HashMap presized: 172

Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog

Max Habibi

town drunk
( and author)

Posts: 4118

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Originally posted by Thomas Paul:
1.4.1 on Win 2K pro:
total time for TreeMap: 422
total time for Hashtable: 156
total time for HashMap: 312
total time for Hashtable presized: 109
total time for HashMap presized: 172

Interesting. I got the following.
total time for TreeMap: 801
total time for Hashtable: 301
total time for HashMap: 1081
total time for Hashtable presized: 311
total time for HashMap presized: 250
I'm not really sure what to make of all of this: I simply copied in your code. It seems that in this case, my Hashtable is slower, while your Hashtable is faster.
Curiouser and Curiouser.
M, author
The Sun Certified Java Developer Exam with J2SE 1.4
[ October 14, 2002: Message edited by: Max Habibi ]

Java Regular Expressions

Sri Addanki

Ranch Hand

Posts: 195

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hi Max,
Interesting. I got the following results, each time i compile and run the program:
total time for TreeMap: 901
total time for Hashtable: 340
total time for HashMap: 822
total time for Hashtable presized: 240
total time for HashMap presized: 290
total time for TreeMap: 881
total time for Hashtable: 331
total time for HashMap: 811
total time for Hashtable presized: 190
total time for HashMap presized: 290
total time for TreeMap: 872
total time for Hashtable: 320
total time for HashMap: 811
total time for Hashtable presized: 190
total time for HashMap presized: 301
total time for TreeMap: 881
total time for Hashtable: 321
total time for HashMap: 811
total time for Hashtable presized: 190
total time for HashMap presized: 291
Thanks,
sri

Sri Addanki

Ranch Hand

Posts: 195

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hi Max,
I'm sorry...this is in continuation of the above post...i forgot to tell, that i run the program on j2sdk1.4.1
thanks,
sri

Ron Newman

Ranch Hand

Posts: 1056

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

On MacOS X 10.2.1 with Java 1.3.1:
total time for TreeMap: 1772
total time for Hashtable: 425
total time for HashMap: 346
total time for Hashtable presized: 640
total time for HashMap presized: 274

Ron Newman - SCJP 1.2 (100%, 7 August 2002)

sandeep menon

Greenhorn

Posts: 3

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

on win2k with 1.3.1
I think the differences might be due to different rehashing algorithms being used in Hashtable and HashMap.
Giving an initial capacity in my test significantly reduced the time and in all cases HashMap was a little faster.
Hashtable - no initial capacity - 220ms
Hashtable - initial capacity 150000 - 70ms
HashMap - no initial capacity - 210ms
HashMap - initial capacity 150000 - 60ms
These are average times over several runs.

Guy Allard

Ranch Hand

Posts: 776

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hi - A Linux box, PIII 600:
1.3.1
total time for TreeMap: 1882
total time for Hashtable: 445
total time for HashMap: 309
total time for Hashtable presized: 714
total time for HashMap presized: 174
1.4.0
total time for TreeMap: 1086
total time for Hashtable: 385
total time for HashMap: 920
total time for Hashtable presized: 212
total time for HashMap presized: 362
Guy

Ron Newman

Ranch Hand

Posts: 1056

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Sure looks to me like Sun broke something on the way from 1.3.1 to 1.4 .

Ron Newman - SCJP 1.2 (100%, 7 August 2002)

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

It seems an pretty old bug report on the topic was ignored:
http://developer.java.sun.com/developer/bugParade/bugs/4491461.html

Sayed Ibrahim Hashimi

Ranch Hand

Posts: 148

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

WinXP P4 2.26 1.4.0
total time for TreeMap: 313
total time for Hashtable: 94
total time for HashMap: 234
total time for Hashtable presized: 47
total time for HashMap presized: 94

SCJP 1.4<br /><a href="http://www.cise.ufl.edu/~sih" target="_blank" rel="nofollow">www.cise.ufl.edu/~sih</a>

Jim Yingst

Wanderer

Posts: 18671

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

It seems an pretty old bug report on the topic was ignored:
http://developer.java.sun.com/developer/bugParade/bugs/4491461.html
It's hard for us to see how relevant that example really is, since we can't see the test code used. The claim that the case is unduly weighted to iteration could be valid - HashMap and Hashtable aren't primarily used for iteration anyway, so a claim that iteration is too slow can be justly ignored if the new code gives better performance elsewhere. Unforturnately the test run here seem to indicate this is not the case - but Sun may not realize it yet.
Another (perhaps more relevant) bug report is at:
http://developer.java.sun.com/developer/bugParade/bugs/4669519.html
The discussion there is interesting, but it sounds like they believe the problem has been solved as of 1.4.1. Apparently that's not entirely true. Someone may wish to submit a new bug report with the test code - I believe this is more likely to be read and addressed than are comments appended to an already-closed bug report. Of course, considering the news about layoffs at Sun, this may not be their highest priority right now; dunno.
[ October 19, 2002: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I submitted a bug report already, although I was a bit hasty and didn't say everything I should have (like why HashMap is expected to be faster than Hashtable in the first place). And all I mentioned was comparative speed. But if you all submit duplicates, maybe they'll get the message

Jim Yingst

Wanderer

Posts: 18671

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I saw your message appended to the old report, but didn't find a new report from you. Maybe it's still being processed. When it's visible on the system, post a link here and we can all vote for it.

"I'm not back." - Bill Harding, Twister

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

They say it'll take up to three weeks to process.

JohnDoe DoeJohn

Greenhorn

Posts: 5

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Just a thought, you could use the same one file to do both tests, by commenting out one or the other but using a Map instead:

Always try to use the generic interface where possible, such as using List instead of Vector, ArrayList or LinkedList. In some cases, some api methods require Vector, for example, so you have to use a Vector. You can still use a List then just typecast to a Vector to maintain the flexibility of being able to use ArrayList, Vector or LinkedList "should" the API later change to properly accept the generic interface.

Thomas Paul

mister krabs

Posts: 13974

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

For performance testing it's best not to use the interfaces because you add a performance hit for the polymorphic behavior.

Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog

Ron Newman

Ranch Hand

Posts: 1056

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

But you're going to take that hit anyway, because neither HashMap nor Hashtable is final.

Ron Newman - SCJP 1.2 (100%, 7 August 2002)

Ilja Preuss

author

Posts: 14112

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Originally posted by Ron Newman:
But you're going to take that hit anyway, because neither HashMap nor Hashtable is final.

The modern Hotspot Engine is intelligent enough to see that there are no subclasses of HashMap/Hashtable loaded, so that it doesn't have to take polymorphism into account. Cool, eh?

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Originally posted by David Weitzman:
They say it'll take up to three weeks to process.

Still no word.

Tom Hughes

Ranch Hand

Posts: 86

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Someone once told me that hashmap changed from using an open hashing (1.3) to a closed hashing strategy (1.4) - or maybe vice versa, my memory is a bit hazy !
Could this explain the change in performance ?
T.

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hmmm. I think I've lost a link I had earlier. Anway, check out http://www.javaspecialists.co.za/archive/Issue054.html and http://www.javaspecialists.co.za/archive/Issue054b.html

Jim Yingst

Wanderer

Posts: 18671

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Very informative; thanks for posting that. So if I understand correctly:
1. For some reason, some performance optimizations for doing / and % with bit-shifting and masking were lost in later JDKs. This made / and % slower than they had been.
2. In response they made hashtable bucket sizes powers of 2, which makes it really easy to perform % by masking, which is very fast.
3. Unfortunately if the bucket size is a power of two, you can get really crappy performance if the hashCode() method doesn't have a really good distribution for the last few bits.
4. It turns out many existing classes had inferior hashCode() methods in this respect.
5. So to protect against really abysmal performance in many common worst-case scenarios, they wrote a rehashing function which imposes a slight performance against all hashCode(). So now the best hashCode() methods are a little slower, but the worst hashCode() methods are only a little slower, rather than a LOT slower.
Which I guess makes some sense, in a twisted sort of way. But it seems as though if they just fix the damn % operation in the first place, they could go back to the earlier code, right? Wouldn't that be preferable? Would the rehashing method be offering any significant benefit if the bucket size weren't a power of two? Any ideas?
BTW David, I still couldn't find your bug report. Did you ever hear anything more about that? I have a free vote set aside for you.

"I'm not back." - Bill Harding, Twister

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

But it seems as though if they just fix the damn % operation in the first place, they could go back to the earlier code, right? Wouldn't that be preferable?
Maybe we're both missing something, but I also think that rewriting HashMap is a pretty strange way to fix a mistake in the JVM.
David, I still couldn't find your bug report. Did you ever hear anything more about that?
Nope. Not a word.

Jim Yingst

Wanderer

Posts: 18671

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Maybe someone is trying to bury issues that embarrass them.

I suppose we could write to Josh Bloch and try to pester him about it, but that might distract him from making sure all our favorite new features make it into 1.5.

Don't want that to happen.
[ February 20, 2003: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

By any chance did you ignore the second link I posted? (So did I

!). It seems that the % problem is only for constant values.
That would suggest that better worst-case performance at the cost of average-case performance may be the only reason for the change in algorithm.
[ February 20, 2003: Message edited by: David Weitzman ]

Jim Yingst

Wanderer

Posts: 18671

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hey, I read it.

You're right that the division with constants problem may in fact be irrelevant here. When I first read it I was imagining that a JIT might consider the infrequently-changing bucket size to be "close enough" to being constant, that it was able to perform the optimization anyway. But now as I consider this further, it seems less likely. I know a JIT pays attention to which sections of code are being run frequently, to figure out which should be optimized; I have a harder time imagining the JIT also paying attention to the value of frequently-used variables within that code, and deciding whether they are sufficiently "close" to a constant to optimize (while also inserting checks to see when the "constant" changes and thus must be re-optimized. I can sorta imagine ways this could be done, but it seems like a lot of work to go to for dubious results; I doubt it's something that was in earlier JDKs and subsequently dropped. Rather it seems like something that would've been avoided in earlier versions until they'd been able spent enough time on it to verify that it produed useful results (if indeed it's even possible) - then it would be included in the JDK and not subsequently dropped. Of course, this is based on my vast experience in working with Sun's JIT engineers :roll: . (I.e. I'm pretty much just making this up as I go, but why let that stop me?)
My only other reason for thinking these optimizations were connected to the HashMap issue is that from the articles, it seems as though Josh Bloch thought there might be a connection. And even though the reasoning given in the article doesn't seem very convincing, Josh could well know more about it than was covered in the brief conversation recounted in the article. Or maybe not, of course.

"I'm not back." - Bill Harding, Twister

Joe Cheng

Greenhorn

Posts: 11

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hey everyone, be careful to remember that garbage collected runtimes are highly non-deterministic--your timing runs could be highly affected by GC kicking in. At the very least I think each timed operation should be preceded with System.gc() and Thread.sleep(10000).
Also as others have mentioned, Hotspot will make the early runs look bad, so put the whole body of the main method in a for loop and wait for the numbers to stabilize.
And set sufficiently large -Xmx and -Xms so that the heap doesn't need to grow during your test.
Doing this using the code from this thread, I still see HashMap being slower than Hashtable (that's in Win32 1.4.0_01; in 1.3.1_02 they're neck and neck) but only by about 10%, and it's relatively consistent. The numbers stabilize after the second or third iteration.
If you're seeing 100% or 1000% differences in the performance, I highly suspect Java's memory management is throwing a wrench in your numbers... try running with -verbose:gc enabled and see what happens.

David Weitzman

Ranch Hand

Posts: 1365

posted 21 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I wrote (I forget when):

You're probably aware of this, but Hashtable seems to outperform HashMap (at least for the general case) in the Java 1.4 and 1.4.1. There has been a little bit of confusion over at the JavaRanch Performance forum where we've been discussing the issue.

Joshua Bloch wrote (Feb 24):

Hi. This is the first I've heard of this. I looked into it briefly and I believe I know what's going on. I don't have time to write it up tonight, but I'll try to do it soon. For starters, the microbenchmark is not good. It should time the test repeatedly so that you can make sure you're getting a repeatable result. Timing the first run gives you startup transients, which are not what you want. Also you have to figure out how you want to deal with garbage collection: either eliminate it from the timings by calling System.gc() outside of the timing loop, or time runs long enough to predictably include GC.
That said, Hashtable really is faster than HashMap for the data set in this benchmark: strings formed from sequential numbers. The pre 1.4 HashMap implementation, which was retained in Hashtable, does very well on sequential keys because it doesn't scatter them at all: it puts them in sequential buckets. As a result, there are *no* collisions. The price you pay is that certain regular but non-consecutive sequences will all hash to the same bucket, resulting in horrible performance. The 1.4 version (and especially the 1.4.1 version) scatters the keys pseudo-randomly, resulting in occasional collisions when hashing consecutive sequences like the one in the microbenchmark. (The mathematics of this is pretty simple; you can easily calculate roughly how many collisions you'll get.) To test this conjecture, use pseudo-random numbers in the keys instead of consecutive. I did, and found that HashMap is a bit faster than Hashtable, as expected.
Loosely speaking, then, the 1.4.1 implementation is a better general purpose implementation. If you know that you'll be dealing with a consecutive sequence, the 1.3 implementation (still present in Hashtable) could well be faster. On the other hand, for consecutive keys you could just use an array (or List), which would be *really* fast : )
I hope this brief letter was at all comprehensible.

From the "I don't have time to write it up tonight, but I'll try to do it soon" I got the imporession that he would write a better microbenchmark or something in the next day or two and post it here, so I held off responding. I may have missinterpretted, but either way I wish I'd sent a "thanks" note in a timely manner.