Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Concurrent access of a HashMap

 
V Chauhan
Ranch Hand
Posts: 70
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

Outs is a web application deployed on weblogic 7.

We have a HashMap, which stores ~10 entries. This object is accessed concurrently from multiple threads but the access is not synchronized. The threads might get some object or pur a new mapping.

Now we are seeing some of the threads getting struck and the CPU utilization is shooting up. The thread dump of those threads points to the HashMap.

Can this unsynchronized access somehow be the cause? Hope my explanation makes some sesce.

TIA,
Basu.
 
Peter Chase
Ranch Hand
Posts: 1970
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, I don't see immediately how any method call in HashMap would hang (is that what happens?) due to multiple unsynchronised access. That said, however, if you are doing multiple unsynchronised access, you are violating the correct usage of HashMap and can expect either ConcurrentModificationException, if lucky, or undefined behaviour if not.

You must synchronise to access this map.
 
Daniel Mayer
Ranch Hand
Posts: 103
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Basudev Agrawal:

Can this unsynchronized access somehow be the cause? Hope my explanation makes some sesce.


Short answer: yes.

Due to the memory model of Java, unsynchronized access to memory from different threads leads to undefined behaviour.
 
Mr. C Lamont Gilbert
Ranch Hand
Posts: 1170
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Its not due to the memory model. its fundamental principle of multi threaded programming.
 
Annekee Dufour
Ranch Hand
Posts: 41
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day), I don't want to synchronize every get and iterate action. Therefore, if I do a put, I start by cloning the original map, put something in the new map, and then assign the newmap to my original map variable. Is this a safe alternative to synchronizing?
 
Arjun Dhar
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The problem one wishes to avoid is that during "put()", only one thread can put at a time. The "get()" need not be protected., in this situation atleast. I think all the hassle of creating a clone etc. though seems sound in inefficient and unnecessary, so just try to synchronize the put() and see how it goes.

You could also try something else, have a read Only object that consists of the get() method and another write Only object that consists of the put() method. While using the write object, use it in a synchronized block to garuntee thread safety.
 
Warren Dew
blacksmith
Ranch Hand
Posts: 1332
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Synchronizing the put() is not enough. If you only do that, a get() that occurs while the put() is in progress may get corrupted data.

Of course, depending on the application, perhaps you don't care if you get corrupted data a few times a day.
 
Peter Chase
Ranch Hand
Posts: 1970
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Annekee Dufour:
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day), I don't want to synchronize every get and iterate action. Therefore, if I do a put, I start by cloning the original map, put something in the new map, and then assign the newmap to my original map variable. Is this a safe alternative to synchronizing?


Such attempts to avoid synchronisation usually turn out not to be safe. If you don't synchronise or use volatile, the compiler or HotSpot thinks it is at liberty to use registers, re-order operations, delay writes etc. etc. A good example of this is the "double-checked locking" anti-pattern, which looks like clever avoidance of synchronisation, but is actually horribly broken.

Have you actually proven that synchronisation would slow your application down by any significant amount? In recent JVMs, synchronisation really isn't that slow, and anyway, the bottleneck in your application is rarely where you think it is, unless you've actually done experiments and measurements.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Annekee Dufour:
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day),

You copy-on-write solution will be fine, however as pointed out it results in minor object churn and performance impact.

Another method is to use Doug Lea's Concurrent Library, specifically a subclass of ReadWriteLock. It allows multiple reader threads and only one writer thread. You can give preference to either type of lock acquisition or first-in-first-out thread order.

The key is that while you are still using synchronization during every read and write call, the blocking is as short as possible -- on the lock object itself. Thus all reading threads will be allowed concurrent access to the map.

Doug's package is very robust, quite complete, tested thoroughly over the past five years, and is a part of Java 1.5 -- or it's going into 1.6, I forget -- as java.util.concurrent.

Here's sample code from the JavaDoc page linked above:I just remembered (and verified) that Doug has already coded up two ConcurrentHashMaps for you to use out of the box. There are two versions (look under the Collections heading on the first page linked above).
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Warren Dew:
Synchronizing the put() is not enough. If you only do that, a get() that occurs while the put() is in progress may get corrupted data.


Even if the get occurs *after* the put it may see corrupted data! Only by synchronizing get() you are guaranteed to correctly see changes made by another thread!
 
Warren Dew
blacksmith
Ranch Hand
Posts: 1332
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Harkness:

I just remembered (and verified) that Doug has already coded up two ConcurrentHashMaps for you to use out of the box.

Or you could just use Collections.synchronizedMap(new HashMap()) when getting the HashMap.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Warren Dew:
Or you could just use Collections.synchronizedMap(new HashMap()) when getting the HashMap.

While that would work, Annekee asked specifically about a solution that would work without blocking concurrent readers. Using Collections.concurrent* will allow only one reader or writer at any one time, creating an unnecessary bottleneck when there are far more readers than writers.
 
Henry Wong
author
Marshal
Pie
Posts: 21390
84
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
While I think this effort to remove synchronization is worthwhile, there also have to be a balance. The synchronization time for contended locks is improving with each release of Java. It is now to the point where some of these efforts being considered may actually be more time consuming than the synchronization time saved.

Henry
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Henry Wong:
While I think this effort to remove synchronization is worthwhile, there also have to be a balance. The synchronization time for contended locks is improving with each release of Java.

The point is not to shave milliseconds off a method call by removing synchronization or changing the object on which we synchronize. The goal is to allow concurrent access where possible and serialize only where absolutely necessary.

Any number of threads may be reading and iterating over a map concurrently with no fear of deadlock or interference. Thus synchronizing the entire map will degrade performance -- not because the synchronization keyword adds runtime overhead but rather because you're making threads wait needlessly to perform their operations.

If instead you only serialize writes -- the place where data structure corruption can certainly occur in the case of multiple writers -- you are minimizing contention. If your access pattern has reads far outweighing writes, this could be a huge win.

This is a natural extension of the "synchronize the smallest amount of code necessary" maxim.
 
Henry Wong
author
Marshal
Pie
Posts: 21390
84
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There are a couple of interesting issues here...

First, this is about synchronization of a container. It is about as small as you can get -- just synchronize long enough to store or retrieve an item... but the second issue is more interesting...

The amount of time to synchronize is actually not consistent. It depends on whether the lock is contended. If you try to grab a lock that is already owned, it may take milliseconds to acquire the lock after the lock is freed. If it is unowned, it will take nanoseconds... this means that you pay a very small (possibility insignificant) price if no conflict occurs. You do lose milliseconds if the lock is contended, but you have to anyway to be thread safe.

Not really providing a clear cut answer of what to do, but hope this helps,
Henry
 
Henry Wong
author
Marshal
Pie
Posts: 21390
84
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
One more issue... Using a reader writer lock on any class without knowing the implementation is not a good idea. At first glance, two threads should be able to read from a container simutaneously, but it is implementation specific. You don't know if the container has to calculate an index or other value internally. This means that two simultaneous reads from a container may not be thread safe.

Hope this helps,
Henry
 
Mr. C Lamont Gilbert
Ranch Hand
Posts: 1170
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Annekee Dufour:
I also intend to use an unsynchronized hashmap. Since I expect a lot of gets (> 1000 a day) and iterates, and very little puts (12-15 a day), I don't want to synchronize every get and iterate action. Therefore, if I do a put, I start by cloning the original map, put something in the new map, and then assign the newmap to my original map variable. Is this a safe alternative to synchronizing?


No this is quite unsafe. Your methodology is correct though. You just need to ensure that 2 threads do not try and to a put at the same time. Which would result in a lost put. Thus, you will have to syncrhonize

I always try my best to avoid synchronization. Its a good idea to do so, but if you ever accomplish it you better institute a once a week sanity check to be sure you didn't blow it. Which you probably did
 
Annekee Dufour
Ranch Hand
Posts: 41
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks everybody,
I guess I will be using the ReadWriteLock, or add some synchronization around the newmap. One of the reasons I did not want to use synchronization around the original map is that it will be accessed by other people's code, and they might iterate over it without synchronizing. Since there won't be that many puts, this will not be a problem most of the time, but sometimes it will be a problem.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by David Harkness:
If instead you only serialize writes -- the place where data structure corruption can certainly occur in the case of multiple writers -- you are minimizing contention.


You are right that you only need to *serialize* writes, but there is more to synchronization than that:

- You also need to make sure that noone reads while a (non-atomic) write is in progress, and
- synchronization in java is not only about serialization of access to data, but also about read/write barriers. That is, after a write, all reading threads need to synchronize, else they might see inconsistent data because of not fully uptodate local caches.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ilja Preuss:
You also need to make sure that noone reads while a (non-atomic) write is in progress

Using a ReadWriteLock does exactly that. It allows any number of readers or a single writer to acquire the lock.
after a write, all reading threads need to synchronize, else they might see inconsistent data because of not fully uptodate local caches.

I remember reading an article by Doug Lea discussing various ways around this: using transient, synchronizing twice, etc. IIRC, in the end his conclusion (this was from 1.1/1.2 days) was that there is no 100% solution.

Do you have an recent references on precisely how this should be done?
[ September 29, 2004: Message edited by: David Harkness ]
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by David Harkness:
I remember reading an article by Doug Lea discussing various ways around this: using transient


I guess you mean volatile?

synchronizing twice


How would that work?

Do you have an recent references on precisely how this should be done?


No, sorry.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ilja Preuss:
I guess you mean volatile?

LOL, yes sorry.
Me: synchronizing twice
IP: How would that work?

I wish I could find the original article I read. I suspect I'm remembering incorrectly. What I do remember is that it walked through several (four or five) iterative designs to improve the following trick.After doing some testing, he found that this does not ensure that Foo will be instantiated only once. Making foo volatile didn't ensure it either. There were a few more tricks he used -- and "synchonizing twice" probably wasn't one of them -- but IIRC ended at the same place: you cannot ensure it given the Java memory model at the time.

Now, does this apply to 1.4 or 1.5? I don't know. In any case, here is one article about the memory model that I did find and Bill Pugh's Java Memory Model page has other references.
[ September 29, 2004: Message edited by: David Harkness ]
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yay! I found it on the Memory Model page linked above: The "Double-Checked Locking is Broken" Declaration. And it does discuss a solution using nested synchronization -- it doesn't work, but it discusses it so my memory isn't totally hosed.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ah, the infamous DCL - I thought you might mean something different...

Thanks anyway...
 
Henry Wong
author
Marshal
Pie
Posts: 21390
84
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just did a quick lookup of the java.util.Hashmap class. Apparently it is thread safe as long as the container does not structurally change. This means that using a reader writer lock lock should work. In fact, if you are changing an element with another element, you should also just grab the read lock; even though you are changing values, you are not changing the structure of the container, so it's thread safe.

I'm surprise with this myself... learn something new everyday...

Henry
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Henry Wong:
In fact, if you are changing an element with another element, you should also just grab the read lock; even though you are changing values, you are not changing the structure of the container, so it's thread safe.


Unless the map has to resize, I assume?
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Ilja Preuss:
Unless the map has to resize, I assume?

Why would the Map be resized when replacing the value for a pre-existing key?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by David Harkness:

Why would the Map be resized when replacing the value for a pre-existing key?


Of course it won't - silly me... :roll:
[ October 01, 2004: Message edited by: Ilja Preuss ]
 
Henry Wong
author
Marshal
Pie
Posts: 21390
84
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This may be interesting... JDK 1.5 (a.k.a. 5.0) added a concurrent hashmap. It is threadsafe, so no external synchronization is necessary. It is highly optimized, allowing parallel reads to occur. And if I remember correctly, it is also segmented. Depending on how the keys hash, it may also allow parallel writes to occur.

So, if a fully optimize hashmap is needed, consider upgrading to 1.5.

Hope this helps,
Henry
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic