Win a copy of The Business Blockchain this week in the Cloud forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

NX: Does HashMap.containsKey(key) lock on the collection?

 
Dave Knipp
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,
I was curious if anyone has any idea if containsKey(key) and containsValue(value) lock on the collection they are called on. Does the following code snipets do the same thing?

The Map api says that by wrapping a HashMap with the Collections synchronization methods it protects concurrent modification. Basically they say you can't add or delete at the same time. But what they do say is that even though it synchronized you still have to synchronize on the collection when iterating through its contents, which makes sense b/c a value could be updated, deleted, or added while iterating. The api for HashMap does not specifically say whether or not the containsKey() method locks on the collection. I would assume it would, but i can't find any confirmation on that. Does anyone have any idea??
Thanks,
Dave
 
Philippe Maquet
Bartender
Posts: 1872
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Dave,
I understand that you only need to synchronize on the returned synchronized map while iterating on it. But I wonder why you use Collections.synchronizedMap(). Isn't it easier to explicitely synchronize on the original map just when necessary ?
Best,
Phil.
 
Dave Knipp
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Phil,
Well, the Collections synchronization protects from concurrent structural modification. Sun advised that you use this to prevent any ConcurrentModificationException's. I guess i could just synchronize the hashmap when i add or delete and reduce some overhead.
So do you think that containsKey() synchronizes on the collection?
Dave
 
Philippe Maquet
Bartender
Posts: 1872
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Dave,
I guess i could just synchronize the hashmap when i add or delete and reduce some overhead.

That's exactly what I think. You probably have one or two places in your code base where you add/delete in the given HashMap. By synchronizing explicitely on the unsynchronized HashMap, you save some overhead and ... solves your question ! All with a (very) few additional lines.
So, for myself, it's a question I'd even want to invest in more than 1 minute.
And as I am a bit selfish, I won't investigate much more for you.
So do you think that containsKey() synchronizes on the collection?

Yes, that's what I think, but it's only based on the Collections.synchronizedMap() specs. Now, as docs are sometimes unclear, if I really had to be 100% sure of it, I'd check the JDK sources. But IMO, it really doesn't worth while in this context because you'll be more productive by just avoiding Collections.synchronizedMap() as explained above.
Now, if it doesn't cost me anything, the answer interests me ! So if - despite my advices - you investigate the question further, please post the answer here...
Best,
Phil.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I understand that you only need to synchronize on the returned synchronized map while iterating on it
If you use Collections.synchronizedMap() as shown above, then often the only additional synchronization you might need is when iterating. All indivudual method calls are already synchronized (yes, including containsKey()). You may also need synchronization for any operations which require more than one consecutive method call in order to function correctly, e.g.

If the map doesn't already have key foo in it, you probably don't want some other thread putting that key in while doSomethingElse() is running. So it's probably a good idea to put the above code in a sync block if multiple threads can access map - even if it's a synchronized Map.
If you don't use Collections.synchronizedMap(), then you basically need to synchronize all access to the Map - not just iteration. The one big exception to this is if you never make any structural modifications to a HashMap or TreeMap - if you never use remove() or put() with a new key - then you can use the other methods without problem. But if any threads are doing any remove() or put() (with a new key) while other threads have access to the Map, all threads accessing the Map need to use synchronization, or risk ConcurrentModificationException, or garbage results.
[ November 12, 2003: Message edited by: Jim Yingst ]
 
Philippe Maquet
Bartender
Posts: 1872
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Jim,
Thank you for entering this discussion.
Reading you, I am reinforced in my idea that using Collections.synchronizedMap() doesn't worth while. Basically, I see it as it transforms a HashMap in a HashTable equivalent (OK I simplify a bit), but without saving you to keep aware of possible synchronizing issues (as you explained above).
Do you agree with me ?
Best,
Phil.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Do you agree with me ?
In general, yes. There are some cases where synchronizedMap() works perfectly well. But you have to be careful. The biggest objection I have to Hashtable and Vector, and to synchronizedMap() and synchronizedList(), is that people often think that these things automatically make things "thread safe" and then neglect to do any further analysis.
Hmmm... the most common situation where synchronizedMap() might be useful is if you basically want a read-only HashMap. (That is, you presumeably have to initialize it with values somehow, which requires put(), but after that, no further changes.) But in this case, you can just use unmodifiableMap() to create a read-only map - and assuming the underlying Map is a HashMap or TreeMap, no further synchronization is necessary. So synchronizedMap() is really only useful if the Map is modifiable - but if it's structurally modified, that's when you may still need additional sync. So for those cases where you may need to change the value in a mapping, but never remove or add keys, it might be useful to have a Collections.structurallyUnmodifiableMap() to enforce the requirement that no structural modifications be performed. This would combine nicely with synchronizedMap() to create an object that you could safely pass around to other code without worrying about whether the code synchronized correctly. Unfortunately, that's not one of the utility methods provided. Guess I can always code it myself though if the need ever arises.
 
Dave Knipp
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Jim and Phil,
So to make a HashMap thread safe without using Collections.synchronizedMap() you would need to do the following:
A) Synchronize all structural modification (adding, deleting)
B) Synchronize when iterating through the collections contents
Therefore synchronization would not be necessary on a get(key) because all modification would lock the map and the read would have to wait until that update, add, or delete was finished. Would you both agree?
Dave
 
Philippe Maquet
Bartender
Posts: 1872
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Simple, clear and useful explanations as always from Jim ...
Thanks,
Phil.
[ November 12, 2003: Message edited by: Philippe Maquet ]
 
Philippe Maquet
Bartender
Posts: 1872
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Dave,
So to make a HashMap thread safe without using Collections.synchronizedMap() you would need to do the following:
A) Synchronize all structural modification (adding, deleting)
B) Synchronize when iterating through the collections contents
Therefore synchronization would not be necessary on a get(key) because all modification would lock the map and the read would have to wait until that update, add, or delete was finished. Would you both agree?

Well, I won't reply for Jim, but I don't agree.
You should synchronize on the HashMap (or on any other monitor BTW) for any access to the map. But practically, it doesn't make much difference : if your HashMap is used to store locks, writes to the HashMap will typically be done in the same methods as reads from it, in such a way that the whole method (or 90% of it) will consist in a synchronized block on the given HashMap.
Synchronizing writes only doesn't make any sense (and if Jim had to disagree with this, let's say that I'd disagree with Jim - quite rare but still possible ).
Best,
Phil.
[ November 12, 2003: Message edited by: Philippe Maquet ]
 
Dave Knipp
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Phil,
The way i was looking at it is that if you lock on structural modifications then when you do a read, that read would have to wait until the update/delete/add lock was released, correct? As long as one thread has locked the map no other access could occur until that thread released its lock. Maybe im wrong here, but thats how i would understand it. This would reduce the overhead on synchronizing on every read (since thats what you do the most with this database).
So the question is, if one thread has a lock on myMap via a synchronized block or a synchronized method like the following:

Can another thread still access that map via a get(key)??? Or does the thread attempting to access the map have to wait until the lock is released by the thread that is in the sync block?
I would go with the latter, but maybe im wrong. Correct me if i am
Dave
 
Philippe Maquet
Bartender
Posts: 1872
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Dave,
Can another thread still access that map via a get(key)???

Unfortunately yes.
Or does the thread attempting to access the map have to wait until the lock is released by the thread that is in the sync block?

Unfortunately no.
but maybe im wrong

I think you are. And the "I think" is just a politically correct way of speaking.
Best,
Phil.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
[Phil]: and if Jim had to disagree with this
Well I try to disagree with Phil whenever I can, just on general principle. But I can't in this case.
[Dave]: The way i was looking at it is that if you lock on structural modifications then when you do a read, that read would have to wait until the update/delete/add lock was released, correct?
No. When you enter a synchronized method or block, the only threads that are blocked are those executing code which is also synchronized, using the same monitor. Unsynchonized code can completely ignore the effects of your sync lock. So you generally need all the code which can access mutable data to be synchronized.
 
Dave Knipp
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Phil & Jim,
Ok, that makes sense, that each thread would have to have to monitor the state of the object which is locked, which then forces it to wait if another thread is in another synchronized block. So if i synchronize on the map everywhere things should work as excepted.
Thanks for your help!
Dave
p.s. Phil you are so politically correct
 
Dave Knipp
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jim,
You said that using Collections.synchronizedMap(new HashMap()); would only be suitable if the map was unmodifiable (ie. it was read in and never changed again). But in the java api it says this regarding synchronization of a HashMap.
Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));

It seems like sun stresses that using the sync wrapper will take care of all concurrent modification problems that may occurr when two different threads are performing structural modifications. So wouldnt this take care of the synchronization problem we have been talking about throughout this thread? Granted you would still need to synchronize on the map when iterating through its contents, but other than that it seems like it would be ok, what do you think?
Dave
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You said that using Collections.synchronizedMap(new HashMap()); would only be suitable if the map was unmodifiable
I did? Where?
It seems like sun stresses that using the sync wrapper will take care of all concurrent modification problems that may occurr when two different threads are performing structural modifications.
They never said it would take care of all problems. The API for synchronizedMap() explicitly points out that the wrapper isn't good enough when you iterate; you have to sync manually. They never say whether or not there are other cases where additional sync may be necessary. Moreover, the API indicates that normally you'd sync on some other object that "naturally encapsulates" the Map. It's only if you don't have such an object that you use the synchronizedMap() - in part because there are still things that can go wrong with synchronizedMap() if you're not careful.
So wouldnt this take care of the synchronization problem we have been talking about throughout this thread? Granted you would still need to synchronize on the map when iterating through its contents, but other than that it seems like it would be ok, what do you think?
Let's say you want to ensure that all threads that access a given file use the same DB file use the same single Data instance to do so, but there may be other DB files accessed by other Data instances. Then you might have something like this:

This is a common form of lazy instantiation. The first time someone tries to get a Data to access a given file, you create the instance and put it in the map. Subsequently, when you get a request for the same file, you just return a reference to that same instance. The problem is: what if two threads request a Data for the same file at the same time? Without proper synchronization, there's a chance that you may manage to create two Data instances accessing the same file - which is precisely what we were trying to avoid by using the HashMap. Consider:
  • Thread 1 calls forFile(new File("foo.db")), executes map.get() and gets a null.
  • Thread 2 slices in and calls forFile(new File("foo.db")), executes map.get(), gets a null, creates a new Data, puts it in the map, and returns.
  • Thread 1 regains control, creates a new Data, puts it in the map, and returns.

  • Using synchronizedMap() does not prevent this. But synchonizing the forFile() method would. That's why the HashMap API encourages you to sync on some other containing object first, if possible - usually, this ends up giving you a higher level of protection, preventing other threads from accessing your HashMap in between method invocations. The fact is, the sync provided by synchronizedMap() is often not at the right level to provide the protection we really need.
     
    Dave Knipp
    Ranch Hand
    Posts: 146
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    I agree with your statement. That 2 thread could slice in at different times and actually instantiate an object you only meant to instantiate once. That would be taken care of by synchronizing on that method and this would not occur.
    I was just trying to find out the details of what synchronizedMap() does. The API is not very specific and its vague as far as how much it does synchronize. So i would then naturally choose to synchronize myself to ensure that the behaivor i required was implemented. But then this came down to synchronizing on the map everytime it is referenced.
    For example, you can't necessarily synchronize methods that call the map b/c it does not lock the map, it only locks the method that is declared synchronized. This would allow for any other thread to enter any other method that accessed the map and would then cause problems b/c two threads could theoretically change the same value within the map, or one could delete what the other was trying to update, therefore throwing null pointers and alot of other ugly stuff. So the only solution in my eyes is that you must synchronize on the map itself everytime, ie. put every access to that map in a synchronized block.

    As long as you locked on the map everytime you accessed it every thread would have to wait its turn to get their lock on the map and do what they have to do, update, delete, etc. It just seems like this is overkill, but at the same time necessary. What is your take?
    Dave
     
    • Post Reply
    • Bookmark Topic Watch Topic
    • New Topic