• Post Reply Bookmark Topic Watch Topic
  • New Topic

Using synchronize on overloaded methods  RSS feed

 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi.

I apologize for the lengthy post, but at least this way you have all the code.
Ok, so what happens if the class that contains these methods (CategoryLister), is used inside a servlet and is declared as a class-wide variable. ie: servlets are multi-threaded and I will need to make sure my getList() method is 'thread safe'. Which as it stands, it is not, because maybe one thread is returning the contents Vector that is (possibly) in the middle of being refresh()'d

So, which of the getList methods do I make synchronized? All of them? Just the last one (the one that all the others eventually call)?

Also, if I need to synchronize all of them, where do i put notify() in a single line method like the first getList()?

Thanks for the help anyone.
[This message has been edited by Mike Curwen (edited October 31, 2001).]
 
Madhav Lakkapragada
Ranch Hand
Posts: 5040
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

......
maybe one thread is returning the contents Vector that is (possibly) in the middle of being refresh()'d....

Vector is already a syncronized class.
So I don't think this is a valid situation. Not very sure though...Interesting problem.
regds.
- satya
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'd synchronize all access to the underlying contents vector - the easiest way is to synchronize both getList(int) and refresh(). The fact that contents is a Vector and therefore synchronized doesn't provide as much protection as you may think - it prevents concurrent access during a method call to that Vector, but not between method calls (like in between the calls to size() and subList()), and it has no effect when you reassign the contents reference to another Vector, as in refresh(). You need more. I don't think you need to synchronize the first two methods, but you can. Are there any other methods which affect or access contents? If so, those need synchoronization too - and consider what would happen if those other methods were to run in between refresh() and getList(int) - you may need to synchronize the first two methods as well.
I'm not sure about the notify() part of the question. Notify who? Are other threads executing wait() somewhere?
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Madhav Lakkapragada:
Vector is already a syncronized class.
Famous last words.
The fact that Vector is already a synchronized class means that multi-thread access will not mess up the internal structure of your vector, and otherwise absolutely zero, zip, zilch, diddly squat, nothing, nichts, niks, rien. It certainly does not mean your code is threadsafe. To the contrary, I'm willing to bet that code using a Vector has a higher chance of being thread-unsafe than code using an ArrayList, and you just proved my point
In this case, there is no problem with getting a Vector that is in the middle of a refresh. Presumably CategoryDAO.getNonEmptyCategoryRows() returns a fresh Vector with the data. The assignment to "contents" is atomic, so there is no risk of getting a half-refreshed Vector. You might just want to turn "contents" into a volatile variable to guarantee that the assignment is immediately visible to other threads.
There is a threading problem in the bit where you transform a call to getList() into a call to getList(contents.size()). Between contents.size() and the return statement in getList(int), the contents vector may well be reassigned. Ditto within getList(int), between the size test and the final return statement. You can fix this either by synchronizing the whole lot, or by assigning "contents" to a local variable and then just operating on that local variable from that moment on, passing it into methods as needed.
There is another threading problem in the code that determines whether it is time for a refresh. In particular, the "if (System.currentTimeMillis()..." statement and all calls to refresh() need to be synchronized, or else multiple refreshes may start simultaneously. I once coded a < cache > tag that made the same mistake but regretted it as soon as we started load testing. It will bring your site to its knees under high loads. One thing you could do isYou call this method to determine if you need to refresh or not. This way, you first set lastRefresh, then refresh your data outside the synchronized block. Other requests get served straightaway while the query is running (with slightly obsolete data, but that is unimportant) and you prevent a build-up of requests waiting for the query to complete.
For an industrial-strength application, you want to pay some attention to error handling: what to do if the query fails? Keep on serving cached data just to keep the site up? Or invalidate the contents Vector?
In summary, if performance under load is unimportant, you can synchronize all of the getList() methods - but you must synchronize all of them. If performance under load is important, you can make "contents" volatile, ensure that each call takes a copy of the "contents" reference and operates on that copy for the remainder of the call. In addition, refresh() needs to be reorganised as outlined above.
General refactoring - consider adding a forceRefresh flag to getList(int) and get rid of the calls elsewhere - this removes some duplicate code and moves control of refreshes into one place. And, personally, I would lose the Vector and use ArrayList throughout; you don't actually need a synchronized data structure here.
- Peter
 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
JIm..
That was a question I had about threads in general...

Is it *always* necessary to call notify() or notifyAll() at the end of a synchronized method?

From your response, I'm guessing only if there is another thread calling wait().

Ok.. I had a big huge bit of text here that was a several quotes from Deitel*2 that confused me, because they seemed to contradict each other. But reading further, here is my answer:

"It is important to distinguish waiting threads that blocked because the monitor was busy [locked] vs. thread that explicity called wait() inside the monitor. Upon completion of a synchronized method, outside threads that blocked because the monitor was busy can proceed to enter the object. Threads that explicity invoked wait() can only proceed when notified via a call by another thread to notify() or notifyAll()."

What was confusing was this: "A thread executing in a synchronized method may determine that it cannot continue, so the thread voluntarily calls wait()". This is the case on the right hand side of the vs., in the quote above.

By voluntarily, they meant "the programmer will call wait" as opposed to "the runtime magically calls it for you" (at least I hope that's what they meant, otherwise, wouldn't you need to always call notify(), just to be sure)?

Thanks for the help.
[This message has been edited by Mike Curwen (edited October 31, 2001).]
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Jim Yingst:
I'd synchronize all access to the underlying contents vector - the easiest way is to synchronize both getList(int) and refresh().
That's not enough. For example, the call to contents.size() in getList() and the return statement in getList(int) need to be atomic - if the contents Vector is changed between the two, you may end up with incorrect results. Unless you take a copy of contents first so you know that it cannot be changed.
- Peter

[This message has been edited by Peter den Haan (edited October 31, 2001).]
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Mike Curwen:
Is it *always* necessary to call notify() or notifyAll() at the end of a synchronized method?
No. Only if there's a corresponding wait().
What was confusing was this: "A thread executing in a synchronized method may determine that it cannot continue, so the thread voluntarily calls wait()".
What the author probably means to say is that you, as a programmer, may face the situation that you are in a synchronized block but a necessary condition has not been met. You can then code a wait() to release the monitor lock and wait for the condition to be satisfied. For example, in a classic producer/consumer problem the consumer will generally need to wait() for the producer to come up with something.
There is no question of the JVM automatically wait()ing for you.
- Peter
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I forgot - is the fact that you return a fresh copy of the list for each call to getList() intentional? Unless the calling code needs to modify the Vector it gets, you can just return a more efficient Collections.unmodifiableList(subList()). Obviously that would be a List and not a Vector, but arguably the return type should've been List in the first place
- Peter
 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Peter, thanks for your detailed replies.

At this point, I'm not in for re-writing my app. Many things depend on the Vector as the return type. At the time of writing, it seemed like a nice 'magic bag' to store things in. So for now, it's Vector.

I do have a question about your refresh ideas.

I'm not sure how multiple refreshes can be occuring if I synchronize all of the getList() methods, as these are the only methods to call refresh().

Also, not sure I completely understand how to *use* the mustRefresh() method you show me. Specifically ... 'then refresh your data outside the synche'd block'

What synch'ed block is that?

For the refactoring comment, don't I already have a method call with the flag? And I can't (don't want to?) get rid of any of the 4 getList methods, as these are part of my interface for this class.

Thanks for your input. I'm about 85% understanding what you mean.
[This message has been edited by Mike Curwen (edited October 31, 2001).]
 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ok, my head is starting to hurt...

I thought your idea to work on a copy was a pretty good idea. But wait.. Argh.

Do you mean...
But then in the getList(int) method, it would take a clone of a clone...

Sheesh.. giving the user of this object some flexibility by giving them 4 methods has NOT turned out to be a great boon.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Mike Curwen:
I'm not sure how multiple refreshes can be occuring if I synchronize all of the getList() methods, as these are the only methods to call refresh().
They can't. Apologies if I wasn't clear enough - I gave two alternative approaches, one which doesn't scale well under very high loads, another which does.
The first approach is to synchronize all of the getList() methods. This is simple and works fine for low to moderate loads. However, when you need to refresh the vector you spend a significant amount of time inside synchronized code. In a high-load situation that is unacceptable; first, because all these waiting clients consume badly needed Threads; second, because they may cause a sudden rush on your resources when the query is done; third, because the inability to fire up multiple queries in parallel decreases your overall throughput. But, it's straightforward and simple, and in many cases perfectly adequate.
The second approach avoids synchronization except in the mustRefresh() method, which is very short. This gives you a very high degree of concurrency at the cost of additional complexity. In getList you would callto to fire off the query if necessary. The salient point is that the actual call to refresh() is not synchronized. It is in this second alternative that you need to beware of kicking off multiple (non-forced) refreshes in parallel.
Regarding refactoring, I'm not suggesting you get rid of any of the getList() calls; I just suggested you move the body of getList(int) to getList(boolean, int). Only I didn't put it that way because I failed to notice that you already had a getList(boolean, int) method. If you do this, you can concentrate all the refresh code inside this method instead of having it in a couple of different places.
In fact, if you go for the first (simple) method of synchronization, you can rig it so that you only have to synchronize a single method:To my mind this is better than having refresh calls everywhere, and those calls to contents.size().
- Peter
[This message has been edited by Peter den Haan (edited October 31, 2001).]
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Mike Curwen:
Ok, my head is starting to hurt...
Whoa. What I meant is taking a copy of the reference. I.e. a simple assignmentAnd from that moment on, the call will just operate on "mycontents" and ignore what happens to "contents". That way, you won't have to worry about other threads refreshing "contents" anymore.
This technique is definitely for the second - high-concurrency - option.
- Peter
 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Peter,

That was what I thought you meant at first, until I remembered the whole thing about pass by reference/value.

If I do that simple assignment, am i not still using the Vector on the outside of the method call? I think I'm missing a salient point here.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Mike Curwen:
That was what I thought you meant at first, until I remembered the whole thing about pass by reference/value.
Don't remind me please I hate it when a discussion turns personal.
If I do that simple assignment, am i not still using the Vector on the outside of the method call?
Yep, you're using the same Vector - but, at least in the code you gave, that Vector is never modified. There are just two things that happen to the contents Vector: (1) a sublist is extracted and a copy is returned, so clients cannot touch the original whatever they do; (2) from time to time a new Vector is constructed and assigned to "contents". This, too, leaves the original Vector object unchanged.
So all you need to do is take a copy of the reference.
- Peter
[This message has been edited by Peter den Haan (edited October 31, 2001).]
 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OOOHHHHHH.

There should be a smilie for 'light bulb turns on'.

Yes, doing the simple assignment means that the method-local variables refers to that same Vector. In most cases this would be bad and lead to confusion. It's a typical newbie mistake, in fact. But in our case, we *want* this behaviour, because if another method *replaces* the Vector (ie , the refresh method), it does an assignment... to the outside Vector variable. And the method-local one still points to the 'old' one.. thereby ensuring it is not garbage collected until the code of the method is at least done with it.

I think this qualifies as "clever" coding that I will NOT understand in 2 months without a heavy comment.

Thanks very much Peter. you've given me lots of good info.

In case you're wondering, I'll go with the 'low to medium load' method. The refresh only occurs once every 6 hours. So that means that once every 6 hours, it's *possible* that a servlet instance will blow up. And by hitting refresh on their browser, the client can continue on as if nothing happened.

One final thing: As I understand my own code... if I get a SQLException from my db object, the Vector is not re-assigned, and the CategoryLister object continues to use the old values. Which is alright.
 
Mike Curwen
Ranch Hand
Posts: 3695
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
And if you can believe it, I have *one* last question about the mustRefresh() method.

mustRefresh() itself *is* synchronized, so only one thread at a time will be able to determine if a refresh is needed. The only way I can see there being multiple refreshes going on in parallel is if I send true into the method, which is the opposite (?) to that implied by your line:
"It is in this second alternative that you need to beware of kicking off multiple (non-forced) refreshes in parallel"

So just to clarify (hopefully), my thoughts... only when I force a refresh will multiple refreshes be a problem. If I always let the code internal to mustRefresh() do the deciding for me... the only part of the conditional to provide 'true' is the timeout check on the RHS of the OR. Because this method is synchronized, only the *first* thread to recognize the timeout will return 'true', and thus trigger the refresh(), all others will return 'false'. Unless of course, I send 'true' for the forceRefresh parameter.
 
Peter den Haan
author
Ranch Hand
Posts: 3252
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Mike Curwen:
And if you can believe it, I have *one* last question about the mustRefresh() method.
Since you decided to go for the first, simple, low-to-moderate load option, you can safely forget all about mustRefresh(). But, for academic interest:
mustRefresh() itself *is* synchronized, so only one thread at a time will be able to determine if a refresh is needed. The only way I can see there being multiple refreshes going on in parallel is if I send true into the method [...]
mustRefresh() is designed to prevent the situation that multiple refreshes go on in parallel. It prevents this by setting lastRefresh before the actual refresh happens.
As you note, if force is true, multiple refreshes may occur. If I expected a large number of calls with forced refresh, I would probably set a flag that indicates that a refresh is underway. When this flag is set, subsequent calls to getList(true) would wait() for the refresh to complete. The refresh code would notifyAll() waiting calls. That way, you never have more than one query going on.
But given the choice you made all of this is of academic interest only. You see that designing data access for maximum throughput and concurrency introduces a fair degree of additional complexity!
- Peter
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!