visibility and reordering

 
Andrew Cane
Ranch Hand
Posts: 91
I'm having trouble understanding these two concepts.

Visibility:
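A minimal sketch of the kind of class this question is about, assuming a plain (non-volatile) boolean field named done. The names LoopMayNeverEnd, work() and stopWork() follow the names used in this thread; the loop body is a placeholder.

```java
// Sketch only: a flag shared between two threads with no synchronization.
class LoopMayNeverEnd {
    boolean done = false; // plain field: no volatile, no lock

    // Intended to be called by thread1.
    void work() {
        while (!done) {
            // do work (placeholder)
        }
    }

    // Intended to be called by thread2.
    void stopWork() {
        done = true;
    }
}
```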


Let's say thread1 calls work() and thread2 calls stopWork(): how is it possible for thread1 to be trapped in an infinite loop? Sometimes when I read an article about visibility, I feel as if it's actually talking about stale (out-of-date) data. So it's a little like the lost update problem in DB access. If so, using the term "not visible" is very misleading. CMIIW.

Reordering
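A minimal sketch of the kind of class this question is about, assuming two plain boolean fields a and b and a threadTwo() method as named in the question. The class name BadlyOrdered, the method threadOne() and the local names r1 and r2 are hypothetical.

```java
// Sketch only: two plain fields written by one thread and read by another.
class BadlyOrdered {
    boolean a = false;
    boolean b = false;

    // Intended to be called by the first thread.
    void threadOne() {
        a = true;
        b = true;
    }

    // Intended to be called by the second thread.
    // Returns true if it saw the write to b but not the write to a.
    boolean threadTwo() {
        boolean r1 = b;
        boolean r2 = a;
        return r1 && !r2;
    }
}
```

Run from a single thread, threadTwo() can never return true; the question is why it may do so when the two methods run on different threads.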


How is it possible for the thread that calls threadTwo() to see the change to var b before the change to var a?
Thanks
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
Andrew Cane wrote: I'm having trouble understanding these two concepts.

Visibility:
Let's say thread1 calls work() and thread2 calls stopWork(): how is it possible for thread1 to be trapped in an infinite loop? Sometimes when I read an article about visibility, I feel as if it's actually talking about stale (out-of-date) data. So it's a little like the lost update problem in DB access. If so, using the term "not visible" is very misleading. CMIIW.

It is very much a case of thread1 seeing stale data. Thread2 updates the value but thread1 doesn't see the change. It is called visibility because the change has been made: thread2 will see the change, and possibly other threads can see it as well, so it isn't like a lost update, where the change may not be applied at all. But thread1 may not see the change if it had no occasion to fetch the current state. That is visibility: some threads might see the change, other threads might not.

Reordering
How is it possible for the thread that calls threadTwo() to see the change to var b before the change to var a?
Thanks

How is it possible? Well, it just is, based on the specs. If you look at JLS 17.4: The Java Memory Model, you will see that the runtime is allowed to reorder any code as long as the reordering has no visible effect on the sequential program order (i.e. doesn't affect the operations in a single thread). Since it would be impossible to detect, from within that one thread, that the order of those two assignments had been reversed, the change is allowed if the compiler or runtime decides it is more efficient. If there were intermediary code between them that used both boolean values, the reversal would not be allowed, because it would affect that intermediary code.
 
Chris Hurst
Ranch Hand
Posts: 443
3
C++ Eclipse IDE Java
In terms of naming the issue, the strict term is happens-before ordering, but this covers a wide range of things that can happen, including stale data and reordering.

Think of it like this: you jump into a pit of lions. You could argue it was misleading to describe the danger as the "being eaten by lions" problem when, in this particular instance, you were clearly killed by the fall before the lions could get to you.

In the first example the code could go into an infinite loop because of reordering or visibility problems ...

1) Say you run the code on a special distributed JVM (non-Oracle) in which one thread runs on machine A, the memory store is on machine B, and a second thread runs on machine C, and the machines collaborate through volatile reads/writes.
It should become quite obvious that setting done = true on machine A will have no effect on machines B and C (a volatile done would push the value out / read the value in). This example, although bizarre, is actually possible, and I find it useful because it makes it obvious why you need volatile reads and writes. Note that correct Java (code that obeys JSR 133) would work.

2) OK, so you're on a standard JVM deployment. Now consider an underlying weak memory model, i.e. caching in CPUs/registers. In this case done = true; imposes no cache flush (non-volatile write), so you would most likely see stale data (non-volatile read); the read probably sees the result eventually, as that resource/cache is needed by something else. Note that in this case you're most likely to observe the loop not finishing as quickly as it could have (a stale read), though an infinite loop is possible.

3) OK, so you're sure neither of those applies. What about this: it is legal, and more efficient for single-threaded code, for the JVM to refactor your code like this ..
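A sketch of such a transformation, written back as Java source purely for illustration (a real JIT compiler works on compiled code, and whether it does this at all depends on the JVM). The class name HoistedLoop is hypothetical; done, work() and stopWork() are the names used earlier in the thread.

```java
// Sketch only: what work() may effectively become once the read of the
// plain field done is hoisted out of the loop.
class HoistedLoop {
    boolean done = false; // plain field: no volatile, no lock

    void work() {
        if (!done) {          // done is read exactly once
            while (true) {
                // do work (placeholder); done is never read again
            }
        }
    }

    void stopWork() {
        done = true;
    }
}
```

For a single thread this behaves the same as testing done on every iteration, because nothing inside the loop ever changes done.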
Note that in this case you get an infinite loop.
A good JVM broke your bad code. The JVM optimises for the more common single-threaded case over multithreading, i.e. "I won't be slow just in case, as the programmer can let me know in the multithreaded case".

Note that a volatile boolean forbids this optimisation ...
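A minimal sketch of the volatile version, assuming the same done / work() / stopWork() names. The class name LoopEnds and the helper stopsWithin() are hypothetical and exist only to exercise the loop from two threads.

```java
// Sketch only: the same flag, declared volatile.
class LoopEnds {
    volatile boolean done = false;

    void work() {
        while (!done) {
            // do work (placeholder); done is re-read on every iteration
        }
    }

    void stopWork() {
        done = true; // volatile write: happens-before later reads of done
    }

    // Hypothetical helper: runs work() on one thread, stops it from the
    // calling thread, and reports whether the worker finished in time.
    static boolean stopsWithin(long timeoutMillis) {
        LoopEnds loop = new LoopEnds();
        Thread worker = new Thread(loop::work);
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        loop.stopWork();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Because every read of done is now a volatile read, the read cannot be hoisted out of the loop, and the write in stopWork() becomes visible to the worker.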



... or some other lion ;-) ... In some ways it's best not to know what gets you, just that there is something that will, and the rules to avoid such a scenario. The problem is there is always something you missed, or indeed things being introduced between JVM implementations (as long as they respect the Java Memory Model).


Example 2 ...

For this one, look at visibility rather than, say, reordering ..



Your JVM can decide it makes sense to flush/publish a but not b; it's allowed to, and it might make performance sense depending on, say, what the JVM predicts will be used next. So a non-volatile read from main memory would be allowed to see a but miss the newer b (non-volatile writes), which is still effectively cached by the other thread. Literally, for thread A both writes have happened, but we say there is no happens-before ordering for the read in thread B.
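One way to get a happens-before ordering here, sketched under the assumption that a and b are boolean fields written in that order: make the field written last volatile. A reader that sees b == true is then guaranteed to see a == true as well, which rules out the outcome the original question asks about. The class name OrderedPublish and the helper sawBWithoutA() are hypothetical.

```java
// Sketch only: the write to the volatile field b publishes the earlier
// plain write to a.
class OrderedPublish {
    boolean a = false;          // plain field
    volatile boolean b = false; // volatile field, written last

    void threadOne() {
        a = true;
        b = true; // volatile write
    }

    // Returns true if it saw b == true while a was still false.
    boolean threadTwo() {
        boolean r1 = b; // volatile read
        boolean r2 = a; // ordered after the volatile read
        return r1 && !r2;
    }

    // Hypothetical helper: races a writer against a reader several times
    // and reports whether the forbidden outcome was ever observed.
    static boolean sawBWithoutA(int rounds) {
        for (int i = 0; i < rounds; i++) {
            OrderedPublish p = new OrderedPublish();
            Thread writer = new Thread(p::threadOne);
            writer.start();
            boolean bad = p.threadTwo();
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return bad;
            }
            if (bad) {
                return true;
            }
        }
        return false;
    }
}
```

The reader may still see neither write, or only a; what the volatile field excludes is seeing b without a.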

Again other explanations exist ....


 
Andrew Cane
Ranch Hand
Posts: 91
Because in my understanding, if the variable "done" were not visible, then the read "System.out.println(done);" would produce an error, not a wrong value (because it's not visible, and you can't print something that's not visible). But okay, it's just a grammatical misunderstanding, so let's move on, unless I'm missing something here.

So, basically visibility refers only to the stale data problem, right? There are no other aspects covered by this term?

@Chris Hurst:
1. Could you please elaborate more on how reordering can be useful? I haven't done any distributed programming before, so your first point (point number one, not the LoopMayNeverEnd class example) kinda confuses me. Sorry.
2. I still don't understand how an infinite loop is possible. Even if it's a stale read, shouldn't the change to the variable be picked up eventually, so that the loop finally terminates? Could you please explain more about this?
3. For your second example (the code that sets a and b to true), could you please elaborate more on how "a non volatile read from main memory would be allowed to see a but miss the newer b (non volatile writes) still effectively cached by the other thread"? Is the JVM programmed to do so? Could you please give me a simple case where this behavior is actually beneficial?

Thanks a lot.
 
Chris Hurst
Ranch Hand
Posts: 443
3
C++ Eclipse IDE Java
Visibility refers to the change of the variable's value, not the variable itself.

All Java variables are guaranteed an initial zero or null state, so at a minimum they will be visible as that. The issue is that changes to these variables may never become visible, though they usually do.

As an aside, avoid things like System.out, as they can affect the result, i.e. the implementation can synchronize and fix your problem at a performance penalty.

For your purposes I would start with 3), as that is actually the most likely outcome for you to observe. Look at my code: you're going round a while (true) loop, so that's why it never ends. Why is that more efficient in your example? Because it never tests the value in the loop, and hence would never see it change.
The code you think the JVM should execute would have to read from main memory, which is considered slow; my code should run faster than yours because it does less. In single-threaded terms the two pieces of code are equivalent, so the JVM can choose to execute my version in preference to yours. If you don't want that to happen, tell Java not to, e.g. use volatile etc.

Caching is not a recommended way of thinking about this, but if you must think of it only as caching, remember that there are two caches involved: the write cache of the mutating thread and the read cache of the observer. Even if the thread making the change effectively flushes its write cache, the reading thread must also drop its read cache, and both of these operations are considered expensive.
Consider this also: if it were a cache, why must it be first in, first out, or flush everything? That's very inefficient with restricted resources. Why not, for instance, cache the most popular item, or the predicted most popular? You may be thinking a cache must be fair or simple (and again, it may not be a cache at all).

The JVM is written to guarantee only a weak memory model, for performance and portability; a strong memory model would give the effects you expect, potentially at a performance cost. Note that memory model issues run throughout your system, e.g. OS, threading libraries, CPUs. If you consult Intel's and AMD's docs you will see a discussion of problems similar to the ones you describe, and also a strict definition of what their memory models do and don't guarantee.


 
Andrew Cane
Ranch Hand
Posts: 91
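So, it's very likely that the JVM will rewrite the loop in work() from one that re-reads done on every iteration into one that reads done once and then goes round a while (true) loop.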


Secondly, let's say there's a scenario where it's guaranteed that whenever a thread calls work(), another thread will call stopWork(), maybe not immediately, but some time after the thread calling work() starts. Can I say that in this scenario there's a 100% guarantee (absolute certainty) that there won't be any infinite loop?
Thanks
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
Andrew Cane wrote: Secondly, let's say there's a scenario where it's guaranteed that whenever a thread calls work(), another thread will call stopWork(), maybe not immediately, but some time after the thread calling work() starts. Can I say that in this scenario there's a 100% guarantee (absolute certainty) that there won't be any infinite loop?
Thanks

No. As has been said, the JVM doesn't consider other threads: even if you know there will be a second thread, the JVM doesn't have to take it into account when making its optimizations. And in Java, if there is no synchronization or volatile involved, you are guaranteed very little (mostly about code ordering before and after thread.start() and thread.join()).
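The start() and join() guarantees can be sketched as follows, using plain fields only. The class name StartJoinDemo, the fields input and result, and the method compute() are all hypothetical names chosen for illustration.

```java
// Sketch only: visibility that comes from Thread.start() and Thread.join()
// alone, with no volatile and no synchronized.
class StartJoinDemo {
    int input;   // plain field, written before start()
    int result;  // plain field, written inside the started thread

    int compute(int value) {
        input = value; // happens-before everything in the started thread

        Thread t = new Thread(() -> {
            result = input * 2; // guaranteed to see the write to input
        });
        t.start();

        try {
            t.join(); // everything the thread did happens-before join() returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return result; // guaranteed to see the write to result
    }
}
```

Reading result before join() returns would not carry the same guarantee.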
 
Andrew Cane
Ranch Hand
Posts: 91
In Chris Hurst's example (the code that sets a and b to true), it's possible that the JVM may decide to publish a but not b. Please elaborate more; I'm completely lost here. What does "publish" mean in this context? Thanks
 
E Armitage
Rancher
Posts: 989
9
If a and b are shared (can be accessed by more than one thread), then the JVM will make the changes to a and b available to the thread that ran the assignment statements, but there is no guarantee as to when those changes will be seen by other threads. This is because, when the statements run, the JVM may write the changes to a local cache that is seen only by the current thread, perhaps because using that cache is faster than using the memory location that all other threads access. At some point the JVM will decide to write those values to the shared memory location (publish them), but when it does, it may decide that it's faster to write the value of b first, perhaps because b is now easier to read from the cache than a (it doesn't matter what the reason would be). So generally, the JVM may optimize a program so that a single thread performing a task completes it as quickly as possible. It is up to the developer to tell the JVM that other threads could be involved (and so slow the JVM down). The only ordering guarantees are the ones defined in the spec linked above:


An unlock on a monitor happens-before every subsequent lock on that monitor.

A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.

A call to start() on a thread happens-before any actions in the started thread.

All actions in a thread happen-before any other thread successfully returns from a join() on that thread.

The default initialization of any object happens-before any other actions (other than default-writes) of a program.
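The monitor rule at the top of this list can be sketched like this, assuming the done / work() / stopWork() example from earlier in the thread. The class name SynchronizedFlag, the accessor isDone() and the helper stopsWithin() are hypothetical.

```java
// Sketch only: the flag is read and written under the same monitor, so the
// unlock after the write happens-before the next lock taken by a reader.
class SynchronizedFlag {
    private boolean done = false; // plain field, guarded by "this"

    synchronized boolean isDone() {
        return done;
    }

    synchronized void stopWork() {
        done = true;
    }

    void work() {
        while (!isDone()) {
            // do work (placeholder)
        }
    }

    // Hypothetical helper: runs work() on one thread, stops it from the
    // calling thread, and reports whether the worker finished in time.
    static boolean stopsWithin(long timeoutMillis) {
        SynchronizedFlag flag = new SynchronizedFlag();
        Thread worker = new Thread(flag::work);
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        flag.stopWork();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Both the read and the write have to go through the same lock; synchronizing only stopWork() would not establish the ordering.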
 
Andrew Cane
Ranch Hand
Posts: 91
I see. So does the same thing also apply to read operations? The JVM may decide to read from cache instead of main memory?
Also, a cache is available per core, right? Or does every thread have its own cache? Thanks
 
E Armitage
Rancher
Posts: 989
9
Andrew Cane wrote: I see. So does the same thing also apply to read operations? The JVM may decide to read from cache instead of main memory?

It will read from where the value is.

Andrew Cane wrote: Also, a cache is available per core, right? Or does every thread have its own cache? Thanks

Depends on the JVM implementation and processor type. There may actually be no cache at all.
 
Chris Hurst
Ranch Hand
Posts: 443
3
C++ Eclipse IDE Java
There are lots of levels of cache at the CPU (though cache-snooping implementations can eliminate some effects), and registers count as caching for this purpose. Also, threads can have their own memory space/caches in the VM, e.g. TLABs (these are per thread).

If you're just generally interested in the topic, you may wish to google memory barriers and fences; that should turn up good blogs in this area, though possibly not specific to Java.
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
Andrew, I think the bottom line is that the only things you can be sure of are the guarantees made in the Java Memory Model, pointed to earlier, which the book you have been reading summarizes. Everything else is highly dependent on timing, processor, OS, JVM implementation, memory, and other things. You cannot predict what reordering can or will be done, nor can you predict what data will be visible to which threads, except when you correctly use synchronization and volatile as described in the Java Memory Model. Trying to dig into the whys, hows and whens of reordering or visibility issues is a black hole, because there are too many variables. It is a waste of time unless you plan on writing a JVM. Otherwise they are just something you have to know is a possibility whenever you use multiple threads, and you protect the data accordingly.
 
Maxim Karvonen
Ranch Hand
Posts: 121
12
Andrew Cane wrote: I see. So does the same thing also apply to read operations? The JVM may decide to read from cache instead of main memory?

From ANY cache, unless that is prevented by some kind of synchronization (a happens-before relation). And that goes for every read. The JVM may be running on several processors, and each processor may have its own cache. So the first processor may write the value 1 to some int variable, the second may write 2 to the same variable, and the third may write 3. When you read that variable you may see any of the four values (1, 2, 3, or the default write of 0), depending on the current processor. So unless there is sufficient synchronization, you may observe very strange results.
 