
Memory Consistency

 
Bartender
Posts: 1464
32
Netbeans IDE C++ Java Windows
This is more of a Windows question than a C/C++ question, but I am working in C/C++ on a Windows machine, so maybe I can ask this here: How do I guarantee memory consistency across different threads for large buffers? As I understand it, two different threads might see different values at the same address, because one thread might write a value to that address that only remains in cache, not making it back to shared memory. Java's concurrency collections deal with this explicitly:

The Oracle Gods wrote: Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.


That seems to guarantee consistency for everything, with the placing of an object into any concurrent collection acting as a universal memory fence. If I understand that correctly, it means that, even though Java doesn't allow volatile arrays, the entire contents of an array are consistent between threads, as of the use of a concurrency collection. So, if one thread sets the contents of an array, then signals another thread that it can use the array by, say, putting the array reference into a BlockingQueue that the second thread is waiting on, the second thread is sure to see the array contents as they were written by the first thread. I'm looking for something similar to use in C/C++ under Windows.

My specific problem is that I have a number of C/C++ functions that will be invoked by Java (via the JNI) on different threads. I can guarantee from the Java side that no two threads will ever be operating on the same buffer, but I need to guarantee that all writing to a buffer by one thread happens-before any reading/writing by the next thread that will have access to that buffer. I could pass the buffer back and forth from C/C++ to and from Java, and that would seem to guarantee consistency, but it would also add needless overhead. Anyone know how to guarantee memory consistency for large buffers in C/C++ under Windows?
 
Saloon Keeper
Posts: 28319
210
Android Eclipse IDE Tomcat Server Redhat Java Linux
Large buffers, small buffers, single bytes, it's all the same. If you want predictable results when simultaneous users are reading and writing, you need to use interlocks.

In Java, this is done by making access to the object in question synchronized. At a general OS level (for example in C code), there are numerous mechanisms, including semaphores, locks, and in the case of very small objects, there are often specific machine instructions to facilitate safe access.
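As a rough illustration of the two ends of that spectrum, here is a minimal sketch in standard C++ (C++11 or later; the thread itself predates any mention of it, so treat the choice of `std::atomic` and `std::mutex` as an assumption). The function names `count_with_atomic` and `count_with_mutex` are hypothetical. The first uses an atomic read-modify-write, which on common hardware maps to a dedicated machine instruction; the second guards an ordinary variable with a lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `per_thread` times using an atomic read-modify-write (no lock).
int count_with_atomic(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1);  // one indivisible increment
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counter.load();
}

// Hypothetical helper: the same work, but with a plain int that is only
// ever touched while holding a mutex.
int count_with_mutex(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    int counter = 0;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, &guard, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(guard);  // unlocks at scope exit
                ++counter;
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counter;
}
```

Both functions should return `threads * per_thread`; without the atomic or the lock, increments could be lost.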

Note that "buffers" can also mean DMA buffers, where I/O devices share the memory bus with the primary CPUs. Java code generally won't be using DMA buffers directly, though. Regardless, a proper synchronization mechanism would still be required.
 
Stevens Miller
Bartender

Tim Holloway wrote: ...a proper synchronization mechanism would still be required.


Thanks, Tim. I can handle synchronization. I'm getting at a different problem: memory consistency. In Java, if one thread writes to variable myValue, and uses a BlockingQueue to inform another thread that myValue is ready for that second thread to read, the contract defined for BlockingQueues guarantees that the second thread will see what the first thread wrote to myValue, because putting anything into a BlockingQueue guarantees that all actions before that have a happens-before relationship to all actions after that, regardless of which thread is relying on them. Thus, if Thread A synchronizes with Thread B by B waiting for a value to appear in a BlockingQueue, the following pseudo-code works reliably:


Thread B will block until Thread A signals that myValue is ready for Thread B to read, by putting an object in a BlockingQueue. Thread B is blocked until that happens. When it does, Thread B unblocks and will be sure to print "1" (rather than "0") as BlockingQueue's contract guarantees that the assignment of 1 to myValue happened-before Thread B unblocked, so Thread B is sure to see the new value.

What I want to know is: what do I need to do in C++ to get the same happens-before relationship between the writing of a value to a variable in one thread and the reading of that same variable in another thread? Merely blocking on an inter-thread synchronization mechanism guarantees that I don't have a data race, but it doesn't guarantee memory consistency (because synchronization alone doesn't guarantee that Thread A's cache got written back to the memory from which Thread B will load the value when it reads it, particularly if Thread A is running on a different core than Thread B).

Thanks to our friends at Oracle, the classes in java.util.concurrent all take care of this for me.

What's the comparable mechanism in C++?
 
Tim Holloway
Saloon Keeper
No matter how many cores you have, if they're all sharing a memory bus, the memory on that bus is managed relative to the whole, not per-core. Extra hardware exists in modern-day computer systems to ensure that.

There is no standard synchronization mechanism for C/C++ code like there is in Java. For that matter, Java synchronized mode only operates in a single JVM (discounting some of the more experimental JVMs that places like IBM do R&D on).

The low-level consistency is, as I said, managed by the hardware. The high-level functions are OS-specific. On IBM mainframes, the primary multi-tasking OSes (DOS/360 and OS/360 and their descendants) both did high-level synchronization using Supervisor Call functions attached to macros named "ENQ" and "DEQ", but despite the common names, the OS/360 synchronization macros didn't generate the same code as the DOS/360 ones. When multi-processor CPUs came along and a lower-latency synchronization facility was needed, they added locks, then lock hierarchies, and I believe there's yet another subsystem in the zSeries OS, although since people don't pay me to do OS-level coding for mainframes anymore, I'm not totally informed on that one at the moment.

Edsger Dijkstra conceived the absolute minimal OS core (the "T.H.E." OS) using a pair of enqueue/dequeue methods he designated as "P" and "V" (from their Dutch names). Along with a task queue/scheduler, that was, he demonstrated, all that was necessary to fully implement a multi-tasking OS on a single CPU. Probably on multiple cores, too, but that wasn't the problem being explored at the time. An extended version of that architecture became the Exec nucleus of the Amiga OS.

Unix, of course, also developed OS-specific synchronization functions, which I think served as the model for the Java notify/wait functions, but for details I'd have to grab something off my bookshelf. I haven't done one of those in a few years either.

And of course, there's Windows, which suffered from being first DOS, which was uni-tasking, then Windows 3, which could only do true multi-tasking in DOS VMs, then NT and its descendants, which tried to unbury themselves from under all that cruft.

There are cases where the normal hardware cache synchronization mechanisms are insufficient. Mostly when you do something like a full VM context switch where control registers are reset and VM pagetable buffers may get purged and general mayhem is in the offing. In such instances, there are special machine instructions to talk to caches directly. But for the most part, trust the hardware.
 
Stevens Miller
Bartender
Terrific history lesson.

What is there in C++ under Windows that creates the same happens-before relationship to memory access in one thread compared to another as there is in Java (as described for any of the collections in the java.util.concurrent package)?
 
Ranch Hand
Posts: 165
12

Stevens Miller wrote: What is there in C++ under Windows that creates the same happens-before relationship to memory access in one thread compared to another as there is in Java (as described for any of the collections in the java.util.concurrent package)?


If I have understood you correctly then the mechanism you are looking for is called a 'condition variable'. In the consumer thread you wait for a signal on a condition variable and in the producer thread you notify via the same condition variable.
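A minimal sketch of that wait/notify pattern, assuming standard C++11 (`std::mutex` and `std::condition_variable`) rather than any Windows-specific API. The `Handoff` struct and `produce_and_consume` function are hypothetical names made up for illustration. The producer fills the buffer with plain writes, then sets a flag under the mutex; the consumer only touches the buffer after it has seen the flag under the same mutex. The C++ standard specifies that unlocking a mutex synchronizes with the next lock of that mutex, which is what makes the buffer contents visible to the consumer.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical shared state for handing one buffer from producer to consumer.
struct Handoff {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;        // guarded by m
    std::vector<int> buffer;   // filled by the producer before ready is set
};

// Fills a buffer with 1..n in one thread and sums it in another.
long long produce_and_consume(std::size_t n) {
    assert(n > 0);
    Handoff h;
    h.buffer.resize(n);
    long long consumer_sum = 0;

    std::thread consumer([&h, &consumer_sum] {
        std::unique_lock<std::mutex> lock(h.m);
        h.cv.wait(lock, [&h] { return h.ready; });  // handles spurious wakeups
        // Safe to read: the producer's writes happened before it set ready.
        consumer_sum = std::accumulate(h.buffer.begin(), h.buffer.end(), 0LL);
    });

    // Producer: plain, unsynchronized writes to the buffer...
    std::iota(h.buffer.begin(), h.buffer.end(), 1);
    {
        // ...then publish under the mutex.
        std::lock_guard<std::mutex> lock(h.m);
        h.ready = true;
    }
    h.cv.notify_one();

    consumer.join();  // join also makes consumer_sum visible here
    return consumer_sum;
}
```

For example, `produce_and_consume(1000)` should return 500500, the sum of 1 through 1000.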

 
Stevens Miller
Bartender

Steffe Wilson wrote: If I have understood you correctly then the mechanism you are looking for is called a 'condition variable'.


Thanks, but that's not quite it, Steffe. A condition variable is a synchronization construct, and it does define a happens-before relationship, but it doesn't guarantee that a write to shared memory is readable by the code in the thread that has a happens-after relationship with the code that sets the condition variable. The only way to do that is to be sure that all writes in the code that happens-before the condition variable (which, in my case, is a BlockingQueue, when I use Java) are written through cache memory back to main memory and that any cached copy of that memory previously loaded by the second thread is invalidated before the condition variable is set. This guarantees that, when the condition variable is set, all writes by the first thread that took place before the first thread set the condition variable will be visible to the second thread, after the condition variable is set.

Bjarne Stroustrup addresses this, somewhat, in "The C++ Programming Language, 4th ed.," but he mostly devotes himself to a nearly pathological case (where he deals with bit-packed structures that don't align with word boundaries), and (in a surprisingly lofty tone) addresses shared data thus:

Bjarne Stroustrup wrote: I suspect that after reading about the problems with and techniques for managing shared data, you may become sympathetic to my view that explicitly shared data is best avoided.


Sorry, I can't "avoid" explicitly shared data. I am passing very big buffers around, and each of my different threads needs access to them. I can guarantee that each thread is completely done with a buffer before setting the condition variable that will tell another thread to start working on it, so my code perfectly avoids any data races. What I can't be sure of is that all writes in the first thread are visible to the second thread, because of the caching issue.

Java takes care of this for me. Here's the relevant section from the java.util.concurrent javadoc:

Memory Consistency Properties

Chapter 17 of the Java Language Specification
defines the
happens-before relation on memory operations such as reads and
writes of shared variables. The results of a write by one thread are
guaranteed to be visible to a read by another thread only if the write
operation happens-before the read operation. The
synchronized and volatile constructs, as well as the
Thread.start() and Thread.join() methods, can form
happens-before relationships. In particular:

  • Each action in a thread happens-before every action in that
    thread that comes later in the program's order.

  • An unlock (synchronized block or method exit) of a
    monitor happens-before every subsequent lock (synchronized
    block or method entry) of that same monitor. And because
    the happens-before relation is transitive, all actions
    of a thread prior to unlocking happen-before all actions
    subsequent to any thread locking that monitor.

  • A write to a volatile field happens-before every
    subsequent read of that same field. Writes and reads of
    volatile fields have similar memory consistency effects
    as entering and exiting monitors, but do not entail
    mutual exclusion locking.

  • A call to start on a thread happens-before any
    action in the started thread.

  • All actions in a thread happen-before any other thread
    successfully returns from a join on that thread.

The methods of all classes in java.util.concurrent and its
subpackages extend these guarantees to higher-level
synchronization. In particular:

  • Actions in a thread prior to placing an object into any concurrent
    collection happen-before actions subsequent to the access or
    removal of that element from the collection in another thread.

  • Actions in a thread prior to the submission of a Runnable
    to an Executor happen-before its execution begins.
    Similarly for Callables submitted to an ExecutorService.

  • Actions taken by the asynchronous computation represented by a
    Future happen-before actions subsequent to the
    retrieval of the result via Future.get() in another thread.

  • Actions prior to "releasing" synchronizer methods such as
    Lock.unlock, Semaphore.release, and
    CountDownLatch.countDown happen-before actions
    subsequent to a successful "acquiring" method such as
    Lock.lock, Semaphore.acquire,
    Condition.await, and CountDownLatch.await on the
    same synchronizer object in another thread.

  • For each pair of threads that successfully exchange objects via
    an Exchanger, actions prior to the exchange()
    in each thread happen-before those subsequent to the
    corresponding exchange() in another thread.

  • Actions prior to calling CyclicBarrier.await and
    Phaser.awaitAdvance (as well as its variants)
    happen-before actions performed by the barrier action, and
    actions performed by the barrier action happen-before actions
    subsequent to a successful return from the corresponding await
    in other threads.
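Some of the guarantees in this list have close counterparts in standard C++ (C++11 and later), which is an assumption about the toolchain, since the thread does not say which compiler is in use. The sketch below uses hypothetical function names. It relies on two rules from the C++ standard: completion of the `std::thread` constructor synchronizes with the start of the thread function, and completion of the thread synchronizes with a successful `join()`; similarly, a result stored by a `std::async` task is visible after `std::future::get()` returns.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Counterpart of Thread.start()/Thread.join(): the worker sees `data` as the
// caller left it, and the caller sees `result` as the worker left it.
long long sum_in_new_thread(const std::vector<int>& data) {
    long long result = 0;
    std::thread worker([&data, &result] {
        result = std::accumulate(data.begin(), data.end(), 0LL);
    });
    worker.join();               // everything the worker wrote is visible now
    assert(!worker.joinable());  // the thread has finished and been joined
    return result;
}

// Counterpart of Future.get(): the value computed by the asynchronous task
// is safely visible to the thread that calls get().
long long sum_via_future(const std::vector<int>& data) {
    std::future<long long> pending =
        std::async(std::launch::async, [&data] {
            return std::accumulate(data.begin(), data.end(), 0LL);
        });
    return pending.get();
}
```

Neither function needs an explicit lock or atomic, because thread start, join, and `get()` already provide the ordering.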

I'm looking for something in C++ that meets the same contract created by the first bullet-item in the second part of the above list:

Actions in a thread prior to placing an object into any concurrent
collection happen-before actions subsequent to the access or
removal of that element from the collection in another thread.


Stroustrup says what I'm trying for is "best avoided," but it has to be possible. I don't mind if it's complicated or difficult. But he never really (as far as I can tell) says how you can ensure that memory written by one thread will, at some point, be guaranteed to show those writes to another thread running code that has a happens-after relationship to the code in the first thread that did the writing.

That's what I'm after.
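One way to approximate that contract in standard C++ (C++11 or later, an assumption about the compiler) is a small blocking queue built on `std::mutex` and `std::condition_variable`. The `BlockingQueue` class below is a hypothetical sketch written for this thread, not a standard library type, and it leaves out capacity limits and shutdown. Because `put` and `take` both lock the same mutex, everything a thread wrote before `put` happens-before whatever another thread does after the matching `take`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unbounded blocking queue (illustration only).
template <typename T>
class BlockingQueue {
public:
    void put(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push_back(std::move(item));
        }  // unlock here is the "release" side of the hand-off
        cv_.notify_one();
    }

    T take() {
        std::unique_lock<std::mutex> lock(m_);  // "acquire" side
        cv_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> items_;
};

using Buffer = std::vector<int>;

// Thread A writes into a buffer, then hands it over; Thread B blocks on the
// queue and reads the buffer. Returns the value Thread B observed.
int value_seen_by_consumer() {
    BlockingQueue<std::unique_ptr<Buffer>> queue;
    int seen = 0;

    std::thread thread_b([&queue, &seen] {
        std::unique_ptr<Buffer> buf = queue.take();  // blocks until A puts
        seen = (*buf)[0];
    });

    std::unique_ptr<Buffer> buf(new Buffer(1024, 0));
    (*buf)[0] = 1;               // plain write, before the hand-off
    queue.put(std::move(buf));   // ownership moves to Thread B

    thread_b.join();
    return seen;
}
```

Under these assumptions `value_seen_by_consumer()` should return 1, never 0. Passing the buffer as a `std::unique_ptr` also documents that only one thread owns it at a time.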
 
Steffe Wilson
Ranch Hand

Stevens Miller wrote:

Steffe Wilson wrote: If I have understood you correctly then the mechanism you are looking for is called a 'condition variable'.


Thanks, but that's not quite it, Steffe. A condition variable is a synchronization construct, and it does define a happens-before relationship, but it doesn't guarantee that a write to shared memory is readable by the code in the thread that has a happens-after relationship with the code that sets the condition variable. The only way to do that is to be sure that all writes in the code that happens-before the condition variable (which, in my case, is a BlockingQueue, when I use Java) are written through cache memory back to main memory and that any cached copy of that memory previously loaded by the second thread is invalidated before the condition variable is set. This guarantees that, when the condition variable is set, all writes by the first thread that took place before the first thread set the condition variable will be visible to the second thread, after the condition variable is set.


A condition variable has to be used within a mutex section (recall that CV.wait takes the mutex as an arg) and taking hold of a mutex applies an implicit memory barrier which flushes the cache.
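In standard C++ terms (C++11 and later, which is an assumption here since the post is about pthreads), that implicit barrier corresponds to acquire and release ordering: unlocking a mutex is a release operation and locking it is an acquire operation. The same ordering can be requested directly with `std::atomic`. The sketch below uses a hypothetical function name and a simple spin-wait in place of a mutex, purely to show the ordering; a real program would normally block rather than spin.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Writer fills a large buffer with plain stores, then publishes it with a
// release store. Reader waits with acquire loads, then checks every byte.
// Returns true if the reader saw all of the writer's data.
bool publish_with_release_acquire(std::size_t n) {
    assert(n > 0);
    std::vector<unsigned char> buffer(n, 0);
    std::atomic<bool> ready{false};
    bool all_seen = false;

    std::thread reader([&buffer, &ready, &all_seen] {
        // Acquire: once this load sees true, every write the writer made
        // before its release store is guaranteed to be visible here.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        all_seen = true;
        for (unsigned char b : buffer) {
            if (b != 0xAB) all_seen = false;
        }
    });

    for (unsigned char& b : buffer) b = 0xAB;      // ordinary writes
    ready.store(true, std::memory_order_release);  // publish

    reader.join();
    return all_seen;
}
```

The buffer itself is not atomic or volatile; only the flag is. The release/acquire pair on the flag is what orders the bulk writes before the bulk reads.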
 
Stevens Miller
Bartender

Steffe Wilson wrote: A condition variable has to be used within a mutex section (recall that CV.wait takes the mutex as an arg) and taking hold of a mutex applies an implicit memory barrier which flushes the cache.


Bullseye, Steffe! That's exactly what I was after. Can you point me to some online documentation about this?

And, thanks!
 
Steffe Wilson
Ranch Hand

Stevens Miller wrote: Can you point me to some online documentation about this?


I don't have any links I can offer, I'm afraid. This is just stuff I remember from years ago when I used to do a lot of multiprocessor work with Unix C and pthreads.

 