Win a copy of Kotlin in Action this week in the Kotlin forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Thread safe XML package?  RSS feed

 
Warren Dew
blacksmith
Ranch Hand
Posts: 1332
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm currently porting an multithreaded application to Java. The application uses a thread safe XML package to ensure thread safe communication between threads. Since we are doing a "straight port", that means I need to find a thread safe XML package for Java, if possible. Are any of the XML packages available in Java thread safe? Which ones?
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Interesting question. Guess I've never thought about it. What parts of Java XML are not safe? Multiple threads modifying a DOM?
 
Warren Dew
blacksmith
Ranch Hand
Posts: 1332
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Multiple threads modifying a single DOM document is exactly how this application manages interthread communication.

I don't actually know whether that's thread safe in the various Java DOM implementations; I haven't been able to find any definitive documentation either way. That's why I'm asking - I don't want to assume thread safety, then be proven wrong only after the application comes under load.

The alternative is to use some other method to ensure thread safety. That's a bit of a departure from the 'straight port' edict, but we have an exception 'where Java requires a different solution'.
[ April 26, 2005: Message edited by: Warren Dew ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Fascinating (I'm wearing Spock ears.) You might have to synchronize DOM access or serialize it through queueing or something more clever than I'd come up with.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
First, I doubt the major DOM libraries are thread safe for one reason: 999 times out of 32 applications require only single-threaded access to a single DOM. Adding synchronization up front "just in case" (see Vector and Hashtable) incurs a huge penalty for most applications.

Testing any particular library shouldn't be all that hard, however. Step one is to check the docs (as you've done). You can probably assume that not saying anything means not thread-safe, but that might narrow the field to zero.

Step two is to grep the source for "synchronized?", if you have access to it. Yes, it's possible the libary uses Doug Lea's Concurrent package, java.util.concurrent, or some other third party locking library, but I suspect the first case (Doug Lea's package) is the only likely candidate. I know of no other standard and tested locking library, and I doubt any XML library would limit itself to JDK 1.5 just yet.

Step three is to roll up your sleaves and write a test. Create a lot of threads; you want high contention to increase the chances of a thread getting interrupted in a block of code that would cause breakage if not thread-safe. My first thought is to perform random operations on a single DOM: add elements and add/modify parameters.

I leave out deletion because I suspect the library could be thread-safe yet still allow race conditions for removing nodes. For example, thread A adds a child to node X while thread B deletes node X. The library may handle this correctly, but it might be hard to verify the operations at the end.

The trick is to choose random operations that can be easily verified in the DOM at the end. One way is to model the operations and append them to a List (one per thread and then aggregated, or one synchronized list for all threads). This seems rather complex up front and may still be difficult to verify.

[ Added: A possible variation on that is to serially apply the same operations to a second DOM using synchronization. At the end, compare the two DOMs. ]

Another way is to use Strings that are numbers. For an element's name or an attribute's value, use "<thread-name>-<counter>" with counter starting at 1. Including the thread's name might be problematic; it's just a first guess. When you modify a name or value, extract the counter, increment it, recombine it, and put it back into the DOM; also increment a shared counter.* At the end, add up all the counters in the DOM and compare it to the shared counter.

* This presents a problem if two threads are modifying the same attribute. While the library may be thread-safe, it's still possible for the counter to be incremented only once:The problem arrises anytime step 1 happens before step 2. Tiger's atomic classes solve this problem for primitives with the compareAndSet(oldValue, newValue), but even that wouldn't help you here.

I guess you could use synchronized blocks on the Attribute or Element itself for these operations to implement your own atomic set without significantly affecting the test.

[ Added: The more I think about it, the key seems to be choosing random operations that are somewhat transactional. You need to be able to serialize them and expect the same results. An operation like "choose a random element and insert a child element randomly among its children" breaks down because the timing of when the elements are chosen with respect to other operations makes it unserializable. Hah! My brain is spinning. I need to refamiliarize myself with the DOM API before I can speak intelligently about it. ]

This sounds like a very interesting challenge and would make a nice addition to the test suite of any DOM library claiming to be thread-safe. Spock indeed!
[ April 28, 2005: Message edited by: David Harkness ]
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I was just skimming the docs for NASA's Java PathFinder (Adeel just posted a link to the press release), and the end of the "What can be checked by JPF?" page makes it sound like a perfect testbed for this.
JPF so far has been mainly used for applications that are models, but require a full procedural programming language. JPF is especially useful to verify concurrent Java programs, due to its systematic exploration of scheduling sequences.
While it has some serious limitations -- especially with regard to native code (since it's a minimal VM itself) -- it looks like they wouldn't hinder testing whether or not a DOM API is thread-safe. As well, this kind of test sounds to me like exactly what JPF was designed to test.
 
Warren Dew
blacksmith
Ranch Hand
Posts: 1332
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Harkness:

Testing any particular library shouldn't be all that hard, however. Step one is to check the docs (as you've done). You can probably assume that not saying anything means not thread-safe, but that might narrow the field to zero.

That's pretty much what I've concluded.

Testing them to make sure or writing a thread safe wrapper or version might be a nice open source project, but we don't really have the schedule to do it within this project.
 
David Harkness
Ranch Hand
Posts: 1646
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think your best bet is to analyze the code and discern the access pattern the various threads use. Are they modifying the DOM all over the place, or is there a small set of discreet operations they each perform? If you can figure out that pattern, grafting on synchroniztion at the Element level might not be as horrific as it first would seem.

As an analogy, if many threads concurrently modify a single List, your first inclination would be to assume you have to serialize all access to it. However, if in analyzing the code you find that all the threads are treating the list as a queue -- adding elements on one end and removing them from the other -- you can improve concurrency by having a lock for each end.

Granted, I doubt you'll find it simplifies quite to that degree, but there must be some method to the madness that could help you.
 
Warren Dew
blacksmith
Ranch Hand
Posts: 1332
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
David Harkness:

As an analogy, if many threads concurrently modify a single List, your first inclination would be to assume you have to serialize all access to it. However, if in analyzing the code you find that all the threads are treating the list as a queue -- adding elements on one end and removing them from the other -- you can improve concurrency by having a lock for each end.

Except that you then run into problems if the queue gets short, and you get contention between the two ends, if they have separate locks. It's that kind of problem that makes a patchwork approach dangerous - it can be much more difficult to make a patchwork approach work than to synchronize the entire library.

In your example, it probably makes the most sense to actually use a queue. In my case, I think it probably makes the most sense to use some other form of synchronization. The actual threading model is fairly straightforward - it's just the use of a specific XML library for synchronization that makes the code complex.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!