I'm currently writing a promela description of a CLH-based RW-lock that supports aborting arbitrary members in the queue. So proving correctness shouldn't be (too much of) an issue.
But I'll have to implement this in Java, obviously, (7, as it happens, because that's what the project supports), and I have no idea how to micro-benchmark such an implementation for comparison with other synchronisation options.
jmh is pretty popular for benchmarking. I have used it a number of times at work for a very very very performance sensitive application. HdrHistogram is also a valuable measurement tool for performance sensitive applications, however I have had less experience in using this one.