Tough problem in general. I once looked at some code somebody wrote purporting to do unit testing on threads, but it just consisted of spawning a bunch of threads and hoping the test would fail if something went wrong. Given the unpredictable nature of thread scheduling (particularly for code that could be ported to environments with different approaches to thread scheduling), it is challenging to craft a robust testing harness.
Thread safety is often more tractable when approached analytically instead of empirically. In practical terms, you write code using techniques that help you to avoid threading problems. Doug Lea's Java book is a good source of background on this.
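As a rough illustration of what "analytically" can mean here, the sketch below keeps all mutable state in a single atomic variable, so its thread safety can be argued by reading the code rather than by running threads against it. The class name `HitCounter` and its methods are hypothetical, made up for this example and not taken from the book.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: thread safety argued from the design, not from test runs.
public final class HitCounter {
    // The only mutable state: a final field that is never exposed directly.
    private final AtomicLong hits = new AtomicLong();

    // incrementAndGet is a single atomic read-modify-write,
    // so concurrent callers cannot lose an update.
    public long record() {
        return hits.incrementAndGet();
    }

    // Reads the current total; safe to call from any thread.
    public long total() {
        return hits.get();
    }
}
```

The argument for correctness is short: there is one piece of state, it is only touched through atomic operations, and no invariant spans more than one variable. A class with two related fields would need a different technique (for example a lock around both), which is exactly the kind of reasoning the analytical approach asks for.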
The problem with the empirical (testing) approach is simply one of numbers. For a test to help you, it has to exercise a known path through the code you are concerned about, and taken together your test cases should give you complete (or at least reasonable) coverage of that code. Threading causes an exponential explosion in the number of paths that need testing, and it can be very difficult to force a test to exercise a specific path (e.g. given two clients of a web app, you don't get to force how the intermediate steps of their requests will be interleaved by the JVM).
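To put a number on that explosion, here is a small sketch under a simplified model that I am assuming for illustration: exactly two threads, each running a fixed sequence of atomic steps. The number of distinct interleavings is then the binomial coefficient C(a + b, a). The class name `Interleavings` is hypothetical, and a real JVM (more threads, non-atomic steps, memory reordering) only makes the count worse.

```java
import java.math.BigInteger;

// Hypothetical helper: counts schedules in a simplified two-thread model.
public final class Interleavings {
    // Number of ways to interleave two threads that execute a and b atomic
    // steps respectively, preserving each thread's own order: C(a + b, a).
    public static BigInteger count(int a, int b) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= a; i++) {
            // After this step result == C(b + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(b + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }
}
```

With this model, `Interleavings.count(2, 2)` is 6 and `Interleavings.count(10, 10)` is 184756, so two threads of only ten steps each already have far more schedules than a test suite will realistically visit.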