In essentially all modern JVM implementations, each Java "Thread" object corresponds to a separate scheduling primitive at the operating-system level (a native thread). If the operating system knows how to schedule threads across multiple cores, then that JVM will automatically take advantage of the multiple cores when running multithreaded code. This is not just a Java 5 or 6 thing; it's been true since before consumer multicore processors existed, and it's true on multiprocessor motherboards, too.
Now, for a given Java program to really take advantage of multiple cores, the program has to explicitly distribute its work across multiple threads. If you sort a big array on one thread, that won't use more than one core. What that article says, basically, is that if you divide the work of sorting across two threads, you can nearly double the speed of the sort. Of course, there are other things to consider besides raw speed, like GUI responsiveness, and hogging both cores won't help with that!
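Here's a rough sketch of the two-thread idea, assuming the simplest scheme (sort each half on its own thread, then merge) rather than whatever the article's author actually implemented:

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative two-thread sort: each thread sorts one half of the
// array, then the main thread merges the two sorted halves.
public class TwoThreadSort {
    public static void parallelSort(int[] data) throws InterruptedException {
        final int mid = data.length / 2;

        // Each Thread is an OS-level thread, so the two halves can
        // be sorted on different cores at the same time.
        Thread left  = new Thread(() -> Arrays.sort(data, 0, mid));
        Thread right = new Thread(() -> Arrays.sort(data, mid, data.length));
        left.start();
        right.start();
        left.join();
        right.join();

        // Standard merge of the two sorted halves.
        int[] merged = new int[data.length];
        int i = 0, j = mid, k = 0;
        while (i < mid && j < data.length) {
            merged[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
        }
        while (i < mid) merged[k++] = data[i++];
        while (j < data.length) merged[k++] = data[j++];
        System.arraycopy(merged, 0, data, 0, data.length);
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = new Random(42).ints(1_000_000).toArray();
        parallelSort(data);
        // Verify the whole array came out sorted.
        for (int i = 1; i < data.length; i++) {
            if (data[i - 1] > data[i]) throw new AssertionError("not sorted");
        }
        System.out.println("sorted");
    }
}
```

The sorting itself still happens in Arrays.sort; only the division of labor and the merge are new, which is why the wall-clock win tops out at roughly 2x on two cores.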
Although his implementation may use APIs that are new to JDK 1.6, you could have done the exact same thing using just the APIs in JDK 1.1 and gotten the same speedup. I say 1.1 only because 1.0 JVMs didn't generally let the OS schedule the threads.