I have an example in the book that deals with that, taken from a similar example in "Java 8 in Action" (which is being revised to be
Java 8 and 9 in Action). In my example, I use the JMH microbenchmarking tool to evaluate the performance of methods that sum the first 10 million long values.
In one method, I simply add the longs in a loop. In another, I use a LongStream with a range and a sum method. Then I make that (sequential by default) stream parallel. Then I do the sum in the most inefficient way possible, by using a Stream<Long> with an iterate method and a reduce, and finally I do the same thing in parallel.
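To make those five variations concrete, here is a rough sketch of what they might look like. This is not the code from the book: the class and method names are made up for illustration, the use of rangeClosed starting at 1 is an assumption, and the JMH annotations and harness setup are left out so the snippet stands on its own.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical names; a sketch of the five summing strategies described above.
final class SumStrategies {

    // 1. Plain loop over primitives.
    static long loopSum(long n) {
        long total = 0L;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // 2. Primitive stream, sequential by default.
    static long sequentialLongStreamSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // 3. The same primitive stream, made parallel.
    static long parallelLongStreamSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // 4. Boxed Stream<Long> built with iterate and limit, summed with reduce.
    static long iterateSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                .limit(n)
                .reduce(0L, Long::sum);
    }

    // 5. The boxed version again, in parallel.
    static long parallelIterateSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                .limit(n)
                .parallel()
                .reduce(0L, Long::sum);
    }
}
```

Calling each method with 10_000_000L should give the same total from all five; only the time taken differs, and measuring that reliably is the job of a harness like JMH rather than a hand-rolled timer.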
The results are that the sequential LongStream sum is even faster than the simple loop, though the difference in speed is probably not significant. Making it parallel doesn't help, but mostly because summing primitives is, as the kids used to say, wicked fast already. By contrast, summing the Stream<Long> is much, much slower, on the order of 10 to 15 times slower. Making that one parallel actually makes the performance worse, mostly because using iterate with a limit is not an easy structure for the system to partition. The bottom line is that as long as you don't do something silly, like using a stream of wrapped values where a primitive stream is available, the performance is about the same as a regular loop. So go ahead and use streams to write your code, and then you can experiment with parallelization afterwards.
When is parallelization worth it? The general rules are: you need a stateless, associative operation (like addition), you need either a lot of data or a process that takes a lot of time on each element, and you need a source of data that is easy to partition. If those conditions apply, you're likely to see a benefit that exceeds the cost of splitting the work and joining all the individual results together again.
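Two of those conditions can be poked at directly in code. The sketch below, with made-up class and method names, checks that an associative reduction gives the same answer sequentially and in parallel, and asks each source's spliterator whether it reports a known size, which is one rough hint (not a complete test) of how easy the source is to split evenly.

```java
import java.util.Spliterator;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical helper; a sketch for exploring the conditions listed above.
final class ParallelChecklist {

    // Addition is stateless and associative, so the grouping of
    // partial sums does not change the answer.
    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).reduce(0L, Long::sum);
    }

    static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().reduce(0L, Long::sum);
    }

    // A range knows its size up front, so it can be cut into even pieces.
    static boolean rangeIsSized(long n) {
        return LongStream.rangeClosed(1, n)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // iterate produces each element from the previous one, so the
    // source cannot report a size to split on.
    static boolean iterateIsSized() {
        return Stream.iterate(1L, i -> i + 1)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }
}
```

By contrast, a reduction with a non-associative operation such as subtraction can give different answers depending on how the work happens to be split, which is why the first rule matters.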
Trisha Gee (and if you don't know that name, look her up -- she's awesome) has published some studies that show similar results. As Brian Goetz likes to say, parallelization is an optimization. Get your code working sequentially first, and then see what you can do in parallel. The streams in Java 8 have been optimized enough to make it worthwhile to write your code that way and then try to optimize.