Let's get Scala out of the way first. Unlike the other three, which are data processing technologies, Scala is a general-purpose programming language.
You don't need to learn Scala to work with any of those technologies.
That said, Scala is still an interesting language with unique concepts and approaches, and I think you should learn it just to expand your mind to different possibilities.
Storm is also a bit of a different beast. Its focus is on near-real-time, low-latency processing of streaming data as it comes in.
Certain kinds of data should be processed immediately.
This is in contrast to batch processing where data is first stored somewhere and then processed later in bulk.
For example, a meteorologist wants to know right now the probability of a storm (no pun intended!) coming in based on current weather sensor readings.
She can't wait 24 hours to collect data and then bulk process it, because then it'll be too late.
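To make the streaming vs. batch contrast concrete, here is a minimal plain-Python sketch. It is not Storm code; the pressure threshold, the sample readings, and the function names are all made up for illustration.

```python
from typing import Iterable, Iterator, List

# Hypothetical threshold: pressure (hPa) below this counts as a storm warning.
STORM_PRESSURE_HPA = 990.0


def stream_alerts(readings: Iterable[float]) -> Iterator[str]:
    """Streaming style: react to each reading as soon as it arrives."""
    for i, pressure in enumerate(readings):
        if pressure < STORM_PRESSURE_HPA:
            yield f"reading {i}: storm warning ({pressure} hPa)"


def batch_report(readings: Iterable[float]) -> str:
    """Batch style: store everything first, then process it in bulk."""
    stored: List[float] = list(readings)  # stands in for a day of stored data
    low = sum(1 for p in stored if p < STORM_PRESSURE_HPA)
    return f"{low} of {len(stored)} readings were below {STORM_PRESSURE_HPA} hPa"


if __name__ == "__main__":
    sample = [1012.0, 1004.5, 988.2, 985.0]  # made-up sensor data
    for alert in stream_alerts(sample):
        print(alert)              # available the moment a low reading shows up
    print(batch_report(sample))   # available only after all data is collected
```

The streaming function can raise a warning on the third reading, while the batch function has nothing to say until the whole data set is in.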
That leaves us with Hadoop vs Spark as batch processing solutions. Both do have near real time streaming solutions of their own (with Spark being better at it), but it's not their focus.
The thing is, there is no "vs." at all here, because Hadoop is a big ecosystem whose components, such as HDFS for storage and YARN for cluster resource allocation, are used by Spark deployments.
So in practice, most enterprise Spark deployments also use some components of the Hadoop ecosystem, even though Spark can run on its own standalone cluster manager too.
Where Spark excels is performance, largely because it keeps intermediate results in memory instead of writing them to disk between stages as Hadoop MapReduce does.
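The difference is easier to see in a toy example. The sketch below is plain Python and uses neither the Hadoop nor the Spark API. It runs the same two-stage word count twice: once handing the intermediate pairs to the next stage through a temporary file, and once keeping them in memory. The function names and the sample words are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Dict, List


def count_via_disk(words: List[str]) -> Dict[str, int]:
    """Stage 1 writes its intermediate pairs to disk; stage 2 reads them back."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump([[w, 1] for w in words], f)  # stage 1: emit (word, 1)
        with open(path) as f:
            pairs = json.load(f)                   # stage 2: reload from disk
    finally:
        os.remove(path)
    totals: Counter = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


def count_in_memory(words: List[str]) -> Dict[str, int]:
    """Same two stages, but the intermediate pairs never leave memory."""
    pairs = [(w, 1) for w in words]                # stage 1: emit (word, 1)
    totals: Counter = Counter()
    for word, n in pairs:                          # stage 2: aggregate
        totals[word] += n
    return dict(totals)


if __name__ == "__main__":
    sample = ["spark", "hadoop", "spark", "storm"]
    print(count_via_disk(sample))
    print(count_in_memory(sample))
```

Both versions return the same counts. The only difference is the extra write and read in the middle, and that overhead adds up when a job chains many stages over large data.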
I've explained what each one focuses on. Now it's up to you to decide exactly what your data processing goals are and then choose the tools.
Much more important than learning the tools is learning data processing algorithms like clustering and prediction.
If you have no specific goal and you have enough time, learn all of them.