I am new to Hadoop but have in-depth knowledge of Java and Linux. Since learning the whole of Hadoop looks like a mammoth task, I would like to understand which areas of Hadoop I should concentrate on, where I can use my existing knowledge of Java and Linux. Which areas are mandatory to learn?
Don't even try to learn the "whole of Hadoop" - these days Hadoop is really a huge collection of open-source projects and you can't learn them all. In fact, I would say don't even try to install the individual packages yourself, because there are lots of mutual incompatibilities and installing Hadoop by hand is just a world of pain. Instead, go for a bundled Hadoop distribution, which you can download for free from a provider such as Cloudera or Hortonworks. These companies offer a bundle of popular Hadoop-based tools, pre-installed and configured, which you can download as a "sandbox" VM and run in VirtualBox or VMware Player.
If you go for Cloudera, then you might like to try Udacity's online course "Intro to Hadoop and MapReduce", which allows free access to the course materials so you can work through it on your own. I think this course uses a VM based on the free Cloudera Express bundle.
Alternatively, download the Hortonworks Sandbox, which is another free Hadoop bundle in a VM and also includes lots of introductory tutorials to help you get started with Hadoop.
Work through the basic tutorials, e.g. using core tools like HDFS, Hue, Hive and Pig. Then, when you understand a bit about Hadoop, look at application coding, e.g. using Java. But bear in mind that writing pure Java MapReduce programs is no longer the preferred approach to coding for Hadoop. There are lots of higher-level libraries, such as Cascading, and tools, such as Cloudera's Impala SQL engine, which are designed to let you code your business logic at a more abstract level instead of having to break everything down into MapReduce steps, which are hard to write and often do not perform particularly well on larger processes.
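To give a feel for what "breaking everything down into MapReduce steps" means, here is a minimal sketch of a word count in plain Java. It does not use the Hadoop API at all: the class name `WordCountSketch` and its `map`, `shuffle`, `reduce` and `wordCount` methods are made up for illustration, and everything runs in memory in a single JVM. On a real cluster you would write only the map and reduce logic, and the framework would handle the grouping step and distribute the work across nodes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the MapReduce model (not Hadoop code).
class WordCountSketch {

    // Map step: turn one line of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all values that share a key.
    // In a real framework this grouping is done for you between map and reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: collapse the list of values for one key into a single total.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Hadoop runs MapReduce jobs",
                "Spark runs on Hadoop YARN");
        System.out.println(wordCount(lines));
    }
}
```

Even for something this simple you need three separate steps and a driver, which is roughly why higher-level tools that express the same thing as a single query or pipeline have become more popular.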
And if you want to go beyond MapReduce and see the current state of the art, have a look at Apache Spark, which you can program in Python, Scala or Java. It is a high-performance distributed computing engine that runs stand-alone (e.g. on your local PC or on a cluster) or on top of Hadoop's YARN resource manager.