Difficulty ?

paul nisset
Ranch Hand

Joined: May 13, 2009
Posts: 177
Hi,
I don't actually work with Big Data but have a feeling that it is something that will be coming my way at some point.
How difficult is it to learn Hadoop and become competent in it, compared with other NoSQL technologies?

Thanks,
Paul
Alex Holmes
Author
Greenhorn

Joined: Oct 19, 2012
Posts: 21
Hi,

There is definitely a learning curve with Hadoop, and it is probably steeper than that of most NoSQL systems, which are more closely aligned with the real-time systems we are all accustomed to working with (such as relational databases). The additional learning time mostly goes into installation, management, and understanding MapReduce as both a framework and a programming model.

Having said that, I would argue that it's worthwhile to understand the Hadoop fundamentals. Even if you don't end up using the technology, it will help you understand the MapReduce concepts, which are also being leveraged internally by NoSQL solutions. Hadoop's emphasis on data locality is also a valuable lesson that we should all be aware of as general good practice in distributed system design, and it helps reinforce our own architectural and design decisions.
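
For readers new to the programming model, here is a minimal sketch of the map, shuffle, and reduce steps in plain Java. It deliberately does not use the Hadoop API and runs in a single JVM; the class name WordCountSketch and its methods are made up for illustration only.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the MapReduce idea.
// This is NOT Hadoop code; a real job would use Hadoop's Mapper/Reducer classes.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all values that share one key into a single count.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map on every line, group values by key (the "shuffle"),
    // then run reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the moose likes Hadoop", "the fly likes Hadoop");
        // prints {fly=1, hadoop=2, likes=2, moose=1, the=2}
        System.out.println(run(lines));
    }
}
```

In a real cluster the map calls run in parallel on many machines and the grouping step moves data between them; the sketch only shows the shape of the two functions a developer has to write.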

Thanks,
Alex

Author, Hadoop in Practice, http://www.manning.com/holmes/
Blog at http://grepalex.com/
paul nisset
Ranch Hand

Joined: May 13, 2009
Posts: 177
Thanks Alex.
Mohamed El-Refaey
Ranch Hand

Joined: Dec 08, 2009
Posts: 119
Alex, could you please elaborate on what you mean by "Hadoop's emphasis on data locality is also a valuable lesson"?

Regards,
Mohamed


Best Regards, Mohamed El-Refaey
www.egyptcloudforum.com
Alex Holmes
Author
Greenhorn

Joined: Oct 19, 2012
Posts: 21
Mohamed,

In distributed computing it is much better to read data from local disk than over the network. This is known as data locality, and it is one of the key aspects of Hadoop. When MapReduce pushes work to the slave nodes, it does so in a way that favors reads from local disk over reads from the network.
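
As a rough illustration of that preference, the toy sketch below picks a worker for a task by first looking for a free node that already stores a copy of the input block, and only then falling back to a node that would have to read over the network. This is an assumption-laden simplification, not Hadoop's actual scheduler: the class LocalitySketch, the method pickNode, and the block and node names are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy scheduler showing the data-locality preference.
// Not Hadoop code: real schedulers weigh more factors than this.
public class LocalitySketch {

    // Choose a node for the task that reads blockId.
    // replicaHosts maps a block id to the nodes holding a copy of that block.
    // Returns null when no node is free.
    static String pickNode(String blockId,
                           Map<String, Set<String>> replicaHosts,
                           List<String> freeNodes) {
        Set<String> hosts = replicaHosts.getOrDefault(blockId, Set.of());
        // First choice: a free node that has the block on its own disk.
        for (String node : freeNodes) {
            if (hosts.contains(node)) {
                return node;
            }
        }
        // Fallback: any free node, which must fetch the block over the network.
        return freeNodes.isEmpty() ? null : freeNodes.get(0);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> replicaHosts = Map.of(
                "block-1", Set.of("node-a", "node-b"),
                "block-2", Set.of("node-c"));

        // node-b holds block-1, so it wins even though node-c is listed first.
        System.out.println(pickNode("block-1", replicaHosts, List.of("node-c", "node-b")));
        // No free node holds block-2, so the task falls back to a remote read on node-a.
        System.out.println(pickNode("block-2", replicaHosts, List.of("node-a")));
    }
}
```

The point of the sketch is the order of the two checks: moving the computation to the data is tried before moving the data to the computation.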

Thanks,
Alex
Mohamed El-Refaey
Ranch Hand

Joined: Dec 08, 2009
Posts: 119
Thanks, Alex, for the clarification! Much appreciated.
 