
MapReduce example from Apache site

 
Akhilesh Trivedi
Ranch Hand
Posts: 1608
I was following the MapReduce tutorial page on the Apache Hadoop site.

After the compilation step of WordCount v1.0, it says:


Assuming that:

/user/joe/wordcount/input - input directory in HDFS
/user/joe/wordcount/output - output directory in HDFS



What does "directory in HDFS" mean? Are these directories already created? I also see that



lists the two files inside the input directory. Even the normal "ls" command would have done that, so what is the significance of using bin/hdfs here?
 
arumugarani sundaram
Greenhorn
Posts: 9
Hi,

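Please keep in mind that HDFS is a distributed file system. When Hadoop runs on a cluster, the data is split into multiple segments/chunks and distributed across the nodes of the cluster. Using bin/hadoop dfs (or bin/hdfs dfs, as in the tutorial) means that you are listing files in HDFS, not in the ordinary local file system. A plain "ls" only looks at the local file system of the machine you are logged in to, so it would not show files stored in HDFS.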

Hope you understand this.

The input path says where the input files are available for processing, and the output path says where the processed output files will be written.

Think of a file that contains the phone numbers of everyone in country X; the people with a last name starting with A might be stored on server 1, B on server 2, and so on. In a Hadoop world, pieces of this phone book would be stored across the cluster. To maintain availability as components fail, HDFS replicates these smaller pieces onto two additional servers by default. This redundancy offers multiple benefits, the most obvious being higher availability. When you read the file from HDFS, the pieces stored on the different servers are combined and reconstructed into a single file.
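To make the idea concrete, here is a small toy sketch in Python. It is not how HDFS is implemented: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS splits files into large fixed-size blocks and uses a rack-aware placement policy). It only shows the principle that a file is cut into pieces, each piece is kept on three servers, and the file can still be read back when some servers fail.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the file contents into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an assumption for this sketch only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def read_file(blocks: list[bytes], placement: dict[int, list[str]], failed: set[str]) -> bytes:
    """Reassemble the file, using any replica that lives on a node that is still up."""
    pieces = []
    for index, block in enumerate(blocks):
        live = [node for node in placement[index] if node not in failed]
        if not live:
            raise IOError(f"block {index} has no live replica")
        pieces.append(block)
    return b"".join(pieces)
```

With four hypothetical nodes and three replicas per block, losing any two nodes still leaves at least one copy of every block, so the read succeeds; losing three nodes can make a block unavailable.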

Hope this helps you to understand.

Thanks,
Arumugarani
 
Akhilesh Trivedi
Ranch Hand
Posts: 1608
Thanks Arumugarani!

I am able to understand the concepts now and am working through the example.
 