
Hadoop Architecture question

 
neil johnson
Greenhorn
Posts: 24
I am a newbie to Hadoop, and I'm confused about who does the splitting of the input file. Let's assume I have a 200 MB file and the block size is 64 MB, so we need 4 blocks in total, multiplied by the replication factor. Who splits the file, and how are the split pieces made available to the client so it can write them to the datanodes?
 
amit punekar
Ranch Hand
Posts: 544
Hello,
The Hadoop framework, specifically HDFS, takes care of this. "Hadoop: The Definitive Guide" explains it in detail, along with a diagram.

Regards,
Amit
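To make the arithmetic from the question concrete, here is a small standalone sketch of how the block count works out. This is illustrative only; HDFS performs this split itself when the client writes the file, and the class and method names below are my own, not part of the Hadoop API.

```java
// Sketch: how many HDFS blocks a 200 MB file occupies with a 64 MB
// block size, and how many block replicas exist with replication 3.
public class BlockMath {
    static long numBlocks(long fileSize, long blockSize) {
        // The last block may be partial, so round up.
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blocks = BlockMath.numBlocks(200 * mb, 64 * mb); // 3 full 64 MB blocks + one 8 MB block = 4
        long replicas = blocks * 3;                           // 12 block replicas with default replication
        System.out.println(blocks + " blocks, " + replicas + " replicas");
    }
}
```

Note that the last block only occupies 8 MB on disk; HDFS blocks do not pad out unused space.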
 
chris webster
Bartender
Posts: 2407
You might want to download the Hortonworks Sandbox. This gives you an integrated single-node Hadoop installation with tools like Hive, Pig, HCatalog and Hue, plus links to lots of well-structured tutorials. The sandbox runs as a virtual machine, e.g. inside VirtualBox or VMware Player, and you can access a lot of the functionality very easily via the browser-based Hue interface. This is a great resource for learning about Hadoop, even if you plan to use a different Hadoop distribution for your project.
 
Rajit kumar
Greenhorn
Posts: 13
Input splitting for MapReduce jobs is handled by the InputFormat class, and you can also control it by subclassing.
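For reference, the rule FileInputFormat uses to size its logical splits is roughly max(minSize, min(maxSize, blockSize)), so with the default min (1 byte) and max (Long.MAX_VALUE), each split usually lines up with one HDFS block. The sketch below is a standalone re-implementation of that formula for illustration, not the Hadoop class itself:

```java
// Standalone sketch of the split-size rule used by Hadoop's
// FileInputFormat: splitSize = max(minSize, min(maxSize, blockSize)).
public class SplitSize {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With default bounds, a 64 MB block yields a 64 MB split.
        System.out.println(SplitSize.computeSplitSize(64 * mb, 1L, Long.MAX_VALUE) / mb);
    }
}
```

If you want to prevent splitting entirely (e.g. for formats that cannot be read from the middle), you can subclass an InputFormat such as TextInputFormat and override isSplitable to return false.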
 