Why does Hadoop need its own file system?

 
clojure forum advocate
Posts: 3479
Hi Chuck,
Why does Hadoop need its own file system (HDFS)? Why can't a Unix/Linux file system be used?
Thanks.
 
Ranch Hand
Posts: 47
Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with.
Although it is possible (and sometimes very convenient) to run MapReduce programs that access any of these filesystems, when you are processing large volumes of data, you should choose a distributed filesystem that has the data locality optimization, such as HDFS or KFS.
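
As a small illustration of that URI-scheme lookup, here is a sketch using Hadoop's FileSystem API (the namenode host and port are placeholders, not anything from this thread):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsSchemeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The "hdfs://" scheme resolves to the HDFS client implementation...
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

        // ...while "file://" resolves to the local filesystem.
        FileSystem local = FileSystem.get(URI.create("file:///"), conf);

        System.out.println(hdfs.getClass().getSimpleName());  // DistributedFileSystem
        System.out.println(local.getClass().getSimpleName()); // LocalFileSystem
    }
}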

Even if you opt to lose the data locality optimization, there is still the requirement to use a shared filesystem: each cluster member should see a single filesystem.

The MapReduce philosophy differs from the von Neumann model of computing in exactly this respect: when you think in MapReduce, you have to forget about individual nodes with their own separate filesystems, which would otherwise burden the architecture with designing "which data can be accessed from where" and designing the data transfer on top of that. MapReduce should be viewed as one entity, so it is very important to use such a shared filesystem.
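
To sketch what makes that "one entity" view workable: HDFS exposes where each block of a file is physically stored, and the MapReduce framework (not the programmer) uses that information to move computation to the data. The file path and namenode address below are hypothetical:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);

        // Ask the namenode which hosts store each block of this (hypothetical) file.
        FileStatus status = fs.getFileStatus(new Path("/data/input.log"));
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        // The job scheduler uses this information to run each map task
        // on (or near) a host that already holds the block it will read.
        for (BlockLocation block : blocks) {
            System.out.println(String.join(",", block.getHosts()));
        }
    }
}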


 
Ranch Hand
Posts: 67
Because Hadoop works in a distributed environment, it needs all of the machines to be represented as a single unit, an ability that HDFS provides.
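
To make that concrete: the "single unit" view comes from every machine pointing at the same default filesystem. A minimal sketch, assuming a hypothetical namenode address (in a real cluster this setting usually lives in core-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleNamespaceDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address

        // Any machine in the cluster with this configuration resolves the
        // same paths to the same data, so the cluster appears as one filesystem.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}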
 
Lanny Gilbert
Ranch Hand
Posts: 161
Sorry if this is an ignorant question, but could you use something like Terracotta's EhCache in place of HDFS?
 
Hussein Baghdadi
clojure forum advocate
Posts: 3479

Lanny Gilbert wrote:Sorry if this is an ignorant question, but could you use something like Terracotta's EhCache in place of HDFS?


Terracotta EhCache is distributed caching software; how is it related to file systems?
 