Is Hadoop ready for the Enterprise?

 
Ranch Hand
Hadoop has been around as an open-source project for barely seven years.

Would you recommend any specific Hadoop distribution to a customer (let's say an investment bank on Wall Street) that needs to run a mission-critical application?
Why?

Does that distribution support NameNode HA, JobTracker HA, volumes, snapshots, mirrors, and any other features important for disaster recovery?

In my view, ease of use and ease of ingesting data into the Hadoop cluster filesystem are also critical features to have.
 
author
I tend to avoid recommending a particular distribution, as I think they all have a place. But if you take your list of ideal requirements, then given the current state of the underlying Apache projects, MapR is probably the best fit.

Let's be honest: prior to Hadoop 2.0, HA (particularly for the NameNode) has always been compromised to a degree. The system is near bulletproof when most things fail, but have your NN go down and you are in trouble. Hadoop 2.0 improves that greatly, and it'll be pretty cool to have all the major distributions offering out-of-the-box HA for both NN and JT.

But I'd also caution that DR is absolutely about more than the choice of distribution, and I think you touch on that. Whatever setup you choose, fate will always find a failure scenario that causes some sort of operational crisis. Lightning strikes are particularly good at highlighting these. And if you do need things like complete cross-site redundancy, I suspect you'll end up building so much plumbing to make it all work that the choice of distribution and its particular features becomes less relevant.

I think it's true to say that this sort of high-end cross-site DR is another area in which Hadoop will continue to mature. But I'd also say, from previous experience, that other technologies that supposedly do have that level of DR are never as simple to implement as the vendor says. This sort of thing is just fundamentally hard.

Garry
 