This week's book giveaway is in the Python forum.
We're giving away four copies of High Performance Python for Data Analytics and have Tiago Rodrigues Antao on-line!
See this thread for details.
Win a copy of High Performance Python for Data Analytics this week in the Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Bear Bibeault
  • Liutauras Vilda
Sheriffs:
  • Jeanne Boyarsky
  • Tim Cooke
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Tim Holloway
  • Stephan van Hulst
  • Jj Roberts
  • Carey Brown
Bartenders:
  • salvin francis
  • Frits Walraven
  • Piet Souris

While creating EMR cluster, when hadoop is selected is it mainly because of Hive

 
Ranch Foreman
Posts: 2348
12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
While creating EMR cluster, we can select the services which we want to use such as Spark, Hadoop etc. Hadoop Map reduce is not used much nowadays or new projects and Spark is preferred. HDFS would not be required as EMR would use S3/EMRFS. So why would one choose Hadoop while creating an EMR cluster if we are already choosing Spark. Is it mainly because of Hive which is a component of Hadoop ecosystem used for adhoc analysis.
Thanks.
 
Wink, wink, nudge, nudge, say no more, it's a tiny ad:
Building a Better World in your Backyard by Paul Wheaton and Shawn Klassen-Koop
https://coderanch.com/wiki/718759/books/Building-World-Backyard-Paul-Wheaton
reply
    Bookmark Topic Watch Topic
  • New Topic