Which Big Data technologies does Hadoop comprise?

 
Ranch Hand
Posts: 2938
13
There are several big data technologies, such as MapReduce, Apache Spark, Hive, Cassandra, and HBase. Which of these come under Hadoop? Why does HBase come under Hadoop but Cassandra does not?

thanks
 
Ranch Hand
Posts: 45
HBase is built on HDFS, so technically it sits on top of existing Hadoop technologies. Cassandra does not.
 
Ranch Hand
Posts: 82
1
Hello
Cassandra is a write-oriented database. Its write performance is higher than that of most other NoSQL databases. Cassandra follows a peer-to-peer architecture, as opposed to the master-slave architecture of MongoDB and most RDBMSs. That means you can write to any peer and Cassandra will take care of data synchronization. That's why it's faster.

Having said that, Cassandra has some shortcomings when it comes to querying data, so data modeling is the most important part of using Cassandra well. To enable fast reads and writes, Cassandra lets you query efficiently only by the primary key. The partition key segregates data into partitions, so Cassandra can determine from the partition key which partition holds your data. The clustering key keeps the rows within a partition stored in sorted order. I am not aware whether you can do custom sorting on an arbitrary field in Cassandra. Of course, you can create secondary indexes on fields other than the primary key in order to query by them, but the moment you do that you can degrade performance drastically.

All this makes data modeling quite a challenge in Cassandra. Often you model the data for a certain requirement, and when a new requirement comes along later you need to change the data model again. Cassandra also has a steeper learning curve than MongoDB.
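To make the partition key and clustering key idea concrete, here is a minimal toy sketch in plain Python. It is not Cassandra's API or its real partitioner: `ToyTable`, `node_for`, `NUM_NODES`, and the sensor data are hypothetical names invented for illustration. It only shows why a lookup by partition key touches a single partition, while a lookup by any other field has to visit every partition.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size, chosen only for this sketch


def node_for(partition_key: str) -> int:
    """Map a partition key to a node with a stable hash (toy stand-in for a partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class ToyTable:
    """In-memory model: node -> partition key -> rows sorted by clustering key."""

    def __init__(self):
        self.nodes = [defaultdict(list) for _ in range(NUM_NODES)]
        self.partitions_scanned = 0  # counts how many partitions reads have touched

    def write(self, partition_key, clustering_key, value):
        # The partition key picks the node and partition;
        # insort keeps rows ordered by clustering key.
        rows = self.nodes[node_for(partition_key)][partition_key]
        bisect.insort(rows, (clustering_key, value))

    def read_partition(self, partition_key):
        # Lookup by partition key: exactly one partition is touched.
        self.partitions_scanned += 1
        return list(self.nodes[node_for(partition_key)].get(partition_key, []))

    def scan_by_value(self, value):
        # Lookup by a non-key field: every partition on every node is visited.
        hits = []
        for node in self.nodes:
            for pk, rows in node.items():
                self.partitions_scanned += 1
                hits.extend((pk, ck) for ck, v in rows if v == value)
        return sorted(hits)


table = ToyTable()
table.write("sensor-a", 3, "warm")
table.write("sensor-a", 1, "cold")
table.write("sensor-a", 2, "warm")
table.write("sensor-b", 1, "warm")

print(table.read_partition("sensor-a"))  # rows come back ordered by clustering key
print(table.scan_by_value("warm"))       # had to look inside every partition
print(table.partitions_scanned)
```

In this toy model the cost of `scan_by_value` grows with the number of partitions, which is roughly the intuition behind designing the table around the queries you expect to run.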
The best tool
Apache Spark
Spark is an open-source processing engine built for speed, ease of use, and sophisticated analytics. It is often used as a framework on which to build analytic tools.

Spark has a huge amount of backing, with over 750 contributors from over 200 organizations working to develop and advance it.

Companies such as Hortonworks and IBM have been busy integrating Spark capabilities into their big data platforms, and it could be set to become the default analytics power for Hadoop.


I hope this helps.
 
Monica Shiralkar
Ranch Hand
Posts: 2938
13

Ishan Shah wrote: and it could be set to become the default analytics power for Hadoop



Thanks
However, Cassandra and Spark are not part of the Hadoop ecosystem.

What do you mean by "(Spark) could become the default analytics power for Hadoop"?
 