
Hadoop/HDFS and HBase - difference/relation

 
Akhilesh Trivedi
Ranch Hand
Posts: 1608
How is Hadoop/HDFS related to HBase?

Do we ever load HDFS data into HBase? Where are HBase tables stored?

Is HBase something completely separate from HDFS that just happens to be NoSQL and have a name starting with 'H', or is it really related to Hadoop/HDFS?
 
chris webster
Bartender
Posts: 2407
HDFS is the Hadoop Distributed File System. HBase is a NoSQL database that is implemented on top of HDFS and Hadoop, and it keeps its table data in files on HDFS. According to IBM:

HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data sets, which are common in many big data use cases. Unlike relational database systems, HBase does not support a structured query language like SQL; in fact, HBase isn’t a relational data store at all. HBase applications are written in Java much like a typical MapReduce application. HBase does support writing applications in Avro, REST, and Thrift.
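To make the "sparse data sets" point concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the HBase client API: it assumes each non-empty cell is stored as one entry keyed by row key, column family and column qualifier, kept in sorted order by key. (Real HBase cell keys also carry a timestamp/version, which is left out here.) The class name SparseCellStore and the "row/family:qualifier" key format are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT the HBase client API): every non-empty cell is one entry in a
// sorted map keyed by "rowKey/family:qualifier". A column that a row does not
// have simply has no entry, so sparse rows cost nothing for missing columns.
public class SparseCellStore {
    // Sorted by key, so all cells of one row sit next to each other.
    private final TreeMap<String, String> cells = new TreeMap<>();

    private static String key(String row, String family, String qualifier) {
        return row + "/" + family + ":" + qualifier;
    }

    public void put(String row, String family, String qualifier, String value) {
        cells.put(key(row, family, qualifier), value);
    }

    // Returns only the cells that actually exist for this row.
    // Assumes row keys never contain '/': since '0' follows '/' in ASCII,
    // the range [row + "/", row + "0") holds exactly this row's keys.
    public Map<String, String> getRow(String row) {
        return cells.subMap(row + "/", row + "0");
    }

    public int storedCellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseCellStore store = new SparseCellStore();
        // Two log rows; each one sets only the columns it actually has.
        store.put("server1-0001", "log", "timestamp", "10:00:00");
        store.put("server1-0001", "log", "server", "server1");
        store.put("server2-0001", "log", "timestamp", "10:00:05");
        System.out.println(store.getRow("server1-0001"));
        System.out.println("stored cells: " + store.storedCellCount());
    }
}
```

However many columns the table could have in principle, the store only holds the cells that were written, which is roughly why a cell-oriented layout suits sparse data better than fixed-width relational rows.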

An HBase system comprises a set of tables. Each table contains rows and columns, much like a traditional database. Each table must have an element defined as a Primary Key, and all access attempts to HBase tables must use this Primary Key. An HBase column represents an attribute of an object; for example, if the table is storing diagnostic logs from servers in your environment, where each row might be a log record, a typical column in such a table would be the timestamp of when the log record was written, or perhaps the server name where the record originated. In fact, HBase allows for many attributes to be grouped together into what are known as column families, such that the elements of a column family are all stored together. This is different from a row-oriented relational database, where all the columns of a given row are stored together. With HBase you must predefine the table schema and specify the column families. However, it’s very flexible in that new columns can be added to families at any time, making the schema flexible and therefore able to adapt to changing application requirements.
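The schema rules in that paragraph (column families fixed up front, new columns inside a family allowed at any time, every access going through the row key) can be sketched as another toy Java model. Again, this is not the HBase API; the FamilyTable class and its put/get methods are hypothetical. In real HBase, column families are declared when the table is created and each family is stored in its own files, which the per-family maps below loosely imitate.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model (NOT the HBase client API) of the schema rules quoted above:
// column families are fixed when the table is created, but new column
// qualifiers can be added inside a family at any time. All reads and writes
// go through the row key.
public class FamilyTable {
    // family -> (row key -> (qualifier -> value)); each family is kept in its
    // own structure, loosely mirroring how HBase stores families separately.
    private final Map<String, TreeMap<String, TreeMap<String, String>>> families =
            new LinkedHashMap<>();

    public FamilyTable(Set<String> familyNames) {
        for (String f : familyNames) {
            families.put(f, new TreeMap<>());
        }
    }

    public void put(String rowKey, String family, String qualifier, String value) {
        TreeMap<String, TreeMap<String, String>> rows = families.get(family);
        if (rows == null) {
            // Families must be predefined; writing to an unknown one is rejected.
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // Qualifiers are not predeclared: a new one is created on first write.
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(qualifier, value);
    }

    // Reads one row by its key, gathering its cells from every family.
    public Map<String, String> get(String rowKey) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, TreeMap<String, TreeMap<String, String>>> fam : families.entrySet()) {
            TreeMap<String, String> cols = fam.getValue().get(rowKey);
            if (cols != null) {
                for (Map.Entry<String, String> c : cols.entrySet()) {
                    result.put(fam.getKey() + ":" + c.getKey(), c.getValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FamilyTable logs = new FamilyTable(Set.of("meta", "body"));
        logs.put("row-0001", "meta", "timestamp", "10:00:00");
        logs.put("row-0001", "meta", "server", "server1");
        logs.put("row-0001", "body", "message", "disk check ok");
        // A new qualifier in an existing family needs no schema change.
        logs.put("row-0001", "meta", "severity", "INFO");
        System.out.println(logs.get("row-0001"));
        try {
            logs.put("row-0001", "extra", "note", "x"); // family was never declared
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design choice being illustrated is the split between the rigid part of the schema (the set of families) and the flexible part (the columns inside each family).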

If you Google "HBase tutorial", I'm sure you will find plenty of material to help you learn more about HBase, e.g. by trying it out on one of the free sandbox Hadoop distributions from Hortonworks or Cloudera.
 
Akhilesh Trivedi
Ranch Hand
Posts: 1608
Thanks, Chris!
 