
Are columnar DBs preferred to document DBs in cases demanding fast retrieval but not much querying?

 
Ranch Hand
Posts: 2601
13
In columnar databases like HBase, we store data under row keys and then use those row keys to retrieve it. In document-oriented databases like MongoDB, we have more querying capability and we store JSON documents as they are. Are columnar databases preferred to document-oriented ones in cases where we want fast retrieval but not much querying? Thanks.
 
Ranch Hand
Posts: 32
3
Essentially you are correct: your choice of NoSQL database should be influenced by your requirements for writing and querying your data.  Do you want really fast writes, really fast queries, or really flexible queries? You cannot always achieve all of these with a single approach.

I haven't really worked with HBase, but I have done a bit of work with Cassandra, which is also based on the column-family model. Many of the basic principles are similar.  

Cassandra is designed for very fast writes, because the physical location for the data is determined by its key (as in HBase), so the database can route each write to the correct node very quickly. There are lots of other internal optimisations to make writes as fast as possible (writes are appended to a commit log and an in-memory table, and compaction tidies up the on-disk files later, etc).
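
To make the "location is determined by the key" idea concrete, here is a toy sketch in Python. It is only an illustration, not how either database actually does it: Cassandra uses a token ring (Murmur3 partitioner by default) and HBase splits row keys into ranges. The node names and the `node_for` helper are made up for the example.

```python
import hashlib

# Hypothetical three-node cluster; the names are made up for this sketch.
NODES = ["node-a", "node-b", "node-c"]

def node_for(partition_key: str) -> str:
    """Derive the target node from a hash of the key alone.

    No lookup table is consulted, which is why routing a write is cheap.
    Simple modulo hashing is an assumption of this sketch.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]

# The same key always maps to the same node, for writes and for reads.
target = node_for("user:42")
```

The point is just that the key alone tells you where to go, both when writing and when reading the data back.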

In Cassandra, the "primary key" consists of the partition key, which defines the physical location of the partition, and the clustering key, which defines how the data are ordered within the partition.
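
A minimal sketch of that two-part key, again as a toy model in Python rather than anything Cassandra-specific. `ToyTable`, the sensor names and the timestamps are all hypothetical; the partition key picks the bucket and the clustering key decides the order inside it.

```python
from collections import defaultdict

class ToyTable:
    """Toy model of a table keyed by (partition key, clustering key)."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, row):
        # Writing the same full primary key again overwrites the row.
        self.partitions[partition_key][clustering_key] = row

    def read_partition(self, partition_key):
        # Rows come back ordered by clustering key within the partition.
        part = self.partitions.get(partition_key, {})
        return [part[ck] for ck in sorted(part)]

table = ToyTable()
table.upsert("sensor-1", 20, {"temp": 21.5})  # clustering key = timestamp
table.upsert("sensor-1", 10, {"temp": 20.0})
table.upsert("sensor-2", 5, {"temp": 18.0})

print(table.read_partition("sensor-1"))
# prints [{'temp': 20.0}, {'temp': 21.5}]
```

Rows for `sensor-1` come back in timestamp order regardless of insertion order, and `sensor-2` lives in a separate partition.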

Cassandra also has a SQL-like query language (CQL), which is very powerful, provided you understand the limitations of the data model.

Query performance is based on how you use the key in your query.  

To query data efficiently in Cassandra you have to provide the partition key, as this tells Cassandra which node to look at for your data, so it's important to design your data model to reflect how you expect to query your data.  Sometimes, the easiest approach is to store your data in multiple tables with different keys, so you can query the data easily by different keys later on.  You can query by non-key fields in Cassandra, as long as you supply the partition key as well: Cassandra will locate the data via the partition key, then filter the results based on the additional query criteria.
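
Here is a rough Python sketch of both ideas: one table per query pattern, and "find the partition first, then filter". The order data and the `save_order` / `query` helpers are invented for illustration. In real CQL, filtering on a non-key column usually also needs a secondary index or `ALLOW FILTERING`.

```python
from collections import defaultdict

# Same data stored twice, once per query pattern (hypothetical tables).
orders_by_customer = defaultdict(list)  # partition key: customer
orders_by_product = defaultdict(list)   # partition key: product

def save_order(order):
    """Write the order to every table, so each one can serve its own query."""
    orders_by_customer[order["customer"]].append(order)
    orders_by_product[order["product"]].append(order)

def query(table, partition_key, **filters):
    """Locate one partition by its key, then filter rows on non-key fields."""
    rows = table.get(partition_key, [])
    return [r for r in rows
            if all(r.get(field) == value for field, value in filters.items())]

save_order({"customer": "alice", "product": "kettle", "status": "shipped"})
save_order({"customer": "alice", "product": "toaster", "status": "pending"})
save_order({"customer": "bob", "product": "kettle", "status": "pending"})

pending_for_alice = query(orders_by_customer, "alice", status="pending")
kettle_orders = query(orders_by_product, "kettle")
```

Note the cost: every write goes to both tables, which is the price you pay for cheap reads by either key.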

You can also combine Cassandra as your fast data store with Elasticsearch as a query indexing engine by using Elassandra, which tries to give you the best of both worlds.

You can find out more about Cassandra data-modelling here if you're interested.

Query performance on other databases depends on the basic data model, the available indexing mechanisms, and so on. For example, MongoDB offers indexes to help improve query performance, similar to RDBMS indexes. But every index has to be updated whenever you write to the collection (MongoDB) or table (SQL), so the more indexes you have, the more each write costs: there is always a trade-off.
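
The index trade-off can be shown with a small Python toy. This is not MongoDB's API or its index structure; `ToyCollection` and its counter are hypothetical, and values of indexed fields are assumed to be hashable. It just counts the extra work each insert does per index.

```python
class ToyCollection:
    """Toy document collection with optional single-field indexes."""

    def __init__(self, indexed_fields):
        self.docs = []
        # field -> {value -> [positions of matching docs]}
        self.indexes = {field: {} for field in indexed_fields}
        self.index_updates = 0  # extra write work caused by indexes

    def insert(self, doc):
        position = len(self.docs)
        self.docs.append(doc)
        # Every index covering a field of this doc must be updated too.
        for field, index in self.indexes.items():
            if field in doc:
                index.setdefault(doc[field], []).append(position)
                self.index_updates += 1

    def find(self, field, value):
        if field in self.indexes:
            # Indexed: jump straight to the matching documents.
            return [self.docs[i] for i in self.indexes[field].get(value, [])]
        # Not indexed: scan every document.
        return [d for d in self.docs if d.get(field) == value]

people = ToyCollection(indexed_fields=["city", "age"])
people.insert({"name": "Ann", "city": "Oslo", "age": 30})
people.insert({"name": "Bo", "city": "Rome", "age": 30})
```

With two indexes, the two inserts above trigger four index updates, while `find("city", "Oslo")` avoids a full scan. Dropping an index makes writes cheaper and the matching reads slower.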
 
Monica Shiralkar
Ranch Hand
Posts: 2601
13
Thanks

Christopher Webster wrote:

Cassandra also has a SQL-like query language (CQL), which is very powerful, provided you understand the limitations of the data model.



But given that Cassandra has the SQL-like CQL (and one can put the Phoenix SQL layer over HBase), what advantage does a document-oriented database offer when columnar databases like Cassandra give similar benefits through CQL?
 