Are columnar DBs preferred to document DBs in cases demanding fast retrieval but not much querying?

Ranch Foreman
Posts: 2060
In columnar databases like HBase, we store data under row keys and retrieve it using those row keys. In document-oriented databases like MongoDB, we have more querying capability and we store JSON-like documents more or less as they are. Are columnar databases preferred to document-oriented ones in cases where we want fast retrieval but not much querying? Thanks.
Ranch Foreman
Posts: 28
Essentially you are correct: your choice of NoSQL database should be influenced by your requirements for writing and querying your data.  Do you want really fast writes, really fast queries, or really flexible queries? You cannot always achieve all of these with a single approach.

I haven't really worked with HBase, but I have done a bit of work with Cassandra, which is also based on the column-family model. Many of the basic principles are similar.  

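Cassandra is designed for very fast writes. The physical location of the data is determined by its key (as in HBase), so the database can route a write to the correct node very quickly. Writes are also cheap on each node: a write is appended to a commit log and held in an in-memory memtable, which is later flushed to immutable files on disk (SSTables), and compaction merges those files in the background instead of the database updating data in place on every write.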
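To make the write path concrete, here is a toy, single-node sketch of that log-structured idea. The class and method names are hypothetical and this is not Cassandra's code: it ignores replication, deletes (tombstones) and on-disk indexing, and only shows why a write is just an append plus an in-memory update, with compaction tidying up afterwards.

```python
class ToyLSMStore:
    """Hypothetical single-node toy of a log-structured store (not Cassandra's implementation)."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only log, replayed after a crash in a real system
        self.memtable = {}     # recent writes held in memory
        self.sstables = []     # immutable runs sorted by key, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is an append plus a dict update: no read-before-write,
        # no in-place update of data already on "disk".
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a new immutable run sorted by key.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log in this toy

    def read(self, key):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:  # linear scan for simplicity
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for table in self.sstables:  # oldest to newest, so later values overwrite earlier ones
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

The trade-off shows up on the read side: until compaction runs, a read may have to look through several runs to find the newest value.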

In Cassandra, the "primary key" consists of the partition key, which defines the physical location of the partition, and the clustering key, which defines how the data are ordered within the partition.
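A rough way to picture the two parts of the primary key is the toy model below. Everything in it is hypothetical: the three node names, the `ToyTable` class and the sensor data are made up, and "hash then modulo" is a simplification of Cassandra's token ring. It only illustrates that the partition key alone decides where a row lives, while the clustering key decides its order within the partition and, together with the partition key, identifies the row.

```python
import hashlib
from collections import defaultdict

# Hypothetical three-node cluster, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for(partition_key: str) -> str:
    """Pick a node from the partition key alone (toy placement rule).

    Cassandra hashes the partition key to a token on a token ring;
    hashing plus modulo here is a simplification of that idea.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class ToyTable:
    """Rows grouped by partition key and ordered by clustering key."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, row):
        # The same (partition key, clustering key) pair is the same primary key,
        # so a second write to it overwrites the first.
        self.partitions[partition_key][clustering_key] = row

    def select_partition(self, partition_key):
        # Return the partition's rows in clustering-key order
        # (this toy sorts at read time; Cassandra keeps them sorted in storage).
        rows = self.partitions.get(partition_key, {})
        return [rows[ck] for ck in sorted(rows)]
```

For example, with `"sensor-1"` as the partition key and ISO timestamps as the clustering key, `select_partition("sensor-1")` returns that sensor's readings in time order, and every reading for `"sensor-1"` lands on the same node.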

Cassandra also has a SQL-like query language (CQL), which is very powerful, provided you understand the limitations of the data model.

Query performance depends on how your query uses the key.

In practice you have to provide the partition key in order to query data efficiently on Cassandra, as this tells Cassandra which node to look at for your data, so it's important to design your data model to reflect how you expect to query your data. Sometimes the easiest approach is to store your data in multiple tables with different keys, so you can easily query the same data by different keys later on. You can also query by non-key fields in Cassandra, as long as you supply the partition key as well: Cassandra locates the data via the partition key, then filters the results based on the additional query criteria.
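The "one table per query" approach can be sketched in plain Python. The `ToyOrderStore` class and the order fields are hypothetical, and dictionaries stand in for Cassandra tables: each lookup starts from a partition key, a non-key filter is applied only inside the single partition that was found, and querying by a different attribute is served by a second copy of the data keyed on that attribute.

```python
from collections import defaultdict


class ToyOrderStore:
    """Hypothetical example: the same orders kept in two tables, one per query."""

    def __init__(self):
        self.orders_by_customer = defaultdict(list)  # partition key: customer_id
        self.orders_by_product = defaultdict(list)   # partition key: product_id

    def save(self, order):
        # One logical write becomes one write per query table (denormalisation).
        self.orders_by_customer[order["customer_id"]].append(order)
        self.orders_by_product[order["product_id"]].append(order)

    def for_customer(self, customer_id, status=None):
        # Locate the partition by key first, then filter on a non-key field
        # within that single partition.
        partition = self.orders_by_customer.get(customer_id, [])
        if status is None:
            return list(partition)
        return [o for o in partition if o["status"] == status]

    def for_product(self, product_id):
        # A different access pattern is served by a different table,
        # so this is also a single-partition lookup rather than a full scan.
        return list(self.orders_by_product.get(product_id, []))
```

The cost of this design is paid at write time and in storage: every new query table means one more write per order.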

You can also combine Cassandra as your fast data store with Elasticsearch as a query indexing engine in Elassandra, which tries to give you the best of both worlds.

You can find out more about Cassandra data-modelling here if you're interested.

Query performance on other databases depends on the basic data model, the available indexing mechanisms, and so on. For example, MongoDB offers indexes to help improve query performance, similar to RDBMS indexes. But every index has to be updated when you write to a collection (MongoDB) or table (SQL), so having lots of indexes slows down writes: there is always a trade-off.
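The index trade-off can be shown with a small in-memory toy. `ToyCollection` and its counters are hypothetical and say nothing about how MongoDB implements indexes (MongoDB uses B-tree indexes, not hash maps). The point is only that each indexed field adds work to every insert, while a query on an unindexed field has to scan every document.

```python
from collections import defaultdict


class ToyCollection:
    """Hypothetical in-memory collection with optional single-field indexes.

    Assumes unique _id values; updates and deletes are not handled.
    """

    def __init__(self, indexed_fields=()):
        self.docs = {}  # _id -> document
        # field -> value -> set of _ids holding that value
        self.indexes = {field: defaultdict(set) for field in indexed_fields}
        self.index_updates = 0  # extra work done on writes because of indexes
        self.docs_scanned = 0   # work done by queries that cannot use an index

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        # Every index covering a field of this document must be updated too.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc["_id"])
                self.index_updates += 1

    def find(self, field, value):
        if field in self.indexes:
            # Indexed field: direct lookup, no document scan.
            return sorted(self.indexes[field].get(value, set()))
        # Unindexed field: fall back to scanning every document.
        self.docs_scanned += len(self.docs)
        return sorted(d["_id"] for d in self.docs.values() if d.get(field) == value)
```

With indexes on two fields, each insert does two extra index updates; dropping an index makes writes cheaper but turns queries on that field into full scans.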