
How well does Mondrian scale to larger and larger data sets, and where does it fit in the Big Data platform?

 
Anujit Chatterjee
Greenhorn
Posts: 25
Hi,

What tools and techniques does Mondrian use to scale up (caches, etc.)?
And where does it fit into the Big Data platform, especially with respect to Hadoop, Hive, etc.?
How easy is it to plug Mondrian into other Big Data tools?

Regards,
Anujit
 
Bill Back
Author
Greenhorn
Posts: 7
I'll answer in two parts. First, scaling. Mondrian has two general approaches to scaling (Chapter 7). The first is aggregate tables, which hold pre-aggregated data. For example, suppose you store sales facts at the hourly level but usually analyze them at the daily or weekly level. You can create an aggregate table at those levels, and Mondrian uses it for queries at that grain. This reduces the amount of data that has to be read and returned.
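To make the aggregate-table idea concrete, here is a minimal Java sketch that rolls hourly sales facts up to a daily grain, which is the kind of summary an aggregate table stores. The class, record, and method names are hypothetical and this is not Mondrian code; in practice the aggregate table is built in the database and declared in the Mondrian schema. The sketch only shows why fewer rows need to be read.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AggregateTableSketch {
    // Hypothetical fact row stored at hourly grain.
    record HourlySale(LocalDateTime hour, double amount) {}

    // Sum hourly facts into one row per day, like a pre-built daily aggregate table.
    static Map<LocalDate, Double> rollUpToDaily(List<HourlySale> facts) {
        Map<LocalDate, Double> daily = new TreeMap<>();
        for (HourlySale s : facts) {
            daily.merge(s.hour().toLocalDate(), s.amount(), Double::sum);
        }
        return daily;
    }

    public static void main(String[] args) {
        List<HourlySale> facts = List.of(
            new HourlySale(LocalDateTime.of(2013, 1, 1, 9, 0), 10.0),
            new HourlySale(LocalDateTime.of(2013, 1, 1, 10, 0), 5.0),
            new HourlySale(LocalDateTime.of(2013, 1, 2, 9, 0), 7.5));
        Map<LocalDate, Double> daily = rollUpToDaily(facts);
        // Three hourly rows collapse into two daily rows.
        System.out.println(facts.size() + " hourly rows -> " + daily.size() + " daily rows");
        System.out.println(daily); // {2013-01-01=15.0, 2013-01-02=7.5}
    }
}
```

A query at the daily level can then read the two summary rows instead of scanning every hourly row; with real data the ratio is usually far larger (24 hourly rows per day).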

The second technique is caching. Mondrian caches schemas, members, and segments (the building blocks of an aggregate). This means that once data has been queried, it is kept in memory for later queries. Mondrian also supports external caches, such as Infinispan, which allow very large amounts of data to be held in memory with persistence and failover.
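The layered caching described above can be sketched as a two-tier lookup: a bounded local in-memory cache, then an external cache tier, and only then the database. This is a minimal illustration under stated assumptions, not Mondrian's implementation. `ExternalCache` is a hypothetical interface standing in for something like Infinispan, the database is simulated by a function, and a segment is reduced to a single number keyed by a string.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class SegmentCacheSketch {
    // Hypothetical stand-in for an external cache tier (e.g. a distributed store).
    interface ExternalCache {
        Double get(String key);
        void put(String key, Double value);
    }

    // Trivial in-process implementation of the hypothetical external tier.
    static class MapExternalCache implements ExternalCache {
        private final Map<String, Double> store = new HashMap<>();
        public Double get(String key) { return store.get(key); }
        public void put(String key, Double value) { store.put(key, value); }
    }

    private final Map<String, Double> local;
    private final ExternalCache external;
    private final Function<String, Double> database; // simulated database query
    int databaseHits = 0;

    SegmentCacheSketch(int capacity, ExternalCache external, Function<String, Double> database) {
        // Access-ordered LinkedHashMap gives simple LRU eviction for the local tier.
        this.local = new LinkedHashMap<String, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                return size() > capacity;
            }
        };
        this.external = external;
        this.database = database;
    }

    Double get(String segmentKey) {
        Double v = local.get(segmentKey);
        if (v != null) return v;                  // served from local memory
        v = external.get(segmentKey);
        if (v == null) {                          // miss in both tiers: query the database
            databaseHits++;
            v = database.apply(segmentKey);
            external.put(segmentKey, v);
        }
        local.put(segmentKey, v);
        return v;
    }

    public static void main(String[] args) {
        SegmentCacheSketch cache = new SegmentCacheSketch(2, new MapExternalCache(), key -> 42.0);
        cache.get("sales/2013");
        cache.get("sales/2013"); // local hit
        cache.get("sales/2012");
        cache.get("sales/2011"); // evicts sales/2013 from the local tier
        cache.get("sales/2013"); // external hit, no database query
        System.out.println("database hits: " + cache.databaseHits); // database hits: 3
    }
}
```

The design point is that a segment evicted from local memory can still be recovered from the external tier without going back to the database, which is what makes a large external cache useful.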
 
Nicholas Goodman
Greenhorn
Posts: 4
I'll tack on the response about Hadoop/Hive. We cover how Mondrian fits in with Big Data systems in Chapter 11. In that chapter we note that Mondrian has experimental Hive support. However, given the latency of even the most basic Hive queries (such as generating the list of values for the "year" column), overall performance will always be lackluster when an engine like Mondrian accesses Hive directly. Projects such as Impala and Drill should improve this over time (making simple queries fast and longer queries shorter).
 
Anujit Chatterjee
Greenhorn
Posts: 25
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Thanks, Bill. I'm now interested in learning more about how the level-based, on-demand structure works. I ask because I have faced situations in BI reporting where this structure was required but not available.

And Nicholas, thanks for touching on the latency issue. I'm not familiar with Impala, but I'm eager to see how Mondrian plugs into Drill.

Thanks a lot.

Regards,
Anujit
 