
How well does Mondrian scale with more and more data, and where does it fit in the Big Data platform?

Hi,

What tools and techniques does Mondrian use to scale up, such as caches?
And where does it fit into a Big Data platform, especially with respect to Hadoop, Hive, etc.?
How easy is it to plug Mondrian into other Big Data tools?

Regards,
Anujit
I'll answer in two parts. First, the scaling. Mondrian has two general approaches to scaling (chapter 7). The first is using aggregate tables. These are tables that pre-aggregate the data. For example, suppose you are storing facts about sales at the hourly level, but you usually just do analysis at the daily or weekly level. You can create an aggregate table that is used at those levels. This reduces the number of rows the database has to read and return for those queries.
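To make the aggregate table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Mondrian code: the table and column names (sales_fact_hourly, agg_day_sales, fact_count) are made up for illustration, and the data is synthetic. It only shows that a daily rollup answers a daily question with far fewer rows than the hourly fact table.

```python
import sqlite3


def build_demo(conn):
    """Create a hypothetical hourly fact table and a daily aggregate of it."""
    conn.execute(
        "CREATE TABLE sales_fact_hourly "
        "(sale_date TEXT, sale_hour INTEGER, store_id INTEGER, amount REAL)"
    )
    # Synthetic data: 3 days x 24 hours x 2 stores, 10.0 per row.
    rows = [
        (f"2024-01-{day:02d}", hour, store, 10.0)
        for day in range(1, 4)
        for hour in range(24)
        for store in (1, 2)
    ]
    conn.executemany("INSERT INTO sales_fact_hourly VALUES (?, ?, ?, ?)", rows)
    # The aggregate table drops the hour level: one row per (day, store).
    # fact_count records how many fact rows were rolled into each row.
    conn.execute(
        "CREATE TABLE agg_day_sales AS "
        "SELECT sale_date, store_id, SUM(amount) AS amount, "
        "COUNT(*) AS fact_count "
        "FROM sales_fact_hourly GROUP BY sale_date, store_id"
    )


def daily_totals(conn, table):
    """Answer the same daily question from whichever table is given."""
    # The table name is interpolated only because this is a demo with
    # fixed, trusted names.
    sql = (
        f"SELECT sale_date, SUM(amount) FROM {table} "
        "GROUP BY sale_date ORDER BY sale_date"
    )
    return conn.execute(sql).fetchall()


def row_count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
build_demo(conn)
```

Calling daily_totals(conn, "sales_fact_hourly") and daily_totals(conn, "agg_day_sales") gives the same result, but the second query touches 6 rows instead of 144. In a real deployment the OLAP engine, not the user, decides when the aggregate table can stand in for the fact table.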

The second technique is caching. Mondrian caches schema, members, and segments (the things that make up an aggregate). This means that once the data has been queried it is stored in memory. Additionally, Mondrian supports external caches, such as Infinispan, that allow very large amounts of data to be stored in memory with persistence and failover.
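The caching idea can be sketched roughly like this. This is a toy illustration and not Mondrian's actual cache or its API: the SegmentCache class, the loader function, and the key layout are all hypothetical. The point is only that a result is keyed by the measure plus the constraints that were asked for, so a repeated request is served from memory instead of going back to the database.

```python
class SegmentCache:
    """Toy in-memory cache keyed by measure and constraint values."""

    def __init__(self, loader):
        self._loader = loader  # called only on a cache miss
        self._segments = {}
        self.hits = 0
        self.misses = 0

    def get(self, measure, **constraints):
        # frozenset makes the key independent of keyword argument order.
        key = (measure, frozenset(constraints.items()))
        if key in self._segments:
            self.hits += 1
        else:
            self.misses += 1
            self._segments[key] = self._loader(measure, constraints)
        return self._segments[key]


calls = []


def slow_sql_loader(measure, constraints):
    """Stand-in for a SQL round trip; records each call, returns a fixed value."""
    calls.append((measure, dict(constraints)))
    return 480.0


cache = SegmentCache(slow_sql_loader)
first = cache.get("amount", year="2024", store="1")
second = cache.get("amount", store="1", year="2024")  # same cell, served from memory
```

After the two lookups the loader has been called once, and cache.hits and cache.misses are both 1. An external cache such as Infinispan plays the role of the dictionary here, with the difference that it can live outside the JVM heap and survive restarts.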
I'll tack on the response to Hadoop/Hive. We cover how Mondrian fits in with Big Data systems in Chapter 11. In that chapter we note that Mondrian has experimental Hive support. However, given the latency of even the most basic Hive queries (for example, generating the list of values for the "year" column), the overall performance will always be lackluster for direct access with an engine like Mondrian. The work of Impala, Drill, etc. will improve this (making simple queries fast, and longer queries longer) over time.

Thanks, Bill. But I am now interested to know more about how the level-based, on-demand structure works. I ask this because I have faced situations in BI reporting where this was the structure that was required but was not there.

And Nicholas, thanks for touching on the latency issue. I am not aware of Impala, but I am eager to see how Mondrian plugs in with Drill.

Thanks a lot.

Regards,
Anujit