
Bill Back

Author
since Aug 09, 2013
Spokane Valley, WA

Recent posts by Bill Back

It is accessible. We even gave a couple of the early chapters to the sales guys to be able to explain Mondrian. ;-)

We start from the basics, explaining what Mondrian is, where it fits in the technical landscape, and what it's good for. A non-technical person might not be as interested in some of the more technical, later chapters, but well over half the book is very useful for understanding what Mondrian is and how it works.
6 years ago
The full list is available at http://infocenter.pentaho.com/help/index.jsp?topic=%2Fsupported_components%2Freference_supported_components.html

There are some techniques using Pentaho Data Integration for going directly against NoSQL data sources, such as Cassandra and MongoDB. But keep in mind that performance is usually lower than when querying a database, particularly an analytics (columnar) database.

I know there is also a lot of work going on with Optiq, Julian's other project, to provide such access. Perhaps he can provide more info.
6 years ago
You can, but you'd have to pull them into a data mart via ETL. The latest version of Pentaho, 5.0, has the ability to do data blending against multiple data sources, but the performance isn't what you normally expect from an OLAP tool. However, it is very good for exploratory questions.
6 years ago
For details about what you can do with Mondrian, I would check out chapter 2 when you get a chance (or buy the book!): http://www.manning-source.com/books/back/MiAch02sample.pdf

In a nutshell, Mondrian is an engine for doing multi-dimensional analysis. So if you want to look at sales data, for example, you can answer questions like: how many products of each type did I sell last quarter, or who are my most profitable customers. The key is that the questions aren't known in advance. Mondrian provides an infrastructure for users to be able to ask their own questions against the data without having to write code or SQL queries.
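To give a feel for what "asking your own questions" means, here is a rough sketch in plain Python. It is not Mondrian's API or MDX; the `sales` rows, the column names, and the `rollup` helper are all made up for illustration. The point is only that the same data can be rolled up along whichever dimensions the analyst picks at query time.

```python
from collections import defaultdict

# Hypothetical sales facts; every name and number here is illustrative only.
sales = [
    {"quarter": "Q2", "product_type": "toys", "customer": "Acme", "units": 10, "profit": 40.0},
    {"quarter": "Q2", "product_type": "toys", "customer": "Bolt", "units": 5, "profit": 15.0},
    {"quarter": "Q2", "product_type": "tools", "customer": "Acme", "units": 2, "profit": 30.0},
    {"quarter": "Q1", "product_type": "toys", "customer": "Bolt", "units": 7, "profit": 21.0},
]


def rollup(rows, dimensions, measure):
    """Sum one measure over any combination of dimensions chosen at query time."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# "How many products of each type did I sell per quarter?"
units_by_type = rollup(sales, ["quarter", "product_type"], "units")

# "Who are my most profitable customers?"
profit_by_customer = rollup(sales, ["customer"], "profit")
```

In a real deployment the user would pick dimensions and measures in a front-end tool, and the engine would generate the queries; nobody would write a helper like this by hand.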

It differs from big data in that it is not a storage mechanism, but rather a way to access information about the data. It differs from machine learning in that it's a tool for analysts.
6 years ago
I'll answer in two parts. First, the scaling. Mondrian has two general approaches to scaling (chapter 7). The first is using aggregate tables. These are tables that pre-aggregate the data. For example, suppose you are storing facts about sales at the hourly level, but you usually just do analysis at the daily or weekly level. You can create an aggregate table that is used for queries at those levels. This reduces the amount of data the database has to read and return.
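As a rough illustration of the aggregate-table idea, here is a small Python sketch. It only mimics the concept in memory; it is not how Mondrian defines or selects aggregate tables, and the `hourly_facts` rows and helper names are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly fact rows: (timestamp, product, units_sold).
hourly_facts = [
    (datetime(2013, 8, 9, 9), "widget", 3),
    (datetime(2013, 8, 9, 14), "widget", 5),
    (datetime(2013, 8, 9, 14), "gadget", 2),
    (datetime(2013, 8, 10, 10), "widget", 4),
]


def build_daily_aggregate(facts):
    """Roll the hourly facts up to one row per (date, product)."""
    totals = defaultdict(int)
    for ts, product, units in facts:
        totals[(ts.date(), product)] += units
    return dict(totals)


# Built once, ahead of time, like an aggregate table loaded by ETL.
daily_agg = build_daily_aggregate(hourly_facts)


def units_sold_on(day, product):
    """Answer a daily-level question from the aggregate, not the hourly facts."""
    return daily_agg.get((day, product), 0)
```

Four hourly rows collapse to three daily rows here. With real data the reduction is much larger, which is where the speedup for daily and weekly analysis comes from.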

The second technique is caching. Mondrian caches schema, members, and segments (the things that make up an aggregate). This means that once the data has been queried it is stored in memory. Additionally, Mondrian supports external caches, such as Infinispan, that allow very large amounts of data to be stored in memory with persistence and failover.
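The caching behavior can be pictured with a toy Python sketch like the one below. This is a simplification under my own assumptions, not Mondrian's cache implementation or its external cache interface; `SegmentCache`, `fake_db_query`, and the returned value are hypothetical stand-ins.

```python
class SegmentCache:
    """Toy in-memory cache keyed by a measure plus its dimension values."""

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._segments = {}
        self.db_hits = 0  # counts how often we had to go to the database

    def get(self, measure, **coords):
        # Sort the coordinates so the same question always maps to the same key.
        key = (measure, tuple(sorted(coords.items())))
        if key not in self._segments:
            self.db_hits += 1
            self._segments[key] = self._load(measure, coords)
        return self._segments[key]


def fake_db_query(measure, coords):
    """Stand-in for an expensive SQL query; the value is made up."""
    return 42


cache = SegmentCache(fake_db_query)
first = cache.get("units_sold", year=2013, product="widget")   # goes to the "database"
second = cache.get("units_sold", product="widget", year=2013)  # served from memory
```

The second call never reaches the database. An external cache plays the same role, except that the stored data lives outside the engine's own memory.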
6 years ago
Mondrian has the ability to do what-if scenarios. However, since it's an engine, that has to be exposed via a tool. Analyzer (part of Pentaho EE) does not currently support this capability. Saiku, however, does. I'd go check them out at http://dev.analytical-labs.com
6 years ago
Thanks, everyone. I'll do my best to answer all of your questions along with Nick and Julian.
6 years ago