
[MongoDB in Action] Collections performance.

 
Askar Akhmerov
Greenhorn
Posts: 20
IntelliJ IDE Java Spring
Hi, I have a question about the performance of collections.

Requirement:
Persist historical user data on each event. Events occur often, for example every 30-60 seconds.
The historical data should be analyzed periodically, once a day or once a week.

Scenario 1: dump all the data into a single collection called "history".

Scenario 2: create a separate collection for every user.

Question: which approach is better, both for calculating the average value across all users' data and for calculating averages for an individual user?

For per-user averages the answer seems obvious and favors the second scenario. It gets even better once we note that Mongo creates collections for you automatically, so you don't actually have to know whether a collection exists when you want to perform insert or find operations. That would be my argument; however, I'm not sure about the map-reduce capabilities of Mongo and how the structuring of collections affects them. There is also a partitioning concern: once the db instances become sharded, separate user collections basically mean that there will be extra requests among shards.
 
Kyle Banker
author
Greenhorn
Posts: 16
Hi Askar,

I'd recommend using a single collection as opposed to a collection per user. You'll get much better space utilization this way, and as you saw, collections can eventually be sharded if needed.
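
To make the single-collection option concrete, here is a minimal sketch of how both kinds of average could be expressed against one "history" collection using MongoDB's aggregation pipeline stages `$match`, `$group` and `$avg`. The field names `user_id` and `value` are assumptions for illustration, not something stated in this thread, and the functions only build the pipeline documents; they do not talk to a server.

```python
from typing import Any, Dict, List, Optional


def average_pipeline(field: str, user_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Build a pipeline that averages `field` into a single result document.

    With user_id=None the average covers every user's documents;
    otherwise a $match stage restricts it to one user first.
    `user_id` is an assumed field name.
    """
    pipeline: List[Dict[str, Any]] = []
    if user_id is not None:
        pipeline.append({"$match": {"user_id": user_id}})
    # Grouping on _id=None collapses all matching documents into one group.
    pipeline.append({"$group": {"_id": None, "average": {"$avg": "$" + field}}})
    return pipeline


def per_user_average_pipeline(field: str) -> List[Dict[str, Any]]:
    """Build a pipeline that yields one average per distinct user_id."""
    return [{"$group": {"_id": "$user_id", "average": {"$avg": "$" + field}}}]
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `collection.aggregate(...)`. An index on the assumed `user_id` field would likely be needed for the per-user `$match` to stay cheap as the collection grows.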

What is the nature of the event? We often recommend pre-aggregating the data using counters within the document. You can see some concrete examples of this technique in the following presentation:
http://speakerdeck.com/u/mongodb/p/mongodb-for-analytics-john-nunemaker-ordered-list
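
As a rough illustration of pre-aggregating with counters inside a document, the sketch below builds the selector and `$inc` update for a per-user, per-day statistics document. The document layout (`user_id`, `day`, `counts`, `hourly`) and the metric name are assumptions made up for this example; they are not taken from the presentation.

```python
from datetime import datetime
from typing import Any, Dict, Tuple


def daily_counter_update(
    user_id: str, metric: str, when: datetime
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (selector, update) that bump one metric for one user and day.

    The selector identifies the per-day document; the update increments
    a daily total and an hourly bucket using dotted field paths.
    """
    day = when.strftime("%Y-%m-%d")
    selector = {"user_id": user_id, "day": day}
    update = {
        "$inc": {
            f"counts.{metric}": 1,              # daily total for this metric
            f"hourly.{when.hour}.{metric}": 1,  # per-hour breakdown
        }
    }
    return selector, update
```

Applied as an upsert (for example `update_one(selector, update, upsert=True)` in PyMongo), the first event of the day would create the document and later events would only increment counters in place, so the daily or weekly analysis reads a handful of small documents instead of scanning every raw event.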

Kyle
 
Askar Akhmerov
Greenhorn
Posts: 20
IntelliJ IDE Java Spring
Hi Kyle,

Thanks for the reply. As for the nature of the event, it will be a user interaction with a web page. I'll be keeping the state of the page data in a collection. I've read through the presentation and, as I understand it, the suggestion is to use pre-aggregated values for things like clicks on a specific URL, page hits, etc. However, I'm planning to keep all the snapshots as individual documents in a collection. I think it makes sense to keep the partial aggregations in a separate collection, then, which gets pretty close to the concept of OLAP.

Also, sharding the data by date range would help, I guess. I'm not sure if it's possible, though.
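
On the date-range question: MongoDB shards by ranges of a shard key, so a key that includes a timestamp is one way to get date-based partitioning. The sketch below only builds the `shardCollection` admin command document; the namespace `app.history` and the field names `ts` and `user_id` are assumptions for illustration.

```python
from typing import Any, Dict


def shard_collection_command(namespace: str, *key_fields: str) -> Dict[str, Any]:
    """Build the admin command document for range-sharding a collection.

    `namespace` is "<database>.<collection>"; `key_fields` are the shard
    key fields in order, each sharded in ascending range order.
    """
    if not key_fields:
        raise ValueError("at least one shard key field is required")
    # The command name must be the first key; dicts preserve insertion order.
    return {"shardCollection": namespace, "key": {f: 1 for f in key_fields}}
```

For example, `shard_collection_command("app.history", "ts")` describes a pure date-range key, while `shard_collection_command("app.history", "user_id", "ts")` puts the user first. A key that only ever increases, such as a bare timestamp, tends to direct all new inserts to the same shard, so a compound key is often suggested for write-heavy event data; which one fits here depends on the query patterns.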
 