
plate numbers collection

 
David Spades
Ranch Hand
Posts: 348
Hi,

I'm using Morphia to connect to MongoDB. I'm collecting daily mileage for cars. Right now, the daily mileage for all cars is stored in one collection with the following attributes:
plateNumber
date
mileage

We want to keep the mileages all the way back from 1990 onwards. Right now we're already maintaining around 4,500+ cars (that's roughly 1.3 million records a year). We're trialling one year's worth of data, and performance is already lagging really badly. I was thinking of splitting the storage into multiple collections based on the plate number, so each plate number would have its own collection named after it. I need some ideas. Is there any other way to solve this?
thanks.
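For reference, the single-collection design described above can be sketched as a document shape plus a compound index. This is a hedged pymongo-style sketch: the plate number and values are made up, and the (plateNumber, date) index is a common pattern for per-key time-series data, not something stated in the post.

```python
from datetime import datetime

# One daily-mileage document, using the three attributes listed above
# (plate number and values are hypothetical).
doc = {
    "plateNumber": "B1234XYZ",
    "date": datetime(2014, 6, 1),  # store real dates, not strings
    "mileage": 152.7,
}

# A compound index on (plateNumber, date) lets MongoDB answer
# "history of one car, ordered by date" without a full collection scan.
# With pymongo it would be created as:
#   db.mileage.create_index(index_spec)
index_spec = [("plateNumber", 1), ("date", 1)]
```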
 
chris webster
Bartender
Posts: 2407
Well, a couple of million records shouldn't be a problem on any database, including MongoDB, but performance will depend on what you're doing and what your environment is like.

Sharding is one common approach to spreading the processing load across multiple server nodes:

MongoDB wrote:Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
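For concreteness, sharding a collection comes down to two admin commands, shown here as the command documents a driver would send. The database/collection names and the shard key are hypothetical, and a real deployment also needs the mongos router and config servers the docs describe; the dicts below are only constructed, never executed.

```python
# Sent via db.admin.command(...) in pymongo on a sharded cluster.
enable_sharding = {"enableSharding": "fleet"}

shard_collection = {
    "shardCollection": "fleet.mileage",
    # plateNumber spreads different cars across shards; adding date
    # keeps one car's history in date order within its chunks.
    "key": {"plateNumber": 1, "date": 1},
}
```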
 
David Spades
Ranch Hand
Posts: 348
After reading the docs about sharding, it seems rather complicated to set up (query router, config servers, etc.). What about my approach of multiple collections with identical data structures? Thanks.
 
Junilu Lacar
Bartender
Posts: 8860
It kind of depends on your data access/usage patterns, doesn't it? There are no joins in MongoDB, so you're going to need to do a bit of programming if you want to calculate something that involves all those collections. I'm not familiar with Morphia (yet), so I don't know if it can help in that respect. Also, what happens if a vehicle gets a new plate? Do you start a new collection or keep adding to the existing one?
 
chris webster
Bartender
Posts: 2407
How do you know that your performance problem is caused by the size of your data collection? A couple of million records with just a few short fields (a few hundred MB in total?) should not be a problem; I've run MongoDB on my home laptop with more data than that. What steps have you taken to isolate the bottleneck? For example, just looking at the MongoDB side of things:

  • Is it performing slowly on queries?
  • If so, have you indexed the fields that are used as query criteria?
  • Does the query perform slowly via the Mongo shell, or only in your application?
  • Are you only fetching the data you need?
  • For example, are you projecting only the fields you require, or are you fetching everything every time?
  • Are you using query criteria to limit the documents you fetch, or are you fetching everything and filtering them in your app?
  • Are you performing aggregations in the database e.g. using the MongoDB aggregation framework, or are you fetching all the data and aggregating in your app?
  • Have you looked at the rest of your environment - hardware, network bandwidth etc - to see if that could be a problem?
  • If it is performing slowly on updates, are you re-writing the entire record, or using $set to only modify specific fields?
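Several of the checklist items above come down to the shape of the query documents you send to the server. The following is a hedged pymongo-style sketch using the field names from the thread; the plate number, values, and dates are hypothetical, and nothing here talks to a live server.

```python
from datetime import datetime

# Fetch only what you need: criteria plus a projection, instead of
# find({}) and filtering/trimming in the application.
criteria = {
    "plateNumber": "B1234XYZ",
    "date": {"$gte": datetime(2014, 1, 1), "$lt": datetime(2015, 1, 1)},
}
projection = {"_id": 0, "date": 1, "mileage": 1}
# db.mileage.find(criteria, projection)

# Update a single field with $set instead of rewriting the whole record.
update = {"$set": {"mileage": 160.2}}
# db.mileage.update_one(criteria, update)

# Aggregate in the database rather than in the app: total mileage per
# plate for one year, via the aggregation framework.
pipeline = [
    {"$match": {"date": {"$gte": datetime(2014, 1, 1),
                         "$lt": datetime(2015, 1, 1)}}},
    {"$group": {"_id": "$plateNumber", "total": {"$sum": "$mileage"}}},
]
# db.mileage.aggregate(pipeline)
```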

There are several options for improving query performance in MongoDB: http://docs.mongodb.org/manual/administration/optimization/

You could start by profiling your query: http://docs.mongodb.org/manual/tutorial/evaluate-operation-performance/

This would at least help you find out where the performance bottleneck might be.
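As an illustration of what the profiling links above report: explain() returns a winning query plan, and the thing to look for is an IXSCAN stage rather than a COLLSCAN (full collection scan). The plan documents below are trimmed, hypothetical examples of that output shape, not real server responses; only the stage names and keys are MongoDB's.

```python
# Trimmed, hypothetical explain() outputs.
explain_before_index = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}  # full scan
}
explain_after_index = {
    "queryPlanner": {"winningPlan": {
        "stage": "FETCH",
        "inputStage": {"stage": "IXSCAN",  # index scan: the goal
                       "indexName": "plateNumber_1_date_1"},
    }}
}

def used_index(plan):
    """Walk the winning plan's nested stages looking for an index scan."""
    stage = plan["queryPlanner"]["winningPlan"]
    while stage is not None:
        if stage.get("stage") == "IXSCAN":
            return True
        stage = stage.get("inputStage")
    return False
```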

Having separate collections for each vehicle sounds like a really bad idea, as Junilu has pointed out. Logically, your vehicle mileage records belong together, because you will want to process them as a logical collection. How would you build a sensible query that needs to look at, e.g., mileage in 2012 for cars registered in one state, if every car is in a separate collection? Right now it sounds like there is no reason performance should be an issue at your current data volumes, so I doubt this complicated approach would help anyway.
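To make that concrete: with one collection, the "mileage in 2012" question is a single date-range query, while with a collection per plate the application has to enumerate and query thousands of collections itself. A pymongo-style sketch with hypothetical names, not executed against a server:

```python
from datetime import datetime

# One collection: a single date-range query answers the question.
year_2012 = {"date": {"$gte": datetime(2012, 1, 1),
                      "$lt": datetime(2013, 1, 1)}}
# db.mileage.find(year_2012)

# Collection per plate: the app must loop over every collection,
# one round trip each, and merge the results itself.
# for name in db.list_collection_names():
#     results.extend(db[name].find(year_2012))
```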

If you really need to spread the processing load for a large collection, you can do so via sharding across multiple servers; that's one of the main ways MongoDB scales out for large volumes of data. But it doesn't sound to me like you have reached that point yet.
     