• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

The 1 billion challenge

 
Mahtab Alam
Ranch Hand
Posts: 391
1
Java MySQL Database PHP
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I want to test the performance of three different databases mysql, oracle, mongodb

I am thinking of inserting 1 billion records to a table with two three columns.

After inserting rows to table I want to create an index on one of the column of the table on which my sql query will be based.

I want to fetch a row from the table.

I want to see how much time each database takes.

 
chris webster
Bartender
Posts: 2407
33
Linux Oracle Postgres Database Python Scala
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you're just doing this for fun, then go ahead. But it won't really tell you anything useful, because simply loading 1 billion short rows into an empty table isn't a realistic application, and you are not comparing like-with-like.

For example, an RDBMS like Oracle or MySQL typically has foreign key constraints between tables, which ensure that when you insert a child record, the parent must exist first. But checking the FK means that inserts take longer, so it is common to disable the FK constraint while loading large volumes of data, then re-enable it afterwards, logging and dealing with any FK errors as you do so. If you want to test a process of loading 1 billion records with FKs, then you need to allow for checking the FK constraints. If you just load 1 billion records without FKs, you are not really storing relational data in your RDBMS anyway.

MongoDB has no FKs, and "child" records may be stored with the parent record or they may have a reference to the parent record in a separate collection. MongoDB doesn't have to check FKs, so it can write records very fast, because it's not doing the same amount of work as an RDBMS. On the other hand, unlike an RDBMS, in MongoDB you have no guarantee that any references to other documents/collections are valid, so you are not really comparing similar systems: MongoDB allows you to store potentially invalid data that Oracle/MySQL would reject.

RDBMS and MongoDB all allow different kinds of partitioning/sharding/replication of data, but the mechanisms are all very different. If your application really needs to manage 1 billion records in a single table, you would probably need to use these tools as well. But which ones, and how? Again, you need to have a better idea of what features and capabilities you really want to compare between these databases.

Creating your (unique?) index would also involve these mechanisms e.g. indexes can be partitioned (on an RDBMS) or sharded on MongoDB, so similar questions apply here. How do you want to manage large indexes, and how do you compare these techniques between the different databases? And how do you plan to deal with errors due to duplicate keys in a unique index?

Transactions on an RDBMS can be rolled back, which means the RDBMS has to use extra resources to keep track of what data has changed, in case you want to reverse those changes. MongoDB doesn't do this. So which behaviour do you need, and how do you design your test to compare the databases accordingly?

Also, will you have exactly the same platform (OS, RAM, CPUs, disks, network latency etc etc) to ensure a fair comparison?

If you have a clear application in mind, you should think about what your application needs to do, what the data will be like, and what features of these different databases are most important to you. Then design a test to explore those features.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic