I have a schema in my database with big tables (the schema consists of 20 tables, each containing about 500 million rows, roughly 40 GB per table). The tables are quite simple; they contain three fields: id, value, datetime.
The queries (normally searches over a certain time range for one or more parameter values) are now very slow, so some improvement is needed. For the table structure I was thinking about either horizontal range partitioning on the datetime field, or splitting the schema into separate pieces, each holding data for a certain period of time.
The second solution would create some problems in the software using the database, because depending on the time range the user selects, the query must be routed to the specific DB containing that period. So if the performance is similar, I would prefer the first one.
What do you think about the performance of the two solutions? Is there any other (third) applicable solution?
Usually you'd partition based on the ID, but since you search more often by time, partitioning on datetime might work. Before you do that, though, have you tried adding an index on the datetime field?
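To make the indexing suggestion concrete, here's a minimal sketch using Python's sqlite3 module as a stand-in for the poster's (unnamed) DBMS; the table and index names are illustrative, but the idea carries over to any engine: with an index on datetime, a time-range search can seek into the range instead of scanning all 500 million rows.

```python
import sqlite3

# In-memory SQLite database standing in for the real DBMS (an assumption;
# the poster's engine isn't named, but the indexing idea is general).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL, datetime TEXT)")
rows = [(i, i * 0.5, f"2023-01-{(i % 28) + 1:02d} 00:00:00") for i in range(1000)]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)

# Without an index, a time-range predicate forces a full table scan.
# With an index on datetime, the engine can seek directly into the range.
conn.execute("CREATE INDEX idx_measurements_datetime ON measurements (datetime)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM measurements "
    "WHERE datetime BETWEEN '2023-01-10 00:00:00' AND '2023-01-12 23:59:59'"
).fetchall()
print(plan)  # the plan mentions the index rather than a full scan
```

On a real production engine you'd check the equivalent plan output (e.g. EXPLAIN) to confirm the range predicate actually uses the index.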
Another reason for partitioning large data by its timestamp is if the majority of your queries cover a given time range. In that case, partitioning can make it easier to roll old data offline, or, for DBMSs that support it, to use an archival mode (where the old stuff might be run out to tape and restored on demand).
It's certainly worth considering manual partitioning too: store each month's or year's data in its own table.
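A rough sketch of that manual approach, again using sqlite3 for illustration (the readings_YYYY_MM naming and the helper functions are my own invention, not anything the poster has): the application keeps one table per month and stitches together only the tables covering the requested range with UNION ALL.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")

def table_for(d):
    # One table per month, e.g. readings_2023_01 (naming is illustrative).
    return f"readings_{d.year}_{d.month:02d}"

def month_range(start, end):
    # Yield the first day of each month from start through end, inclusive.
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

# Create and populate a few monthly tables with sample rows.
for d, row_id, val in [(date(2023, 1, 5), 1, 1.0),
                       (date(2023, 2, 9), 2, 2.0),
                       (date(2023, 3, 2), 3, 3.0)]:
    t = table_for(d)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {t} "
                 "(id INTEGER, value REAL, datetime TEXT)")
    conn.execute(f"INSERT INTO {t} VALUES (?, ?, ?)", (row_id, val, d.isoformat()))

def query_range(start, end):
    # Route the query only to the tables covering the range.
    parts = [f"SELECT id, value, datetime FROM {table_for(d)} "
             "WHERE datetime BETWEEN ? AND ?"
             for d in month_range(start, end)]
    sql = " UNION ALL ".join(parts) + " ORDER BY datetime"
    params = [start.isoformat(), end.isoformat()] * len(parts)
    return conn.execute(sql, params).fetchall()

print(query_range(date(2023, 1, 1), date(2023, 2, 28)))
```

One practical upside of per-month tables: retiring an old month is a single DROP TABLE rather than a slow bulk DELETE. The downside is exactly the routing logic the original poster worries about, which native range partitioning handles inside the database instead.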