
Which is more efficient between two table setups?

 
Winston Liek
Ranch Hand
Posts: 175
Hi guys, I would like to ask which table structure is more efficient in terms of performance for CRUD operations (especially retrieval). I need to retrieve the DESCRIPTION column:



Based on the code above, PERSON_DATA needs to join REF_COMPANY by COMPANY_CODE, and then, based on REF_COMPANY.TABLE_NAME, it joins either ABC_TABLE or XYZ_TABLE to get DESCRIPTION. I think this setup is more organized, since a separate table is created per company.
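Here is a hypothetical sketch of this first setup, reconstructed from the description above (SQLite via Python for illustration; any columns beyond the ones named in the post are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON_DATA (
        NAME          TEXT,
        COMPANY_CODE  TEXT,
        AGE           INTEGER,
        POSITION_CODE TEXT
    );
    -- Lookup table mapping each company to its own jobs table.
    CREATE TABLE REF_COMPANY (
        COMPANY_CODE TEXT PRIMARY KEY,
        TABLE_NAME   TEXT
    );
    -- One jobs table per company.
    CREATE TABLE ABC_TABLE (POSITION_CODE TEXT, DESCRIPTION TEXT);
    CREATE TABLE XYZ_TABLE (POSITION_CODE TEXT, DESCRIPTION TEXT);
""")
conn.executemany("INSERT INTO REF_COMPANY VALUES (?, ?)",
                 [("ABC", "ABC_TABLE"), ("XYZ", "XYZ_TABLE")])
conn.execute("INSERT INTO PERSON_DATA VALUES ('John Smith', 'ABC', 31, 'JAVADEV')")
conn.execute("INSERT INTO ABC_TABLE VALUES ('JAVADEV', 'Java Programmer')")

# Retrieving DESCRIPTION takes two steps: first look up which jobs table
# to use, then build the second query with that table name at runtime.
row = conn.execute("""
    SELECT rc.TABLE_NAME FROM PERSON_DATA p
    JOIN REF_COMPANY rc ON rc.COMPANY_CODE = p.COMPANY_CODE
    WHERE p.NAME = 'John Smith'
""").fetchone()
jobs_table = row[0]
# Table names can't be bound as query parameters, so the name has to be
# spliced into the SQL string (dynamic SQL).
desc = conn.execute(f"""
    SELECT j.DESCRIPTION FROM PERSON_DATA p
    JOIN {jobs_table} j ON j.POSITION_CODE = p.POSITION_CODE
    WHERE p.NAME = 'John Smith'
""").fetchone()[0]
print(desc)  # Java Programmer
```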

VS



Based on the code above, all companies' job data is placed in a single table, and a constraint is placed on COMPANY_CODE.
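A corresponding hypothetical sketch of the second setup, again reconstructed from the description (SQLite via Python; the single jobs table name COMPANY_JOBS is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON_DATA (
        NAME          TEXT,
        COMPANY_CODE  TEXT,
        AGE           INTEGER,
        POSITION_CODE TEXT
    );
    -- All companies' job data in one table; COMPANY_CODE is part of the key.
    CREATE TABLE COMPANY_JOBS (
        COMPANY_CODE  TEXT NOT NULL,
        POSITION_CODE TEXT NOT NULL,
        DESCRIPTION   TEXT,
        PRIMARY KEY (COMPANY_CODE, POSITION_CODE)
    );
""")
conn.executemany("INSERT INTO COMPANY_JOBS VALUES (?, ?, ?)", [
    ("ABC", "JAVADEV", "Java Programmer"),
    ("XYZ", "JAVADEV", "Java Developer Support"),
])
conn.executemany("INSERT INTO PERSON_DATA VALUES (?, ?, ?, ?)", [
    ("John Smith", "ABC", 31, "JAVADEV"),
    ("Adolf Clinton", "XYZ", 31, "JAVADEV"),
])

# One static join covers every company - no dynamic table names needed.
rows = conn.execute("""
    SELECT p.NAME, p.COMPANY_CODE, p.AGE, p.POSITION_CODE, j.DESCRIPTION
    FROM PERSON_DATA p
    JOIN COMPANY_JOBS j
      ON j.COMPANY_CODE = p.COMPANY_CODE
     AND j.POSITION_CODE = p.POSITION_CODE
    ORDER BY p.NAME
""").fetchall()
for r in rows:
    print(r)
```

This single query produces the sample report shown below.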

When I execute one of the reports, this should be the output:
Name | Company_Code | Age | Position_Code | Description
John Smith | ABC | 31 | JAVADEV | Java Programmer
Adolf Clinton | XYZ | 31 | JAVADEV | Java Developer Support
_____________________________________________

Here is some additional information:
1. PERSON_DATA and each company's jobs table (ABC_TABLE and XYZ_TABLE in the first example) may contain hundreds of thousands of rows.
2. There are many company tables (not just the two in this example, ABC and XYZ), and each table may grow to between one hundred thousand and 500 thousand rows in the future.
3. Each company table (XYZ and ABC in this example) may share common columns, but the columns may not be identical (for example, XYZ may have a PARENT column while ABC won't).
 
Martin Vajsar
Sheriff
Posts: 3752
I wouldn't recommend this scheme. It would make querying the database very cumbersome and would increase the risk of SQL injection attacks (you would be creating SQL queries on the fly, stuffing table names in as you go - a recipe for SQL injection). And some queries would be outright impossible to run efficiently. Let's say you had, for example, a salary column in each of the company tables and wanted to find the maximum salary over all of the companies. With all data stored in one table, the query would be trivial (and if the column were indexed, it would also be nearly instantaneous). In the scheme you propose, however, it wouldn't be possible to write a single SQL query to return this data, and you'd have to do it procedurally. (Well, you could maintain a view that returns a UNION of all the company tables and query that view, but that would essentially be a crude form of the partitioning I'll describe later - and an administrative nightmare.)
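As a sketch of the salary example (SQLite via Python; the SALARY column, the figures, and the table names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-table design: one trivial aggregate query answers the question.
conn.execute("CREATE TABLE COMPANY_JOBS"
             " (COMPANY_CODE TEXT, POSITION_CODE TEXT, SALARY INTEGER)")
conn.executemany("INSERT INTO COMPANY_JOBS VALUES (?, ?, ?)", [
    ("ABC", "JAVADEV", 50000),
    ("XYZ", "JAVADEV", 65000),
])
max_salary = conn.execute("SELECT MAX(SALARY) FROM COMPANY_JOBS").fetchone()[0]

# Table-per-company design: the same question needs either dynamic SQL
# over every company table, or a UNION ALL view kept in sync by hand.
conn.executescript("""
    CREATE TABLE ABC_TABLE (POSITION_CODE TEXT, SALARY INTEGER);
    CREATE TABLE XYZ_TABLE (POSITION_CODE TEXT, SALARY INTEGER);
    INSERT INTO ABC_TABLE VALUES ('JAVADEV', 50000);
    INSERT INTO XYZ_TABLE VALUES ('JAVADEV', 65000);
    -- This view must be edited every time a company is added.
    CREATE VIEW ALL_JOBS AS
        SELECT * FROM ABC_TABLE
        UNION ALL
        SELECT * FROM XYZ_TABLE;
""")
max_via_view = conn.execute("SELECT MAX(SALARY) FROM ALL_JOBS").fetchone()[0]
print(max_salary, max_via_view)  # both 65000
```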

First of all, as some others have already told you several times, databases are fast and were built to process large amounts of data efficiently. It is possible to design a database in a particularly ineffective way, but when done correctly, even millions of records do not pose a particular problem these days. And if you're worried about efficiency, the best thing you could do would be to create a database, fill it with hundreds of thousands of artificial records, and see for yourself how it behaves under the conditions you envisage.

Also, if your individual tables will have hundreds of thousands of records, you need to index them properly for quick data retrieval anyway. And if you manage to get quick access to data in one properly indexed sub-table, you'll obtain practically the same speed of data retrieval over one global table with similar indexes (edit: the indexes wouldn't be identical; they would contain a column identifying the company at the beginning). The only difference is the case when a full scan of the table (i.e. reading the contents of the entire table, without using any index) is required - here, the split-up tables could theoretically perform better (depending on the exact query), but a full scan over several hundred thousand records is going to take some time, so if you want quick responses, you'll have to avoid full scans anyway.
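A sketch of the indexing point, assuming SQLite (EXPLAIN QUERY PLAN output is database-specific; the index and table names are made up). The index on the global table is the per-company index with COMPANY_CODE added at the front:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COMPANY_JOBS (
        COMPANY_CODE  TEXT,
        POSITION_CODE TEXT,
        DESCRIPTION   TEXT
    );
    -- In a per-company table the index would be just (POSITION_CODE);
    -- in the global table COMPANY_CODE leads, so a lookup for one
    -- company's position is still a narrow index search.
    CREATE INDEX IDX_JOBS ON COMPANY_JOBS (COMPANY_CODE, POSITION_CODE);
""")
conn.executemany("INSERT INTO COMPANY_JOBS VALUES (?, ?, ?)",
                 [("ABC", "JAVADEV", "Java Programmer"),
                  ("XYZ", "JAVADEV", "Java Developer Support")])

# Confirm the query is served by the index rather than a full scan.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT DESCRIPTION FROM COMPANY_JOBS
    WHERE COMPANY_CODE = 'ABC' AND POSITION_CODE = 'JAVADEV'
""").fetchall()
uses_index = any("IDX_JOBS" in row[-1] for row in plan)
print(uses_index)
```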

Having a different structure for each individual company table complicates things further, in a big way, as the application(s) operating on such a database will need information about all of these tables to avoid accessing columns that aren't there.

Some databases have ways to split up large tables. This is called partitioning, and you might want to look it up for the database you plan to use. In general, partitioning allows you to break large tables up into smaller chunks (partitions) by some criteria. Depending on your exact database, it might be possible to split the table into partitions each containing the data of just one company. You would then achieve exactly what you plan (except that all the partitions need to have the same columns, since they're part of the original, partitioned table), and the database would handle by itself all the nasty details of going to the correct partition for the piece of data being accessed. The application code doesn't even need to know whether the table is partitioned or not!

However, partitioning is most often used to make administering really, really large tables easier (and we're speaking about billions of rows here), and one of the challenges of partitioning is making sure that operations on the partitioned table don't get slowed down. Only seldom is partitioning used to speed things up, and, frankly, one needs to read up on the subject to be able to use it this way, especially as the details differ between databases.
 
Winston Liek
Ranch Hand
Posts: 175
I'm not familiar yet with partitioning, but between the two setups, you're recommending the second option (the one with just one table for all companies), right?

And in the future, in case my data reaches billions of rows, I can partition it.
More or less, there would only be a few columns that are not present in all company tables. In that case, I'll just set null values. Is this correct?
 
Martin Vajsar
Sheriff
Posts: 3752
Winston Liek wrote: I'm not familiar yet with partitioning, but between the two setups, you're recommending the second option (the one with just one table for all companies), right?

And in the future, in case my data reaches billions of rows, I can partition it.

Yes, I recommend one table by all means. If it becomes necessary for some reason, you can partition it later (after careful analysis to assess the possible impact), and you can even do so without having to rewrite the SQL queries in your application.

More or less, there would only be a few columns that are not present in all company tables. In that case, I'll just set null values. Is this correct?

Exactly. You'll declare all the columns you might need and allow the optional ones to be NULL. Your application has to handle this gracefully, of course, but it would have had to cope with differing table structures in your original setup anyway, so this isn't actually an added complication.
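A quick sketch of the nullable-columns approach (SQLite via Python; PARENT is the optional column from the earlier example, everything else is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declare the superset of columns; PARENT is only used by some companies
# and is simply left NULL for the others.
conn.execute("""
    CREATE TABLE COMPANY_JOBS (
        COMPANY_CODE  TEXT NOT NULL,
        POSITION_CODE TEXT NOT NULL,
        DESCRIPTION   TEXT,
        PARENT        TEXT
    )
""")
conn.executemany("INSERT INTO COMPANY_JOBS VALUES (?, ?, ?, ?)", [
    ("XYZ", "JAVADEV", "Java Developer Support", "ITDEPT"),
    ("ABC", "JAVADEV", "Java Programmer", None),  # ABC has no PARENT
])

# The application handles the NULL gracefully, e.g. with a fallback value.
rows = conn.execute("""
    SELECT COMPANY_CODE, COALESCE(PARENT, 'n/a')
    FROM COMPANY_JOBS ORDER BY COMPANY_CODE
""").fetchall()
print(rows)  # [('ABC', 'n/a'), ('XYZ', 'ITDEPT')]
```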
 