Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

DB data integrity tool?

 
Carlos Conti
Ranch Hand
Posts: 131
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi there,

I have a db with over 170 tables. The db claims 8Gb of space (pictures, pdfs, etc...)

I find myself at the stage where I must guarantee and perform periodical checks of the data integrity for the data contained in this db. For that I have to test all foreign key relationships among the tables, besides running some logic rules to test that the data is congruent from a business perspective. In case any rule doesn't comply, must write a log in order to schedule data repairs.

I have two strategies:

1) Write the rules myself with plain SQL and Java and check them against the db.

2) Use any tool you guys suggest me to check! ;D

Please help. I guess a complete check would mean writting over 800 rules. So it is a bunch of work!... is there any nice db managers tool that eases that process??? It must! must only find it!

Please point me in the right direction.

Many many thanks for your help!

Regards,

Carlos.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Carlos Conti wrote:For that I have to test all foreign key relationships among the tables,

What happened with those good ol' foreign key constraints?
 
Carlos Conti
Ranch Hand
Posts: 131
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Roel,

this is a quite dirty database with not very restricted supervision.
At certain times we have seen ourselves in the situation where manual overrides had to be done. Moreover I am not fully sure all foreign keys are promptly configured. IN our app we use hibernate and so far everything works as expected. But I wouldn't bet much on the correctness of the whole thing.

That's why. But additionally we need to test some business rules too. Any ideas?
Many thanks,

Carlos.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Carlos Conti wrote:At certain times we have seen ourselves in the situation where manual overrides had to be done. Moreover I am not fully sure all foreign keys are promptly configured. IN our app we use hibernate and so far everything works as expected. But I wouldn't bet much on the correctness of the whole thing.

That's why.

So adding all referential key constraints will probably be a daunting task. But once done, you'll definitely benefit on the long term.

Carlos Conti wrote:But additionally we need to test some business rules too. Any ideas?

Shouldn't the business rules not be enfored by the application itself? And manual editing directly into the database is of course not allowed.

Based on the kind of business rules you need, you could use a check constraint. Instead of putting effort in fixing invalid data, I would make sure it's impossible to insert/update data which is violating referential constraints and/or business rules using all features of a relational database. Use referential and/or check constraints were needed. Other business rules should be implemented and enforced in your application. If the application is well implemented and tested, you don't need any manual edits of the database If for some reason the database still needs to be edited manually, you'll probably need triggers to enforce the data integrity.

I don't know any database data integrity tool. Because if your database is well-defined (with all possible constraints) you don't need such a tool.

Kind regards,
Roel
 
Tapas Chand
Ranch Hand
Posts: 583
8
BSD Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well said Roel...we have been taught that
Prevention is better than cure.
 
Carlos Conti
Ranch Hand
Posts: 131
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Roel,

Thanks for this interesting discussion. After your last post I have checked some related links and come to a possible solution. I have never used triggers, but they seem to be the key to my problem. It seems that inside a check, besides testing insert values for a given table inside a beforeupdate trigger, I am also able to check for constraints involving other tables.

That would enable me to program the constraints you were talking about right?

That would be a fantastic solution, for I would be able to include that logic inside the database, with all the benefits that involves regardind database migrations and maintenance.

The more logic for data integrity purposes I can integrate in the database the better for longterm scalability.

So when you said to make use of all check constraint possibilities within MySql I guess you were pointing in this direction right?

Thanks and regards.

Carlos.


 
Carlos Conti
Ranch Hand
Posts: 131
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Roel, and anyone this might interest. This link will be very useful in the context of this discussion:

http://cvuorinen.net/2013/05/validating-data-with-triggers-in-mysql/

Regards and if you don't mind Roel would be great if you could share your thoughts.


Carlos.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Carlos Conti wrote:Hi Roel, and anyone this might interest. This link will be very useful in the context of this discussion:

http://cvuorinen.net/2013/05/validating-data-with-triggers-in-mysql/

Regards and if you don't mind Roel would be great if you could share your thoughts.


Triggers are evil. Triggers are a hack. Triggers hide code. Trigger can be confusing to follow fire order. Just say no to Triggers!

While it will do the job, i strongly suggest you avoid triggers unless there is no other way to do it. Triggers are a problem waiting to happen.

If you database does not have RI (relational integrity) you should add it. If you can't because of special cases, your data model does not match your data requirements and should be redone. RI is the cornerstone of the RDBMS and the trustworthiness of your data, Much like java throws exceptions when something breaks the rules, the RDBMS throws exceptions when an FK is violated. This is the best possible way to secure your data. That is, at the data level.

Oracle (not sure about the others) allows you to leave bad data in place while restricting new bad data, via NOVALIDATE RELY. You may want to look into that option for the time being.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
First of all, you should make your database schema as restrictive as possible. Add all kinds of possible constraints, this will give you data integrity out-of-the-box. So you should add primary keys, foreign keys, not null columns, unique keys, check constraints,... whenever and wherever it's needed. You are using a relational database, so you should make use of its features. Once all your constraints are in place, the database will enforce data integrity for you and for free! If you try to insert/update/delete data which violates one constraint, you will be punished and get a sql exception thrown in the face
I have seen plenty of relational databases having nothing but primary keys (and some NOT NULL columns). For me, that's a real :facepalm: moment and a definite don't! Adding all these constraints will take some time, but the ROI will be huge. So it's definitely worth it.

But of course, it's possible that you have some business rules which can't be enforced by these constraints. Then you have 2 options: either enforce it in the application (using a Validator class), or enfore it in the database (using a trigger). Both have their benefits and drawbacks. It's up to you to decide which one is the best fit for your needs. I can't make that decision for you.
Probably the main advantage of enforcing it in the application is: as a developer in a programming language (e.g. Java), you can write it in the programming language you are used to. So you don't need to write it in a language (PL/SQL) you are unfamiliar with. The (Java) code will also be easier to maintain and be easier to test than a trigger.
On the other hand (and that's what the article you posted is about), performing the validation in a trigger gives you the benefit that many applications can use the database and you have to write/maintain the business validation logic only once. Even if manual editing of the database is required (although I can't think of any reason to justify this), the integrity of your data will be enforced (unless someone temporarily disables the trigger).

Although I have extensively used triggers, I didn't have used them to enforce data integrity. I have always enforced the business rules in the (Java) application itself. Because this approach gives me all benefits I described in the previous paragraph. And the database is only used by one application and no one should every try updating the database manually.
If many applications could/should access/change this database, you could think of a service-oriented architecture: you have one service which is reponsible for all CRUD operations on the database. Using this approach you have only one application accessing the database (this service), so you can enforce business rules using validator classes and you only have to write/maintain them once. All other applications will use this service for reading and writing to the database.

Hope it helps!
Kind regards,
Roel
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:While it will do the job, i strongly suggest you avoid triggers unless there is no other way to do it. Triggers are a problem waiting to happen.

I agree! Although we are extensively using triggers for more than 3 years and we don't experience any problems at all, we will only use a trigger if there's no other way to get it done. We use triggers only as a last resort!
 
Carlos Conti
Ranch Hand
Posts: 131
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well these are too many warnings against triggers and there's definetely a drawback.... testing them is not straightforward.... whereas validation is.... will have some tests runni g tomorrow with triggers and will see....

Many thanks to all for the richful discussion on the subject. Has been very revealing.

Regards,

Carlos.
 
chris webster
Bartender
Posts: 2407
33
Linux Oracle Postgres Database Python Scala
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just to add to the chorus of good advice above:

  • Don't use triggers.
  • Don't use triggers for foreign key checks.
  • Do use foreign key constraints to enforce foreign keys - that's what they are for.

  • Your database may allow you to disable/enable FK constraints on your tables and log FK errors automatically (Oracle allows this) - you'll need to investigate this or talk to your DBA. Once you've added the FK constraints to your database, you need never worry about missing FKs again.


     
    Martin Vajsar
    Sheriff
    Posts: 3752
    62
    Chrome Netbeans IDE Oracle
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Roel De Nijs wrote:But of course, it's possible that you have some business rules which can't be enforced by these constraints. Then you have 2 options: either enforce it in the application (using a Validator class), or enfore it in the database (using a trigger). Both have their benefits and drawbacks. It's up to you to decide which one is the best fit for your needs. I can't make that decision for you.

    There's a third possibility, which can be handy in some circumstances: implement the business logic itself in a stored procedure. It can be especially appealing if the database is being accessed by more than one independent applications, or if there's a skilled DB guy who can take care of implementing and optimizing the queries in it, or if the business logic does a lot of data manipulation in the database (in which case doing it in the DB can be faster).

    Triggers have similar benefits, as has already been mentioned, but doing data validation correctly using triggers is tremendously difficult in multi-user environment. Code in the triggers runs as part of a transaction, and therefore doesn't see changes made by other transactions, so two transactions running together might do changes in the database which are valid in isolation, but invalid when applied together. Possibility of different isolation levels complicates things even further. Don't do validation in triggers unless you know your database really, really well.
     
    Martin Vajsar
    Sheriff
    Posts: 3752
    62
    Chrome Netbeans IDE Oracle
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Brian Tkatch wrote:Oracle (not sure about the others) allows you to leave bad data in place while restricting new bad data, via NOVALIDATE RELY. You may want to look into that option for the time being.

    NOVALIDATE RELY isn't the way to go if there are data violating the constraints. In this case, you'd use NOVALIDATE NORELY (or simple NOVALIDATE, as NORELY is the default, for a good reason).

    NOVALIDATE RELY tells Oracle to skip constraint validation, but assume that all data are valid (just because you said so - see how gullible it is? ). In this case, the query optimizer will make assumptions about the data based on these constraints and might produce query plans that are incorrect when the data in fact violate the constraint. Wrong plan means wrong data being output by the database, and this kind of error is really difficult to find. (The reason to use this option is to save time the database would otherwise spent validating the constraint, if you know the constraint actually holds. It's used mostly in datawarehouses, where you've just built your billion-rows-table, you know it's correct, and you don't want to spend several hours validating the constraint.)

    NOVALIDATE NORELY is used in situations like this: we know there are invalid data in the table, and we want to prevent new invalid data from being stored in the database ()as you said). The database knows that there might be data violating the constraint, and produces query plans accordingly.

    This is Oracle, though. Not sure whether other database offers similar level of sophistication (read: amount of things that can go wrong).
     
    Brian Tkatch
    Bartender
    Posts: 567
    25
    Linux Notepad Oracle
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Martin Vajsar wrote:
    Roel De Nijs wrote:But of course, it's possible that you have some business rules which can't be enforced by these constraints. Then you have 2 options: either enforce it in the application (using a Validator class), or enfore it in the database (using a trigger). Both have their benefits and drawbacks. It's up to you to decide which one is the best fit for your needs. I can't make that decision for you.

    There's a third possibility, which can be handy in some circumstances: implement the business logic itself in a stored procedure. It can be especially appealing if the database is being accessed by more than one independent applications, or if there's a skilled DB guy who can take care of implementing and optimizing the queries in it, or if the business logic does a lot of data manipulation in the database (in which case doing it in the DB can be faster).

    Triggers have similar benefits, as has already been mentioned, but doing data validation correctly using triggers is tremendously difficult in multi-user environment. Code in the triggers runs as part of a transaction, and therefore doesn't see changes made by other transactions, so two transactions running together might do changes in the database which are valid in isolation, but invalid when applied together. Possibility of different isolation levels complicates things even further. Don't do validation in triggers unless you know your database really, really well.


    There's a fourth, fifth, and sixth too.

    If the business rule is within the table itself, within that one record, use a CONSTRAINT. If it is multiple records in the same table, an aggregate column with fancy logic can be added, or a function based index.
    If the business rule is amongst multiple tables, a materialized view (oracle) or indexed view (sql server) can be used, with constraints.
    If the business rule is the creation a a virtual object (as opposed to a logical object) use a VIEW. That's what they are for.

    Stored procedure are better to employ a set of rules to processing. They do not enforce rules, they implement them.
     
    Brian Tkatch
    Bartender
    Posts: 567
    25
    Linux Notepad Oracle
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Martin Vajsar wrote:
    Brian Tkatch wrote:Oracle (not sure about the others) allows you to leave bad data in place while restricting new bad data, via NOVALIDATE RELY. You may want to look into that option for the time being.

    NOVALIDATE RELY isn't the way to go if there are data violating the constraints. In this case, you'd use NOVALIDATE NORELY (or simple NOVALIDATE, as NORELY is the default, for a good reason).

    NOVALIDATE RELY tells Oracle to skip constraint validation, but assume that all data are valid (just because you said so - see how gullible it is? ). In this case, the query optimizer will make assumptions about the data based on these constraints and might produce query plans that are incorrect when the data in fact violate the constraint. Wrong plan means wrong data being output by the database, and this kind of error is really difficult to find. (The reason to use this option is to save time the database would otherwise spent validating the constraint, if you know the constraint actually holds. It's used mostly in datawarehouses, where you've just built your billion-rows-table, you know it's correct, and you don't want to spend several hours validating the constraint.)

    NOVALIDATE NORELY is used in situations like this: we know there are invalid data in the table, and we want to prevent new invalid data from being stored in the database ()as you said). The database knows that there might be data violating the constraint, and produces query plans accordingly.

    This is Oracle, though. Not sure whether other database offers similar level of sophistication (read: amount of things that can go wrong).


    Thank you for the correction! I meant ENABLE NOVALIDATE. That is, enforce the rule only moving forward. (FWIW, i remember those two together, and got mixed up.)
     
    • Post Reply
    • Bookmark Topic Watch Topic
    • New Topic