i have an application which maintains contacts. now contact management can be done through web which has a different database or can be done through a handheld device which again has a different database. now though semantically both DBs are same but they are different DB servers (and we cannot have one DB for both the interfaces for some reason). now i have to design an application which would sync both the DBs. How should i go about it. it obviously needs to merge, add, and even delete contacts. i mean in case from web you edit a contact but from the handheld you delete that same contact it has to handle such scenarious. can someone give any indicators... i thought that when the sync application is run i could create a snapshow of both the DBs and compare them and update both synchronously. but in this case what happens if the DB is changed while this sync job is running... also i am confused as to how i would 'compare' the DB snapshots....
Jeanne Boyarsky
,
author & internet detective
staff
Valid approaches are going to deal with constraints you have. For example, can the program read from both databases? How accurate does it need to be? How do you plan to deal with conflicts?
JTP and all is fine actually the question is how exactly to achieve it. how do design this sync process. as for the other question handhedl device will only read its own DB and web its own... and yes, they need to be consistent right... i mean otherwise it would not serve any purpose
There could be multiple approaches to handle this, but what we use is
1. JTA as already mentioned
2. XA datasources.
The theory behind this approach is that when JTA is managing the transactions, it will first check the databases if they are in a state of commit, for every transaction (good case) , this will ensure that if a particular resource (database) will not be able to commit the JTA will rollback the transaction, and hence the databases will be in sync even if one of the databases as in your case would have committed successfully.
Ravi Kiran Va wrote:when you are going for two databases synchronization issues ,i would suggest to go for JTA Suppourt to achive this .
This is fine if your application is the only one that updates the databases. If there are other mechanisms whereby the databases can be updated it is not enough.
Some databases provide tools to maintain a schema that spans two databases. Assuming both databases are the same type (e.g. both Oracle) you could do this, though if they are you have to wonder why they are two separate databases at all. What database are you using?
Your application can maintain the integrity of the databases using XA transactions, however all clients must behave the same for this to always be guaranteed. Its a mistake to trust the client to maintain the integrity of the database. If someone opens a database client and updates some data in one database, they are now out of sync. To properly do this you need something like a linking mechanism between the two, which is why it is so very hard to do (and why no one would choose to use two databases without very good reason).
Well, that's a different question from the one you asked. Some database (Oracle for example) provide auditing capabilities. If your database doesn't have such capabilities, you'll need to write your own using triggers, or some similar mechanism.
ok and once i get the changes that were done in the two DBs... how do i compare them.. how would i come to know which should be the latest changes and thus should overwrite the other changes done in the other DB..
have a strategy for how your application handles the latency you are building in to the application
have some sort of strategy to resolve collisions
Doing the first is fairly easy, you can use triggers to do it. Some databases come with this sort of auditing, if yous does not you'll need to write triggers for create, update and delete operations for every paired table that logs what was changed and how to a history table. Doing the second is easy too, just have these triggers log a time stamp of when the change was made. Your application, if it is not driven by these triggers, will have to will have to watch these audit tables to create/update/delete data in the other database (and vice versa) as appropriate.
Doing the last two points is really hard and depends on your application's business logic. You are going to have to work out what to do when you have an update on one entity that has been deleted from another database, or what to do when you have two concurrent updates on the same entity in both databases.
Every field? No, you can timestamp the row and describe the operation (e.g. 'u' for update, 'd' for delete etc. ) then your code can compare the current audit row with the previous to track changes for that entity in one database. You'll need extra logic to then compare against the value in the other database. Auditing database tables will cause a significant increase in the size of your databases. There's no avoiding that.
i do not think its going to work. think about it... lets say from the web interface for a particular contact the user changes the first name and after that from the handheld device the user changes the first name of this contact...now as per the timestamp i would be overriding the Data of the Web DB with that of my handheld DB since according to the timestamp handheld DB has the latest changes, but actually its incorrect since in this case i would loose the first name change that i did from the web interface.. right? thats why i was saying that we might need to have timestamp for every field and this dosent sound like a good logic... what do you think ?
A timestamp for every field doesn't solve that problem - think about it, what is different from that and a row for each change? What you have to come up with are the business rules to define what happens when conflicts occur - conflicts are going to occur and you are going to have to throw away data (or otherwise quarantine it till someone can manually make the decision of which update wins).
Like I (and others) said earlier this is very hard. Which is why it is rare to hear of people choosing to try to do it.
Jeanne Boyarsky
,
author & internet detective
staff
You are correct that you have to track changes at a field data and it is a lot of data. And a lot of work.
Which is why I asked above why the constraint on having two databases. If you can't talk about it here, talk about it with the person imposing the decision and make sure they are aware of the consequences.
There is an open source Java project that is able to describe the data or ddl differences between two tables in a structured text format. It also has a component that is able to generate SQL statements (inserts and updates) that will modify one of the tables so that it will have the same data elements as the other.
Why don't you see if that project is able to provide you with what you need.
www.diffkit.org
Joe
Post by:autobot
I promise I will be the best, most loyal friend ever! All for this tiny ad:
a bit of art, as a gift, the permaculture playing cards