This week's book giveaway is in the Agile forum.
We're giving away four copies of Head First Agile and have Andrew Stellman & Jennifer Greene on-line!
See this thread for details.
Win a copy of Head First Agile this week in the Agile forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Concurrency issue for dimensional tables  RSS feed

Jayesh A Lalwani
Posts: 2762
Eclipse IDE Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Not sure, if this is the correct place for this question

We have an OLAP database with a typical star shaped schema. It has several dimension tables and some fact tables. Dimension tables have values for the dimensions, and fact tables have FKs to the dimension tables. The values in the dimension tables have to be unique, otherwise the reports look messy

Now, we have this grid distributed application that does some massive computation and puts data into the OLAP tables. The input data gets split into batches, and the batches are sent to workers on the grid. The workers basically do some processing and then insert data into the fact tables. One of the challenges is the Dimension tables aren't prepopulated. So, if a worker sees an output that requires a new record in the dimension table, it has to insert into the dimension table itself.. AND the records in the dimension table should be unique.

Each worker basically does this (assuming for simplicity's sake that there is only one column in the fact table.. and only one dimension

When we insert into dimension table we do this query

When we insert into fact table, we do this

Each batch is executed in it's own transaction.

Again, this is a simple example. Actually, there are several dimension tables, and we insert into each dimension table if required, and do one insert into the fact table that joins all the dimension tables

Ok, so the problem is that when 2 differrent threads are concurrently executing batches that produce Output with the same value. Both threads insert a record into the DImension table. The changes are isolated because we haven't committed the transaction yet. Then both threads insert a record into fact table.. first record in fact table has the FK to first record in dimension table.. second record in fact table has a FK to the second record. Then both commit, which results in duplicate records in the dimension table. Now, a third batch starts up. This third batch doesn't insert any records in dimension table,. but inserts 2 records into fact table! Actually there are other constraints on the fact table that prevent us from have dup records, so third batch fails

So, my main problem is how do I do this concurrently without adding (too many) bottlenecks

  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!