
Avoiding duplicate reads in a clustered application

 
Sri Harsha Yenuganti
Greenhorn
Posts: 18
Hi,

We are using a TimerEJB that periodically gets a record from DB2. But the cluster will have many such EJB instances running, all accessing the same database and fetching records. Our problem is to find a way to eliminate duplicate reads from the DB by these EJB instances. We can't make the TimerEJB a cluster-wide singleton because the incoming request volume is very high: 250,000 requests a day.

Can anyone suggest the best way to eliminate duplicate reads in this situation? Does Hibernate/JPA provide an easier way to do this?

Thank You,
Sri.
 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java
No.

You probably need your distinct timers to identify themselves with the records they are processing, e.g. have some way of seeding each timer instance and add a field to the processed entities, so you know a particular timer is processing a record and other timers should leave it alone.

The logic might be something like:
- a timer updates n rows of your entity, setting timer_id where timer_id is null
- select the rows where timer_id = my_timer_id
- do whatever you need to do

The hard part in there is updating exactly n rows, which will need to be database-specific SQL or even a stored procedure. You'll also need some per-timer way of supplying a unique ID - perhaps get the next ID from a sequence on start-up? You may need to hard-code these IDs per timer - otherwise you run the risk of not knowing which timer_ids represent an active timer, and you end up with unprocessable records.
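The claim-then-read step might look roughly like the sketch below. To keep it self-contained, the DB2 table is simulated by an in-memory map; the class, method and column names are invented for illustration, not from any real schema. In the real system, step 1 would be an UPDATE ... SET timer_id = ? WHERE timer_id IS NULL and step 2 a SELECT ... WHERE timer_id = ?:

```java
import java.util.*;
import java.util.stream.*;

// Simulated "table": row id -> owning timer id (null = unclaimed).
class TimerClaims {
    private final Map<Integer, String> timerIdColumn = new HashMap<>();

    TimerClaims(int rowCount) {
        for (int i = 0; i < rowCount; i++) timerIdColumn.put(i, null);
    }

    // Step 1: mark up to n unclaimed rows with this timer's id.
    // Step 2: return the rows this timer now owns.
    // synchronized stands in for the database's update atomicity.
    synchronized List<Integer> claimRows(String timerId, int n) {
        int claimed = 0;
        for (Map.Entry<Integer, String> e : timerIdColumn.entrySet()) {
            if (claimed == n) break;
            if (e.getValue() == null) {
                e.setValue(timerId);   // claim the row
                claimed++;
            }
        }
        // Only rows stamped with our id come back, so two timers can
        // never end up processing the same row.
        return timerIdColumn.entrySet().stream()
                .filter(e -> timerId.equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Because each row is stamped before it is read back, two timers claiming concurrently always get disjoint sets of rows.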
 
Sri Harsha Yenuganti
Greenhorn
Posts: 18

We need to process the records on a first-come, first-served basis. Or at least the records should be processed by the end of the day they are received. So random seed generation may not help us.

I am thinking of these options. Can you tell me whether they will work, and if so, which would be the better one?

1. Use Hibernate with a distributed cache. Maintain a list of accessed records in the cache. Whenever an EJB reads a record, Hibernate checks the distributed cache and, if it finds a match, returns null/an error.

The problem is that the list has to be purged at some point so it doesn't grow too big. But once we purge it, we are back to the original problem.

2. Have a sequence ID on the first table.
Use a separate simple table that holds the sequence number of the latest record accessed, with a pessimistic lock on it. An EJB goes to this table to get the sequence number of the next record to fetch. But having a second table with a pessimistic lock on it may be a performance hit when the incoming orders are 250,000/day.

3. The sequence ID could be kept in a distributed cache if we had one. But we can't configure a new distributed cache just to hold this single key if one is not already in place.

4. Place a sequence number on a JMS queue. An EJB reads the queue, gets the current sequence number and goes off to fetch that record. Meanwhile, it increments the sequence number and puts it back on the queue for another EJB to pick up.
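Option 4 is essentially a token being handed round a queue. A minimal sketch of that hand-off, with a java.util.concurrent BlockingQueue standing in for the JMS queue (the class and method names here are invented for illustration, not a real API of any JMS provider):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One-element queue holding the next sequence number to process.
// In the real design this would be a JMS queue; BlockingQueue is only
// a stand-in to show the hand-off pattern.
class SequenceToken {
    private final BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1);

    SequenceToken(long first) {
        queue.add(first);
    }

    // A timer EJB takes the token, immediately puts token + 1 back for
    // the next instance, then goes off to fetch record number `token`.
    long takeAndAdvance() {
        try {
            long token = queue.take();
            queue.put(token + 1);
            return token;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Since only one sequence number ever sits on the queue and take() removes it before put() replaces it, each EJB instance sees a distinct number.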





 
Paul Sturrock
Bartender
Posts: 10336
Eclipse IDE Hibernate Java

We need to process the records on a first-come, first-served basis. Or at least the records should be processed by the end of the day they are received. So random seed generation may not help us.

Not sure why this precludes using a seed (this is, after all, how some JMS implementations work); maybe I'm missing something - can you explain?


1. Use Hibernate with a distributed cache. Maintain a list of accessed records in the cache. Whenever an EJB reads a record, Hibernate checks the distributed cache and, if it finds a match, returns null/an error.

A distributed cache makes sense, insofar as you force all data updates through the cache so your application can control them. It doesn't work if the data is updated by something else. Caches can overflow to disk, so presumably you don't have to worry too much about size? Maybe clear the cache daily?


2. Have a sequence ID on the first table.
Use a separate simple table that holds the sequence number of the latest record accessed, with a pessimistic lock on it. An EJB goes to this table to get the sequence number of the next record to fetch. But having a second table with a pessimistic lock on it may be a performance hit when the incoming orders are 250,000/day.

That works. The pessimistic lock might be an issue, but not that big an issue, I'd guess: the lock would be acquired and released in a very brief time. Might be worth prototyping it and taking some performance metrics?
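For a prototype, the serialisation that lock buys you can be sketched like this, with a synchronized method standing in for the pessimistic row lock (in DB2 this would be a SELECT ... FOR UPDATE followed by an UPDATE on the one-row table; the class and method names are invented for illustration):

```java
// Stand-in for the single-row sequence table from option 2.
// synchronized plays the role of the pessimistic row lock: only one
// timer at a time can read and advance the number, so no two EJB
// instances ever fetch the same record.
class SequenceRow {
    private long lastFetched = 0;

    // Roughly: SELECT seq FROM seq_table FOR UPDATE;
    //          UPDATE seq_table SET seq = seq + 1;
    synchronized long nextSequence() {
        return ++lastFetched;
    }
}
```

The lock is held only for the read-and-increment, which matches the "there and gone in a very brief time" point above; timing many concurrent callers against this sketch would give a rough lower bound before trying it against the real database.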


3. The sequence ID could be kept in a distributed cache if we had one. But we can't configure a new distributed cache just to hold this single key if one is not already in place.


Not sure I see how this works. Won't the cache be updated so frequently that it isn't really a cache?

 