Issue with concurrency and transaction handling in JEE

 
Marco Ehrentreich
best scout
Hello rangers,

I'm stuck on a concurrency issue with JMS/EJB/JPA, and hopefully someone around here can give me some advice on a proper solution.

The problem didn't seem too difficult or unusual, but I just can't get it to work as expected with a clean solution. The application in question is a JEE application which receives low-level SMTP messages via a JMS queue and processes and prepares this data for later use. Each JMS message is transformed into an email object, which consists of metadata and the content of the mail. The metadata in turn contains email addresses for the envelope sender, the envelope recipients, and the senders and recipients of the email. Each address is represented by an EmailAddress object consisting of the email address and a real name.

What I'm trying to do now is to create all these data structures from the information I get from the SMTP message. Basically that's no big deal, and everything works as expected. What I can't get to work is the handling of email addresses. All email addresses should be shared, reused, and stored only once in the database, no matter whether an address belongs to a sender or a recipient and regardless of which mails it belongs to. My idea is to do the following for each email address:

- check whether an EmailAddress entity for the given email address already exists in the database

- if it does, fetch it and update it with the real name, in case it doesn't have one yet and newer information provides one

- if no database entry exists for the given email address, create a new one and use that

The application logic seems to be correct, because everything works fine for sequential input data. But as soon as I start to flood the queue with more test data, concurrent processing kicks in and concurrency issues start to happen. Specifically, I get lots of exceptions because of duplicate database entries for email addresses. I must admit that I'm neither a database nor a JEE expert, but obviously transaction handling alone doesn't implement the desired "create or update" pattern.
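The failure mode here is a classic check-then-act race, which can be reproduced without any database. In this plain-Java sketch (entirely hypothetical, with a map standing in for the email address table), ConcurrentHashMap.computeIfAbsent is the in-memory analogue of an atomic create-or-find: even when two consumers arrive at once, only one entry is created.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class CheckThenActDemo {
    // Stand-in for the email address table, keyed by normalized address.
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // computeIfAbsent performs "check and create" as ONE atomic step --
    // this is exactly what a separate find + persist lacks.
    static String findOrCreate(String address) {
        return store.computeIfAbsent(address, a -> "EmailAddress(" + a + ")");
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Runnable task = () -> {
            try { start.await(); } catch (InterruptedException e) { return; }
            findOrCreate("alice@example.org");
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        start.countDown();        // release both threads at the same moment
        t1.join(); t2.join();
        System.out.println(store.size());   // 1 -- exactly one entry created
    }
}
```

With a non-atomic `if (!store.containsKey(a)) store.put(a, ...)` instead, both threads can pass the check before either inserts, which is the same race the JMS consumers hit against the database.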

I already played around with different transaction attribute types and isolation levels. I also tried pessimistic locks and @Singleton for the DAO class (for email addresses). None of these options seems to solve the problem "magically" :-) Anyway, after experimenting with this (seemingly easy) problem for quite some time now, I'm not even sure where to look for the root cause. Is it a concurrency problem in the code? Can it be solved with proper transaction handling? Do I have to mess around with JDBC isolation levels?

I'd be very thankful if anyone could help me find a clean solution that really works under high load, too. Maybe it would be fine for this application to throttle the message consumption in order to avoid concurrency issues, but that just doesn't seem like a real solution ;-) As this doesn't seem to be a rare use case, I'm sure there are best practices or patterns for such a scenario.


Thanks a lot!

Marco
 
Sam Nunnally
Greenhorn
Has anyone else experienced this or come up with a solution? I'm experiencing the same thing on WebSphere 7 with OpenJPA.
 
Ranch Hand
Just a quick question: how is it supposed to know whether an email exists or not? I mean, what's your id or key (in your Java object), and what's your database id? Do you have sequences? Auto-increments? That might be a clue.

 
Marco Ehrentreich
best scout
The EmailAddress entities have an id field of type "Long" as a surrogate key for JPA. The natural key to determine if an email address already exists is simply the normalized email address itself as a String/VARCHAR.

Although I haven't worked on this issue any more, I'm still surprised that there were no answers to my post. I guess the problem is either too easy to even think about, or I was doing something completely wrong. Maybe the simplest solution is to create a unique database index on the email address and just try to persist any new address. This should raise an exception if the address already exists, and in that case the email address can be read from the database and reused.
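That persist-and-catch idea relies on the database enforcing uniqueness. A unique constraint on the address column (table and column names here are hypothetical, not from the posts) is the piece that makes the duplicate insert fail instead of silently creating a second row:

```sql
-- Hypothetical schema: allow at most one row per normalized address.
-- A concurrent second INSERT then fails with a constraint violation,
-- which the application can catch and turn into a SELECT.
ALTER TABLE email_address
    ADD CONSTRAINT uq_email_address UNIQUE (address);
```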

Anyway I'd still like to hear some best practices!

Marco
 
Sam Nunnally
Greenhorn
Q: How is it supposed to know whether an email exists or not?

A: In my case, by an attempt to fetch the entity by its key. If the entity is null, then it does not exist.

The issue I'm having, posted here: JMS-JPA-concurrency-Websphere-App, is very similar. There seems to be a race condition: one JMS message process inserts an entity in between the time that another JMS process checks for the entity (and it does not exist) and attempts to insert the entity (and it is already there).

The closest thing I can compare the process to is the creation of a singleton object with a private/protected constructor: the first time through the getInstance method, a check for null is performed to determine whether the instance has been created. Normally some kind of lock is used, via synchronized, to block any other threads from entering the block while the first thread is performing the null check and instance creation.
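In Java form, that analogy looks like this. The synchronized keyword makes the null check and the construction a single atomic step, which is exactly what the database-level check-then-insert above is missing:

```java
public class Registry {
    private static Registry instance;

    private Registry() { }    // private constructor, as in the analogy

    // synchronized makes "check for null + create" one atomic step,
    // so two threads can never both see null and both construct.
    public static synchronized Registry getInstance() {
        if (instance == null) {
            instance = new Registry();
        }
        return instance;
    }

    public static void main(String[] args) throws InterruptedException {
        Registry[] seen = new Registry[2];
        Thread t1 = new Thread(() -> seen[0] = getInstance());
        Thread t2 = new Thread(() -> seen[1] = getInstance());
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(seen[0] == seen[1]);   // true -- same instance
    }
}
```

The database version of the race is harder precisely because there is no JVM-wide lock shared by the competing transactions; the "lock" has to live in the database itself (a constraint, a row lock, or a table lock).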
 
Ranch Hand

Marco Ehrentreich wrote:The EmailAddress entities have an id field of type "Long" as a surrogate key for JPA. The natural key to determine if an email address already exists is simply the normalized email address itself as a String/VARCHAR.

Although I haven't worked on this issue any more I'm still surprised that there were no answers to my post. I guess the problem is either too easy to even think about it or I was doing something completely wrong. Maybe the simplest solution is to create a database index on the email address and just try to persist any new address. This should raise an exception if an address already exists and the email address can then be read from the database and reused in this case.

Anyway I'd still like to hear some best practices!

Marco



Since fetching by emailAddress is causing problems because duplicate entries might exist, I would suggest you fetch a list() instead of a uniqueResult(), then iterate over the list to implement your logic.
 
Bill Gorder
Bartender
Unfortunately, there is no really happy answer to this.

1) Catch the constraint violation and do a find-and-update instead of an insert (if you use this approach by itself, be aware that the last update wins).

2) Implement a queue with a single-threaded consumer for processing the e-mails.
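Option 1 can be sketched without a container. Here a ConcurrentHashMap-backed store stands in for the table (all names are hypothetical): putIfAbsent plays the role of the unique constraint, and the "duplicate" case falls back to reading the existing row instead of failing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CreateOrFind {
    // Stand-in for the database table with a unique constraint on the key.
    static final Map<String, String> table = new ConcurrentHashMap<>();

    // Mimics "INSERT, and on constraint violation fall back to SELECT":
    // putIfAbsent returns null if our insert won, or the existing row
    // if another "transaction" got there first (the duplicate case).
    static String createOrFind(String address) {
        String entity = "EmailAddress(" + address + ")";
        String existing = table.putIfAbsent(address, entity);
        return existing != null ? existing : entity;  // existing row wins
    }

    public static void main(String[] args) {
        String a = createOrFind("bob@example.org");
        String b = createOrFind("bob@example.org");   // duplicate attempt
        System.out.println(a == b);                   // true -- shared entity
        System.out.println(table.size());             // 1 -- no duplicate row
    }
}
```

Note that this resolves the conflict in favor of the existing row; the "last update wins" caveat above applies only if the fallback goes on to overwrite fields of the row it found.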

 
Sam Nunnally
Greenhorn
That was my initial idea as well: catch the EntityExistsException and then simply re-query to get the entity. But the exception is thrown by the container-managed transaction when the transaction is committed (after the actual call to insert), so the code calling updateDevice or createDevice never sees the exception.
 
Jayesh A Lalwani
Rancher
I am facing a very similar problem. However, we aren't using JPA; we are using straight JDBC. We have data in our OLTP tables that needs to be put into OLAP tables. The OLAP schema has dimension tables, and the dimension tables should contain unique records.

We are basically running a query like this:
The problem is that when two transactions run the same query concurrently, duplicates appear in the OLAP dimension table. The inner select query doesn't return any matching records in either transaction, so both transactions insert records.
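The original query didn't survive in the post; the pattern described, inserting only rows that don't already exist in the dimension table, usually has the following shape (all table and column names here are hypothetical):

```sql
-- Hypothetical INSERT-if-absent: copy new dimension values from OLTP.
-- Under READ COMMITTED, two concurrent transactions can BOTH find no
-- match in the subquery and BOTH insert -- hence the duplicates.
INSERT INTO olap_dim (dim_value)
SELECT DISTINCT o.dim_value
FROM oltp_source o
WHERE NOT EXISTS (
    SELECT 1 FROM olap_dim d WHERE d.dim_value = o.dim_value
);
```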

The solutions that I am playing around with are:

a) Use the serializable isolation level
If you use the serializable isolation level, the transaction that reaches the query first will succeed. Any transaction that tries the insert while the first transaction is still active will fail. You have to catch the exception in code and retry.
b) Lock the table
We are using Oracle, and you can take an exclusive lock on the table. The transaction that acquires the lock first will succeed. Subsequent transactions will wait until the first transaction commits or rolls back.
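In Oracle, option b) is a one-liner at the start of the transaction (table name hypothetical); the lock is held until the transaction commits or rolls back:

```sql
-- Serializes all writers on this dimension table for the duration of
-- the transaction; other sessions issuing the same LOCK TABLE wait.
LOCK TABLE olap_dim IN EXCLUSIVE MODE;
```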

We are doing a lot of concurrent background processing, and we don't want our processing to fail randomly or to continuously run the same queries over and over against the database; we would rather have the background processes "synchronize" with each other. The second solution seems to be the cleanest for us.

If I were building a UI app, I would rather ask the user to try again than make him/her wait for a long time. The serializable isolation level might be better for a foreground app.
 
Bill Gorder
Bartender

a) Use Serializable Isolation level
If you use serializable isolation level, the transaction that first gets to the query will succeed. Any transaction that tries the insert while the first transaction is still active will fail. You will have to catch the exception in code and retry



I was aware of this, but it has potential performance ramifications, so I did not mention it. It is one of those "ask your DBA" things, I think.


b) Lock the table
We are using Oracle, and you can put an exclusive lock on the table. The transaction that gets to the lock first will succeed. Subsequent transactions will wait until the first transaction commits/rolls back.



This too can cause bottlenecks. I prefer the work queue with a single-threaded consumer to this approach.

The problem is when 2 transactions try to run the same query concurrently, there are dupes in OLAP DIM table. The inner select query doesn't return any matching records in both transactions, and both transactions insert records



Sounds like you need to set constraints to prevent duplicates. While that may throw an exception, catching and handling it is far better than having duplicates in the database (assuming duplicates are not acceptable).

With JPA, synchronization is a bit trickier, because the same entity manager should not be accessed by multiple threads (entity managers are not thread-safe).
 
Jayesh A Lalwani
Rancher

Bill Gorder wrote:

a) Use Serializable Isolation level
If you use serializable isolation level, the transaction that first gets to the query will succeed. Any transaction that tries the insert while the first transaction is still active will fail. You will have to catch the exception in code and retry



I was aware of this but it has potential performance ramifications, so I did not mention it. It is one of those ask your DBA things I think.


b) Lock the table
We are using Oracle, and you can put an exclusive lock on the table. The transaction that gets to the lock first will succeed. Subsequent transactions will wait until the first transaction commits/rolls back.



This too can cause bottle necks. I prefer the work queue with a single threaded consumer to this approach.

The problem is when 2 transactions try to run the same query concurrently, there are dupes in OLAP DIM table. The inner select query doesn't return any matching records in both transactions, and both transactions insert records



Sounds like you need to set your constraints to prevent duplicates. While it may throw an exception catching and handling this is far better than duplicates in the database (assuming duplicates are not acceptable)

With JPA synchronization is a bit trickier because the same entity manger should not be accessed by multiple threads (they are not thread safe)



Unfortunately, there are no "good" solutions here. We have tried the "single-threaded consumer" approach, and that is not a perfect solution either. It works, but it's unnecessarily complex:

1) You are introducing a bottleneck at the thread instead of at the database. You still have a bottleneck; it's just in a different place.
2) If you are doing processing on a grid, you add the overhead of getting the data to your consumer. One of the things you want in a highly parallelized system is to limit the amount of data shuttling between nodes. A single-threaded consumer will work for an application that is always deployed on a single node; unfortunately, you cannot make that assumption when you are planning for high load. Even if you are building something trivial, like letting users enter email addresses, and you build a single-threaded consumer inside your web server to solve the concurrency issues: what happens when the app is deployed on a farm of web servers? Are you going to build a consumer node whose only job is to persist the email addresses? What happens when the load on your consumer node becomes too high and you need two consumer nodes? Now you need a solution with some intelligence for distributing email addresses to the consumer nodes. Doesn't this sound a little complicated for something the database can already do?
3) A single-threaded consumer essentially puts a lock on the entire process that puts the data in the database. This might work if the process is trivial, but if you have a process that persists data in multiple tables, you want your lock at a lower granularity. You can "solve" this by having different kinds of consumers and some sort of master consumer that feeds the others. Solvable, yes, but unnecessarily complex.
 
Sam Nunnally
Greenhorn
After doing some testing I have found a decent workaround, though I'm still not 100% satisfied. Any other suggestions are welcome:

- Switch the JMS MDB to bean-managed transactions and keep the subsequent SSBs container-managed, so the actual data operations still get container-managed rollback.
- Set a programmatic threshold in the MDB to retry processing the message up to X times.
- Catch the EJBTransactionRolledbackException (caused by the EntityExistsException) when the SSB is invoked.
- Retry processing the message, up to the threshold (in my case the second attempt was successful, because the previous transaction had committed and the retry found that the entity exists).
- I set my threshold programmatically to try processing the message twice before sending it to an error queue.

I don't normally like to use exception catching as control flow. I also imagine there is a way to keep the JMS MDB container-managed and have the J2EE server redeliver the message...
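The retry-with-threshold part of the workaround can be factored out into a small helper. This container-free sketch (all names hypothetical) fails on the first attempt, the way a commit-time EntityExistsException would, and succeeds on the retry once the competing "transaction" has committed:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryDemo {
    // Retry up to maxAttempts; rethrow (i.e. "send to the error queue")
    // only once the threshold is exhausted.
    static <T> T withRetry(int maxAttempts, Callable<T> work) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.call();
            } catch (Exception e) {
                last = e;   // e.g. an EJBTransactionRolledbackException
            }
        }
        throw last;         // threshold reached: route message to error queue
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // First attempt throws (entity already exists); the retry succeeds
        // because by then the competing transaction has committed.
        String result = withRetry(2, () -> {
            if (calls.incrementAndGet() == 1) {
                throw new IllegalStateException("duplicate entity");
            }
            return "processed";
        });
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```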
 
Bill Gorder
Bartender

Unfortunately, there are no "good" solutions here.



Agreed; I said the same in my first post.

I agree that the approach to take is very situational. If the system is highly parallelized and a single-threaded consumer would not be able to keep up, you need to look at other options (it is just one way to solve the problem; it works for some cases and maybe not for yours). I will say that locking an entire table seems likely to have a similar bottleneck to the single-threaded consumer, since other threads won't be able to acquire a lock on the table and will have to wait. It may also (again depending on the situation) block other systems or processes from properly using that table, causing problems there as well.

The simplest and most commonly used solution to this problem I have seen to date is my point #1: apply constraints to the table to prevent duplicates and handle the exceptions. It is not pretty, but it has the following advantages:

1. It does not have the bottleneck of the single-threaded consumer.
2. It does not have the bottleneck of a locked table.
3. It does not lock tables that other applications or processes may be using (or may need to use in the future).
4. The database manages the integrity of the data through constraints.
5. While not pretty, handling the exceptions and retrying is in many cases more performant than the alternatives.
6. It is fairly simple to implement.

The other option would be setting the isolation level to serializable; I guess that is situational as well. I am no DBA, but I have been shot down on this before by DBAs for performance reasons.

I think every application has its own requirements, and everything is very situational. Agreed, there is no great solution; it is just a matter of finding the one that makes the most sense, taking all the factors into account.
 