With JPA (prior to JPA 2) we can define object identity as an ID class (@IdClass) or an embedded ID (@EmbeddedId), each with its own pros and cons, or directly inside the POJO (with the @Id annotation). Are there any additions or changes to this in JPA 2? I'd also like to know what the best strategy is (there are many discussions about this on the web, but not much detail), mainly about having surrogate keys... What are the disadvantages of having keys with business meaning?
> mainly about having surrogate keys... Disadvantages on having keys with business meaning?
From the database's point of view, surrogate keys are much preferred. The characteristics you need in a primary key are:
- they are unique
- they are not nullable
- they are unchanging
Using a key with business meaning, even one that is very carefully chosen, seeds potential problems into your data model from the outset. Suppose you picked surname as the primary key for a Person table (daft, I know, but bear with me). What do you do when someone needs to be added whose surname is already there? How do you handle someone who doesn't have a surname? What do you do when someone changes their name because they get married? And so on.
You might think that these problems could be avoided by a more careful choice of key (like I said, surname was a little daft). But compare the effort of thinking through all the possible failings of a particular key with business meaning against simply letting the database pick a meaningless number (meaningless to us, at least) to identify the data.
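A minimal plain-Java sketch of the idea, with no JPA provider or database involved: the hypothetical `PersonTable` class below hands out keys from a counter, roughly what a database sequence or identity column (`@GeneratedValue` in JPA) would do, so duplicate, missing, or changed surnames never affect a row's identity. The class and method names are illustrative assumptions, not part of any API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a Person table with a surrogate primary key.
// The counter plays the role of a database sequence or identity column.
class PersonTable {
    static final class Person {
        final long id;   // surrogate key: unique, non-null, never reassigned
        String surname;  // business data: may repeat, be null, or change

        Person(long id, String surname) {
            this.id = id;
            this.surname = surname;
        }
    }

    private long nextId = 1;
    private final Map<Long, Person> rows = new HashMap<>();

    // Inserting never fails because of a duplicate or missing surname.
    Person insert(String surname) {
        Person p = new Person(nextId++, surname);
        rows.put(p.id, p);
        return p;
    }

    // Identity is by key, so renaming a person does not affect this lookup.
    Person find(long id) {
        return rows.get(id);
    }

    int size() {
        return rows.size();
    }
}
```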
Surrogate keys are a bit of a thorn in the side of the ORM ideal. If ORM really bridged the gap completely, the person working with the object model would not need to know about the existence of any key at all, since a key only has meaning in the database. But surrogate keys are a very simple compromise for identifying data.
We didn't really add any new type of identifier in JPA 2.0, since an ID class and an embedded ID cover pretty much everything you could reasonably want to do, I think, but we did add a lot of options for mapping these types of identifiers more easily, such as derived identifiers (@MapsId, or @Id on a @ManyToOne or @OneToOne relationship). See this thread for some examples. There are lots more, particularly when you get to mixing embedded IDs with ID classes, and multiple composite PKs. I spend a fair bit of time in the book going over the gory details of these and other cases.
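Both the ID class and the embedded ID approach rely on a separate primary key class. A minimal plain-Java sketch of one follows; the name `OrderLineKey` and its two fields are hypothetical, and the JPA annotations are only mentioned in comments so the snippet compiles without a provider. In a real mapping the class would be public in its own file, and either annotated `@Embeddable` and used through `@EmbeddedId`, or named in `@IdClass` on the entity with matching `@Id` attributes.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical composite primary key for an order line: (order id, line number).
// With JPA this class would be public in its own file and either
//   - annotated @Embeddable and referenced from the entity via @EmbeddedId, or
//   - named in @IdClass(OrderLineKey.class) on the entity, whose @Id
//     attributes must then match these fields by name and type.
class OrderLineKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private Long orderId;
    private Integer lineNo;

    // Primary key classes need a public no-arg constructor.
    public OrderLineKey() {
    }

    public OrderLineKey(Long orderId, Integer lineNo) {
        this.orderId = orderId;
        this.lineNo = lineNo;
    }

    public Long getOrderId() {
        return orderId;
    }

    public Integer getLineNo() {
        return lineNo;
    }

    // Value-based equality: two keys with the same parts identify the same row.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof OrderLineKey)) {
            return false;
        }
        OrderLineKey other = (OrderLineKey) o;
        return Objects.equals(orderId, other.orderId)
                && Objects.equals(lineNo, other.lineNo);
    }

    // Must be consistent with equals so the key works in hash-based caches.
    @Override
    public int hashCode() {
        return Objects.hash(orderId, lineNo);
    }
}
```

Value-based `equals` and `hashCode` matter here because the key, not the Java object reference, is what identifies an entity.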
In terms of the best strategy, well, that is mostly a matter of personal taste and application use. Paul mentioned some of the commonly held opinions about the synthetic or provider-generated PKs. He is right that they can indeed impose a simple uniqueness that is easy to manage and efficient to use. I thought I should add a few more perspectives, though, for your consideration.
It turns out that there are almost always one or more attributes that are unique in your data set. If two records could have the same data but different generated keys, then from an application domain perspective they are the same thing, and there would be no reason for two records to exist in the first place. So you have to be careful that you are not just covering up the uniqueness problem. You usually need to know what is unique about the data in any case, and often additional database constraints enforcing that uniqueness will need to be in place.
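A rough plain-Java sketch of that point, assuming a hypothetical `CustomerTable` whose natural key is an email address: the generated id alone would happily accept the same email twice, so a second map plays the role of a unique index and rejects the duplicate. In a JPA mapping the equivalent database constraint could be declared with `@Table(uniqueConstraints = @UniqueConstraint(columnNames = "email"))` or `@Column(unique = true)`; the class and method names here are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Customer table: a generated surrogate id plus a natural key (email).
class CustomerTable {
    private long nextId = 1;
    private final Map<Long, String> emailById = new HashMap<>();
    // Plays the role of a UNIQUE constraint/index on the email column.
    private final Map<String, Long> idByEmail = new HashMap<>();

    long insert(String email) {
        // Without this check, a fresh surrogate id would hide the duplicate.
        if (idByEmail.containsKey(email)) {
            throw new IllegalStateException("duplicate email: " + email);
        }
        long id = nextId++;
        emailById.put(id, email);
        idByEmail.put(email, id);
        return id;
    }

    int size() {
        return emailById.size();
    }
}
```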
One of the problems with generated primary keys is that, from the client perspective, an artificial PK has no domain relevance. If you are looking for a particular domain object and you know its domain-specific unique attribute, you can't use the prototypical PK lookup operation (EntityManager.find()) because you have no idea what its generated PK is. You know the practical domain key, but since you are using a generated PK you are forced to run a query instead of doing a simple cache lookup by PK. What you end up doing is putting indexes on the actual unique domain fields in the database anyway, to make these types of queries more efficient. At that point you might just as well have used the application attribute as the key. Of course, when the key is composed of multiple fields you are back to the complexity, management and efficiency arguments again...
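A plain-Java sketch of the asymmetry, under the assumption that an in-memory map stands in for the persistence context (the `PersonCache` class and its methods are hypothetical): lookup by generated id is a direct hit, loosely analogous to `EntityManager.find(Person.class, id)`, while lookup by a domain attribute has to examine candidates, loosely analogous to running a JPQL query such as `select p from Person p where p.email = :email`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of entities cached by their generated primary key.
class PersonCache {
    static final class Person {
        final long id;      // generated key, meaningless to the caller
        final String email; // domain key the caller actually knows

        Person(long id, String email) {
            this.id = id;
            this.email = email;
        }
    }

    private final Map<Long, Person> byId = new HashMap<>();

    void put(Person p) {
        byId.put(p.id, p);
    }

    // PK lookup: a direct hit, but only if the caller already knows the id.
    Person find(long id) {
        return byId.get(id);
    }

    // Domain-key lookup: no shortcut, so every cached entity is examined,
    // much as a query must search unless the column is indexed.
    Optional<Person> findByEmail(String email) {
        return byId.values().stream()
                .filter(p -> p.email.equals(email))
                .findFirst();
    }
}
```

In a real database the scan would be replaced by an index on the email column, which is the extra index the paragraph above describes.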
So generated keys can really be helpful, but you should understand what you are giving up, and not use them if you don't need them.
I should also mention that although it is possible to hide these generated keys from the user's view, having the unique id in the entity itself is a real advantage when the entity is mobile (for example, detached and passed to another tier), or simply for client-side management and for telling entities apart.
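A small plain-Java sketch of why the id is useful once entities leave the persistence context, assuming a hypothetical client that receives detached copies (for example after serialization): each copy is a different Java object, so `==` cannot tell that two copies describe the same row, but the id carried in the entity can. `ClientView` and `PersonSnapshot` are illustrative names, not a framework API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side view that receives detached copies of entities.
class ClientView {
    static final class PersonSnapshot {
        final long id;     // generated key, kept in the entity and sent to the client
        final String name;

        PersonSnapshot(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Latest known copy of each row, keyed by its id.
    private final Map<Long, PersonSnapshot> byId = new HashMap<>();

    // A newer copy of an already-known row replaces the older one
    // instead of showing up as a second entry.
    void receive(PersonSnapshot p) {
        byId.put(p.id, p);
    }

    int rowCount() {
        return byId.size();
    }

    String nameOf(long id) {
        return byId.get(id).name;
    }
}
```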
Again, as is commonly the case, the answer is "it depends on the application".