Reading p198 of the Pro EJB book, it gives the following query as an example
The DISTINCT operator is functionally equivalent to the SQL operator of the same name. Once the result set is collected, duplicate values (using entity identity if the query result type is an entity) are removed so that only unique results are returned.
If I am understanding this correctly, it strikes me as being inefficient? If I have 1 million rows with only 5 unique values, the JPA provider will pull back all 1 million of them, and then sift out the unique ones before handing me the 5? Instead, it could pass the DISTINCT value to the db in the SQL and let it do the work?
The only thing I can think of is when merging/joining in some way, unique values in the object model may appear that couldn't be found at the db level, or that Entities could overwrite/define the equals method differently in the Object model and this would affect JPQL DISTINCTness in a way that SQL couldn't help with?
Hope that makes sense and that someone can clarify this for me.
I think this is more related to how the DB works... and the available indexes...
My guess is that if you created an index on that column in the database, the result would be returned very quickly...
Oh...and by the way... if you query the DB for unique values, wouldn't make sense that the DB returns 1 millions rows...
and put them in your object....in that case you would get only the 5 unique keys...otherwise your VM would explode ;)