• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • paul wheaton
  • Devaka Cooray
  • Jeanne Boyarsky
  • Tim Cooke
  • Liutauras Vilda
Saloon Keepers:
  • Tim Moores
  • Tim Holloway
  • Stephan van Hulst
  • Carey Brown
  • Piet Souris
  • salvin francis
  • Mikalai Zaikin
  • Himai Minh

JPA DISTINCT Clarification

Ranch Hand
Posts: 226
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

Reading p198 of the Pro EJB book, it gives the following query as an example

and says

The DISTINCT operator is functionally equivalent to the SQL operator of the same name. Once the result set is collected, duplicate values (using entity identity if the query result type is an entity) are removed so that only unique results are returned.

If I am understanding this correctly, it strikes me as being inefficient? If I have 1 million rows with only 5 unique values, the JPA provider will pull back all 1 million of them, and then sift out the unique ones before handing me the 5? Instead, it could pass the DISTINCT value to the db in the SQL and let it do the work?

The only thing I can think of is when merging/joining in some way, unique values in the object model may appear that couldn't be found at the db level, or that Entities could overwrite/define the equals method differently in the Object model and this would affect JPQL DISTINCTness in a way that SQL couldn't help with?

Hope that makes sense and that someone can clarify this for me.


Ranch Hand
Posts: 62
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think this is more related to how the DB works... and the available indexes...

My guess is that if you created an index on that column in the database, the result would be returned very quickly...

Oh...and by the way... if you query the DB for unique values, wouldn't make sense that the DB returns 1 millions rows...
and put them in your object....in that case you would get only the 5 unique keys...otherwise your VM would explode ;)

I don't know if this answered your question...

    Bookmark Topic Watch Topic
  • New Topic