Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
    Bookmark Topic Watch Topic
 
friki data migration
Ranch Hand
Posts: 772
1
  • Mark post as helpful
  • send pies
  • Report post to moderator
Avoid removing duplicates with DISTINCT or UNION, when you have an accidental cartesian product

With heavy joining, one can loose track of all the relations that are playing a role in a SQL statement. Specifically, if multi-column foreign key relationships are involved, it is possible to forget to add the relevant predicates in JOIN .. ON clauses. This might result in duplicate records, but maybe only in exceptional cases. Some developers may then choose to use DISTINCT to remove those duplicates again. This is wrong in three ways:


  • It (may) solve the symptoms but not the problem. It may as well not solve the symptoms in edge-cases.
  • It is slow for large result sets with many columns. DISTINCT performs an ORDER BY operation to remove duplicates.
  • It is slow for large cartesian products, which will still load lots of data into memory

  • The Cure:

    As a rule of thumb, when you get unwanted duplicates, always review your JOIN predicates. There's probably a subtle cartesian product in there somewhere.

    Source


    SqlBestPractices
     
      Bookmark Topic Watch Topic
    • New Topic