Avoid removing duplicates with DISTINCT or UNION, when you have an accidental cartesian product
With heavy joining, one can loose track of all the relations that are playing a role in a SQL statement. Specifically, if multi-column foreign key relationships are involved, it is possible to forget to add the relevant predicates in JOIN .. ON clauses. This might result in duplicate records, but maybe only in exceptional cases. Some developers may then choose to use DISTINCT to remove those duplicates again. This is wrong in three ways:
It (may) solve the symptoms but not the problem. It may as well not solve the symptoms in edge-cases.
It is slow for large result sets with many columns. DISTINCT performs an ORDER BY operation to remove duplicates.
It is slow for large cartesian products, which will still load lots of data into memory
As a rule of thumb, when you get unwanted duplicates, always review your JOIN predicates. There's probably a subtle cartesian product in there somewhere.