• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • paul wheaton
  • Paul Clapham
  • Ron McLeod
Sheriffs:
  • Jeanne Boyarsky
  • Liutauras Vilda
Saloon Keepers:
  • Tim Holloway
  • Carey Brown
  • Roland Mueller
  • Piet Souris
Bartenders:

SQL question

 
Ranch Hand
Posts: 327
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi, i am trying to understand something about correlated sql sub-queries. the following is taked from some web tutorial. regarding the third example, they say that



since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.



thats looks incorrect to me... the third example looks very correlated subquery to me, so it should perform slowlier than the first.

Is there something i do not underst here???


Example 1 - non-correlated sub query

SELECT dept.name FROM dept
WHERE dept.id NOT IN
( SELECT dept_id
FROM emp
WHERE dept_id IS NOT NULL)

Example 2- outer join

SELECT dept.name FROM dept,emp
WHERE emp.dept_id (+) = dept.id

Example 3 - correlated sub query

SELECT dept.name FROM dept
WHERE NOT EXISTS (SELECT dept_id
FROM emp
WHERE emp.dept_id = dept.id)

The second example is an outer join SQL statement. This in fact does more than just return the names of departments which have no employees assigned to them, it also returns the names of those departments that do have employees assigned to them. This is because an outer join returns both matching rows and the non-matching rows on one side of the join.

The first and the third SQL statements would produce exactly the same results, but the first would probably be slower than the third if the dept_id column in the emp table were indexed (depending on the sizes of the tables).

The first SQL statement can not use any indexes - the where clause of the sub query is just checking for NOT NULL rows - so a full table scan would be performed. Also the sub query would be executed once for each row in the dept table.

On the other hand, the sub query in the third example can use the index and since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.
 
Ranch Hand
Posts: 214
IntelliJ IDE Java Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Database performance can be very counter-intuitive. Also the way something works in one RDBMS may not be the same in another.

It seems that the point of the example you are looking at is that non-correlated subqueries are not necessarily always faster than correlated subqueries, although it would seem they should since they usually only execute once per statement. I think the example you show is more of an exception than a general rule.

The difference in performance here is that the non-correlated subquery is doing table scans, which of course are generally expensive. (I don't understand the explanation that this subquery executes once for each row of the dept table. It seems it should only execute once.)

The correlated subquery, on the other hand, is a covered query, meaning that it only uses columns that are indexed, and therefore the table itself does not have to be read. All the information needed is in the index. Reading indexes is (usually) a lot faster than reading tables. So that's why the correlated subquery, in this instance, is supposedly faster.

I would still want to see proof that it is really faster on my live database. Other factors, such as the size of the tables might make more of a difference than how your query is structured. And again, what works on Oracle, for example, might not work the same way in MySql or DB2.
reply
    Bookmark Topic Watch Topic
  • New Topic