Hi, i am trying to understand something about correlated sql sub-queries. the following is taked from some web tutorial. regarding the third example, they say that
since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.
thats looks incorrect to me... the third example looks very correlated subquery to me, so it should perform slowlier than the first.
Is there something i do not underst here???
Example 1 - non-correlated sub query
SELECT dept.name FROM dept WHERE dept.id NOT IN ( SELECT dept_id FROM emp WHERE dept_id IS NOT NULL)
Example 2- outer join
SELECT dept.name FROM dept,emp WHERE emp.dept_id (+) = dept.id
Example 3 - correlated sub query
SELECT dept.name FROM dept WHERE NOT EXISTS (SELECT dept_id FROM emp WHERE emp.dept_id = dept.id)
The second example is an outer join SQL statement. This in fact does more than just return the names of departments which have no employees assigned to them, it also returns the names of those departments that do have employees assigned to them. This is because an outer join returns both matching rows and the non-matching rows on one side of the join.
The first and the third SQL statements would produce exactly the same results, but the first would probably be slower than the third if the dept_id column in the emp table were indexed (depending on the sizes of the tables).
The first SQL statement can not use any indexes - the where clause of the sub query is just checking for NOT NULL rows - so a full table scan would be performed. Also the sub query would be executed once for each row in the dept table.
On the other hand, the sub query in the third example can use the index and since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.
Database performance can be very counter-intuitive. Also the way something works in one RDBMS may not be the same in another.
It seems that the point of the example you are looking at is that non-correlated subqueries are not necessarily always faster than correlated subqueries, although it would seem they should since they usually only execute once per statement. I think the example you show is more of an exception than a general rule.
The difference in performance here is that the non-correlated subquery is doing table scans, which of course are generally expensive. (I don't understand the explanation that this subquery executes once for each row of the dept table. It seems it should only execute once.)
The correlated subquery, on the other hand, is a covered query, meaning that it only uses columns that are indexed, and therefore the table itself does not have to be read. All the information needed is in the index. Reading indexes is (usually) a lot faster than reading tables. So that's why the correlated subquery, in this instance, is supposedly faster.
I would still want to see proof that it is really faster on my live database. Other factors, such as the size of the tables might make more of a difference than how your query is structured. And again, what works on Oracle, for example, might not work the same way in MySql or DB2.
All of the world's problems can be solved in a garden - Geoff Lawton. Tiny ad:
RavenDB is an Open Source NoSQL Database that’s fully transactional (ACID) across your database