• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

join and in

 
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If I have two tables test1 and test2, test1 has column col1, col2, ..., test2 has column cola, colb,....
test1.col1 is related to test2.cola.
If I make two queries using join and using in:
select test1.col1, test1.col2 where test1.col1 = test2.cola;
select test1.col1, test1.col2 where test1.col1 in (select test2.cola).
I tried a few cases, they return the same results. I would like to know if this is true in general.
 
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yep. And there's at least one more such option (without getting *too* crazy):
select test1.col1, test1.col2 from test1 where exists ( select 'x' from test2 where test2.cola = test1.col1 )
If you've got some sort of qualifications on your nondriving table (where test2.colz = 'ZZZ' etc.), there can be performance considerations. (In general the vast majority of SQL I do is ad hoc, so as a rule I almost never worry about SQL performance. And also I don't know much about SQL performance optimization, so I'm the wrong person to ask there. (At my work there's a performance optimization group that will tweak SQL for you.))
So heavy disclaimers on the following:
If you expect your query to return a large percentage of the rows on your driving table, favor joins.
If you expect your query to return a small percentage of the rows on your driving table, favor in/exists.


That said, I personally would favor the join over in/exists in these cases unless I had a good reason not to. Anybody looking at your query would understand the intent of the join.
 
Mike Yu
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi Michael,
Thank you for your reply.
Can you explain a bit more about "qualifications" and "nondriving table"?
 
Ranch Hand
Posts: 168
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
In my experience, FWIW, a JOIN is almost always better than a IN, even if there are qualifications on the driving table, and especially if the query returns only a few rows.
A lot faster. In instances where it matters, I see speed increases of 100x for the JOIN (This is on Oracle 8i. I have seen similar improvements in DB2).
 
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Given SQL-92,
select
test1.col1, test1.col2
from
test1,test2
where test1.col1 = test2.cola;
is NOT the same as
select
test1.col1, test1.col2
from test1,test2
where test1.col1 in (select test2.cola from test2);
As a test case, suppose test1 has 7 rows with col1 having values 1 thru 7 and test2 has values in cola of 1,3,5,7,9.
The first case (equijoin) returns 4 rows (1,3,5,7).
The subselect returns 20 rows.
Syntax is VERY important when coding SQL statements. You need to realize what it is exactly that you are asking for.
Otherwise the results may not be quite what you expected.
COL1 COL2
------- ------------------------------
1 one
3 three
5 five
7 seven

4 Row(s) affected

COL1 COL2
------- ------------------------------
1 one
1 one
1 one
1 one
1 one
3 three
3 three
3 three
3 three
3 three
5 five
5 five
5 five
5 five
5 five
7 seven
7 seven
7 seven
7 seven
7 seven

20 Row(s) affected
 
Michael Zalewski
Ranch Hand
Posts: 168
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Paul Misoni:
Given SQL-92,
select
test1.col1, test1.col2
from
test1,test2
where test1.col1 = test2.cola;
is NOT the same as
select
test1.col1, test1.col2
from test1,test2
where test1.col1 in (select test2.cola from test2);


But it's a lot closer to

Which is what I believe Mike Yu meant when he said
"they return the same results".
You are right when you say the INNER JOIN and the correlated IN do not always produce identical results. But I think for two tables without using NULLs, and if SELECT [primary-key] FROM Table2 is in the IN clause, that they are in fact identical.
Someone might be able to provide a counter-example.
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Let me try to clarify what I was trying to say.
It was my contention that the SQL queries (1), (2), and (3) below produce identical results.
If that's not true, I'd glady stand corrected.
Queries (1) and (2) are based on Mike Yu's original two queries. Neither of Mike Yu's original versions had FROM clauses and therefore weren't valid SQL. It was my assumption -- and I guess I should have stated this in my first post -- that the FROM clause in the first query would be from test1 , test2 and the FROM clause in the second query would be from test1.
(1)

(2)


(3)

Paul Misoni responds with two queries, the first of which is identical to (1) above, and here's the second (which I'll call (4)). Paul's queries are also based on Mike Yu's, but I believe Paul and I have made different assumptions about the FROM clauses.
(4)

Paul provides sample data and sample results for his two queries. I have no issues with Paul's results given his queries.
Paul points out that queries (1) and (4) are not the same. I completely agree. They're not the same. Whoever said they are?
The difference between query (4) and query (2) is the FROM clause. (2)'s from clause only has one table -- test1. But (4)'s from clause has two tables -- test1 and test2 -- and there's no join condition on these two tables in the where clause, so the query produces the Cartesian product of the two tables (then limits the results based on the where clause).
(If you don't believe me that a Cartesian product is being generated, add columns from test2 to the select list of (4).)
As far as I can tell, SQL-92 vs. non-SQL-92 has nothing to do with any of this. (But would gladly be corrected if I'm mistaken.) The difference is in what tables we all envision in the FROM clause. Based on Michael Zalewski's post:

code:
--------------------------------------------------------------------------------
SELECT Test1.col1, Test1.col2 FROM Test1 WHERE Test1.col1 IN ( SELECT Test2.cola FROM Test2 )
--------------------------------------------------------------------------------
Which is what I believe Mike Yu meant when he said
"they return the same results".

I think all three Michaels taking part in this thread were thinking of the same query and results.
By the way, Paul, if you're going to bring up SQL-92, shouldn't you have used INNER JOIN and CROSS JOIN syntax in the where clause in your examples?
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Michael Zalewski:
You are right when you say the INNER JOIN and the correlated IN do not always produce identical results.


None of the INs that have been posted have correlated subqueries.

But I think for two tables without using NULLs, and if SELECT [primary-key] FROM Table2 is in the IN clause, that they are in fact identical.


Could you provide more detail on how NULLs or nonkey fields would affect the results?
 
Michael Matola
whippersnapper
Posts: 1843
5
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Can you explain a bit more about "qualifications" and "nondriving table"?
By "qualifications" I meant, roughly, conditions in the where clause that serve to limit the number of rows returned (as opposed to conditions in the where clause that specify which tables are to be joined and/or how).
By "nondriving table" I meant, roughly, any table in the query that is not the most important (driving) table.
 
Mike Yu
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thank you all for your help.
Sorry that the from clause is missing in my original question. The queries should be like:
select test1.col1, test1.col2 from test1, test2 where test1.col1 = test2.cola;
select test1.col1, test1.col2 from test1 where test1.col1 in (select test2.cola from test2).
[ November 16, 2002: Message edited by: Mike Yu ]
 
Try 100 things. 2 will work out, but you will never know in advance which 2. This tiny ad might be one:
a bit of art, as a gift, that will fit in a stocking
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic