Hi All, I am facing a huge performance hit in the java code when using "where" clause in queries. Following are the details:
1. SELECT * FROM Employee
2. SELECT * FROM Employee where employeeid in (26,200,330,571,618,945)
There is no difference in Query Execution Time for both queries. Business Logic Time is huge in second case as compared to first one (ratio - 1:20). Rows returned are more in first case as compared to second case.(ratio - 1:4)
Business Logic is same for both the cases where I iterate through the ResultSet, get the objects and set them in a data structure.
Does anybody know the reason of unexpected time difference for the business logic in the second case?
[ February 27, 2006: Message edited by: Raj Murthi ] [ February 27, 2006: Message edited by: Raj Murthi ]
Query times are dominated by the amount of data searched through to find your rows.
If you don't have any indexes on your EMPLOYEE table, then find 1 particular row is going to be about as much work for the databse as finding all rows, because the databsae has to look through all rows to find the 1. You can reduce the amount of work the database does by adding appropriate indexes, for example:
Appropriate indexes can dramatically speed up query times. However, they are not free. Besides taking up additional storage space in the database, which can sometimes be a very significant fraction, they will also slightly slow down INSERT and sometimes UPDATE statements.
1. It seems to me, only a really dumb DB engine should care that the index has a secondary column... the column specified in the WHERE clause is first in the index, which means the index (an ordered set of row pointers) is ordered first on that column, then (where the first column gives duplicates) on another column.
2. If there are not an appreciable number of rows in the database, the engine may (rightly) not bother with the index.... it is sometimes faster to do a sequential table scan than to hop back and forth from a small index to a small table.
3. The same point applies as in #2 if the database thinks there are not many rows in the table... in order for it to make an informed decision about the use (or not) of the index, it has to have up-to-date statistics. For Oracle, that means you need to analyze the table. For Informix, it's "update statistics". For MS SQL, it's probably control-F1-right-click-the-pretty-button.
4. Sometimes, the "value in (val1, val2, val3)" can be enough to convince an engine to skip the index - I'm not clear as to why, if the list is small, the engine doesn't just translate it to the equivalent "(value = val1 or value = val2 or value = val3)", which has sometimes (anecdotaly) seemed to use an index when the IN clause has not.
5. Some (most? I don't know...) DBs have an option that will give you some debugging info about how the DB engine will process the statement, along with an estimated "cost". The decision for a certain approach over another is based on the cost, and the cost is based on the accuracy of the statistics.... see #3.
Then there's the other point, based on the original post:
There is no difference in Query Execution Time for both queries.
... then, why are we all discussing query execution time?
Business Logic Time is huge in second case as compared to first one (ratio - 1:20). Rows returned are more in first case as compared to second case.(ratio - 1:4)
6. Unless the problem is truly in the business logic (whatever the real code is in place of STORE_ARRAY[x][y] = rs.getObject(i)), I can't see any reason why returning fewer rows should result in longer operation iterating through the ResultSet. If anything, the query execution time should differ, and the larger dataset should result in more "pages" from the DB, but the smaller should run faster in the java code. Can you verify that is really what you meant to say? That the smaller result set is actually taking 20 times as long to be processed in the java code?
Thanks Jeanne and Jon. This is really becoming interesting. Jon - I found your reply very useful. Jeanne - I haven't created the single column index yet, just waiting on conclusion, which I am sure I will get.
Let me ask one more question which can make the issue crystal clear. Once the query gets executed with statement.executQuery() and Resultset object is returned.
1. Then what happens when Resultset.next() is executed. [Can anyone point me some documents,links] First I thought that rs.next is taking lot of time but that is not the case. 2. Now, what I found is rs.getObject(i) is taking lot of time in "in" clause query(2nd case) whereas in first case _rs.getObject(i)_ is not time consuming. [No of columns are 25] [I am using _jconn2_ driver]
The getObject method gives me different timings. There are 25 columns in total and 10 rows for "where" clause query[2nd case]. getObject for one column takes more time in first iteration and then in second iteration some other column is taking more time which took less time in 1st iteration. [Best of 3 executions]
Also the queries which I have given are really simple, since I can't give the actual ones. But one thing I can say is that I am executing some view and then using the "where" clause on the total results that are returned from the view. This may also have some effect. I am thinking on all possible things which can turn out as bottleneck. Thats why you all must be thinking that why this guy doesn't ask all questions at one go. But its getting difficult to find actual bottleneck.
[ March 02, 2006: Message edited by: Raj Murthi ] [ March 02, 2006: Message edited by: Raj Murthi ]