• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • paul wheaton
  • Jeanne Boyarsky
  • Ron McLeod
Sheriffs:
  • Paul Clapham
  • Liutauras Vilda
  • Devaka Cooray
Saloon Keepers:
  • Tim Holloway
  • Roland Mueller
Bartenders:

Performance hit using "where" clause in the query

 
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi All,
I am facing a huge performance hit in the java code when using "where" clause in queries. Following are the details:

1. SELECT * FROM Employee

2. SELECT * FROM Employee where employeeid in (26,200,330,571,618,945)

There is no difference in Query Execution Time for both queries.
Business Logic Time is huge in second case as compared to first one (ratio - 1:20).
Rows returned are more in first case as compared to second case.(ratio - 1:4)

Business Logic is same for both the cases where I iterate through the ResultSet, get the objects and set them in a data structure.

Does anybody know the reason of unexpected time difference for the business logic in the second case?

[ February 27, 2006: Message edited by: Raj Murthi ]
[ February 27, 2006: Message edited by: Raj Murthi ]
 
Bartender
Posts: 10336
Hibernate Eclipse IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Show us the business logic.
 
Ranch Hand
Posts: 333
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Query times are dominated by the amount of data searched through to find your rows.

If you don't have any indexes on your EMPLOYEE table, then find 1 particular row is going to be about as much work for the databse as finding all rows, because the databsae has to look through all rows to find the 1. You can reduce the amount of work the database does by adding appropriate indexes, for example:


Appropriate indexes can dramatically speed up query times. However, they are not free. Besides taking up additional storage space in the database, which can sometimes be a very significant fraction, they will also slightly slow down INSERT and sometimes UPDATE statements.
 
Raj Murthi
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks for replies.

Here is the sample code, this is not the actual buisness logic but it goes on same lines. [I am giving sample tables and columns here]

long t1 = System.currentTimeMillis();
ResultSet rs = stmtqry.executeQuery(SQL_QUERY);
System.out.println("Executing time for query: "+ ((System.currentTimeMillis()-t1))/1000.0);

t1 = System.currentTimeMillis();
while(rs.next())
{
for(int i=1;i<=NO_COLUMNS;i++)
STORE_ARRAY[ROW_COUNT][COLUMN_COUNT]=rs.getObject(i);
ROW_COUNT++;
}
System.out.println("buisness logic Execution time: "+ ((System.currentTimeMillis()-t1))/1000.0);

employeeid has non-clustered index, but its having composite index
create index employeeid1 on employee(employeeid,city)

So is the composite index the culprit, and should i go ahead and create one more index on employeeid. Also should it be clustered or non-clustered?
 
author & internet detective
Posts: 42103
933
Eclipse IDE VI Editor Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Raj,
Yes, you should add another index. The current one isn't being used since your query doesn't search by city too.

If I'm remembering right, you can only have one clustered index per table.
 
Ranch Hand
Posts: 84
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
A few thoughts about the query execution time:

1. It seems to me, only a really dumb DB engine should care that the index has a secondary column... the column specified in the WHERE clause is first in the index, which means the index (an ordered set of row pointers) is ordered first on that column, then (where the first column gives duplicates) on another column.

2. If there are not an appreciable number of rows in the database, the engine may (rightly) not bother with the index.... it is sometimes faster to do a sequential table scan than to hop back and forth from a small index to a small table.

3. The same point applies as in #2 if the database thinks there are not many rows in the table... in order for it to make an informed decision about the use (or not) of the index, it has to have up-to-date statistics. For Oracle, that means you need to analyze the table. For Informix, it's "update statistics". For MS SQL, it's probably control-F1-right-click-the-pretty-button.

4. Sometimes, the "value in (val1, val2, val3)" can be enough to convince an engine to skip the index - I'm not clear as to why, if the list is small, the engine doesn't just translate it to the equivalent "(value = val1 or value = val2 or value = val3)", which has sometimes (anecdotaly) seemed to use an index when the IN clause has not.

5. Some (most? I don't know...) DBs have an option that will give you some debugging info about how the DB engine will process the statement, along with an estimated "cost". The decision for a certain approach over another is based on the cost, and the cost is based on the accuracy of the statistics.... see #3.


Then there's the other point, based on the original post:


There is no difference in Query Execution Time for both queries.



... then, why are we all discussing query execution time?


Business Logic Time is huge in second case as compared to first one (ratio - 1:20).
Rows returned are more in first case as compared to second case.(ratio - 1:4)



6. Unless the problem is truly in the business logic (whatever the real code is in place of STORE_ARRAY[x][y] = rs.getObject(i)), I can't see any reason why returning fewer rows should result in longer operation iterating through the ResultSet. If anything, the query execution time should differ, and the larger dataset should result in more "pages" from the DB, but the smaller should run faster in the java code. Can you verify that is really what you meant to say? That the smaller result set is actually taking 20 times as long to be processed in the java code?


Hope any part of this helps...
-- Jon
 
Raj Murthi
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks Jeanne and Jon. This is really becoming interesting.
Jon - I found your reply very useful.
Jeanne - I haven't created the single column index yet, just waiting on conclusion, which I am sure I will get.

Let me ask one more question which can make the issue crystal clear.
Once the query gets executed with statement.executQuery() and Resultset object is returned.

1. Then what happens when Resultset.next() is executed. [Can anyone point me
some documents,links]
First I thought that rs.next is taking lot of time but that is not the
case.
2. Now, what I found is rs.getObject(i) is taking lot of time in "in" clause
query(2nd case) whereas in first case _rs.getObject(i)_ is not time
consuming.
[No of columns are 25]
[I am using _jconn2_ driver]
 
Ranch Hand
Posts: 2596
Android Firefox Browser Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Try analyzing a bit with query optimizer for RAW sql, if things are acceptable there look in the Java code. These things can get really interesting...and I think John's reply almost covers it all.

- Manish
 
Raj Murthi
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think in my previous reply I tried to explain where the bottleneck is.
Just to update I am using sybase database.I am trying to work with query optimizer.
 
Raj Murthi
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The getObject method gives me different timings. There are 25 columns in total and 10 rows for "where" clause query[2nd case].
getObject for one column takes more time in first iteration and then in second iteration some other column is taking more time which took less time in 1st iteration. [Best of 3 executions]

Also the queries which I have given are really simple, since I can't give the actual ones. But one thing I can say is that I am executing some view and then using the "where" clause on the total results that are returned from the view.
This may also have some effect. I am thinking on all possible things which can turn out as bottleneck. Thats why you all must be thinking that why this guy doesn't ask all questions at one go. But its getting difficult to find actual bottleneck.

[ March 02, 2006: Message edited by: Raj Murthi ]
[ March 02, 2006: Message edited by: Raj Murthi ]
 
Raj Murthi
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Any Updates on this front...
 
Hey! Wanna see my flashlight? It looks like this tiny ad:
Smokeless wood heat with a rocket mass heater
https://woodheat.net
reply
    Bookmark Topic Watch Topic
  • New Topic