Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

select vs count(*) - Performance Comparison

 
Jaikiran Pai
Marshal
Pie
Posts: 10447
227
IntelliJ IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,
We have an application which involves querying the database regularly. Currently we are looking at improving the performance of the application. We have certain scenario where we have check on a table to see whether it contains a row for a given criteria. Consider the following table(just an example. The real table may be far more complex):

Person
---------
Id - Integer
First Name - Varchar
Last Name - Varchar
Occupation - Varchar

Consider the following scenario, we want to check whether there is a row for a given combination of firstname and lastname. If not present then, insert a row. Our current logic involves:

- Retrieve the resultset by firing: select * from Person where firstname='abc' and lastname='xyz'
- Check the size of the resultset. If size is zero then insert the row, else do nothing.

The number of rows in the table may be huge. Instead of this query, we are planning to use the following query, *to improve the performance*:

- select count(*) from Person where firstname='abc' and lastname='xyz'

Is this going to help in improving the performance. Is there any better way of handling this?
We are currently using MSSQL Server(But, i dont think it should matter which server we are using)
 
Paul Clapham
Sheriff
Posts: 21297
32
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you expect count(*) to be approximately 1 then I wouldn't think there would be any difference at all. But the answer to all questions of the form "I have a complex system that I don't understand and I want to know if A is faster than B" is always "Try it and see what happens".
 
Jaikiran Pai
Marshal
Pie
Posts: 10447
227
IntelliJ IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you expect count(*) to be approximately 1 then I wouldn't think there would be any difference at all.

Count can reach upto 100 records.

But the answer to all questions of the form "I have a complex system that I don't understand and I want to know if A is faster than B" is always "Try it and see what happens".


Surely, we are going to do that

To do that we need to change quite a bit of code, so just wanted to make sure that we are atleast trying out a right approach
 
Paul Clapham
Sheriff
Posts: 21297
32
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by jaikiran pai:
To do that we need to change quite a bit of code, so just wanted to make sure that we are atleast trying out a right approach
Really? Wouldn't it be easier to just write some small programs specifically for testing the options? I don't see them requiring more than about 25 lines of code with all exception-handling included.
 
steve souza
Ranch Hand
Posts: 862
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Both approaches would have to run the query for all the records even though you only care about existance. I would do the query as an existance check.

It depends on your server how to do that. What rdbms are you using?

Also, if you use the jamon proxy jdbc driver all of this will be timed automatically and you can compare perforamnce of the approaches. You can use the jamon jdbc proxy simply by chaging your jdbc url (i.e. no code changes). A link follows:

http://jamonapi.sourceforge.net/#WhatsNew22
 
Jaikiran Pai
Marshal
Pie
Posts: 10447
227
IntelliJ IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Paul, i will certainly try it out with a sample piece of code. Steve, thanks for that link. Will certainly have a look at it.
 
Kiran Joshi
Ranch Hand
Posts: 54
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Probably you will get more benefit from something like the following

select * from Person
where firstname='abc' and lastname='xyz'
and rownum =1

This will return only the first row.
You can try it out.
 
Jaikiran Pai
Marshal
Pie
Posts: 10447
227
IntelliJ IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The rowcount is available only in Oracle.
 
Peter Rooke
Ranch Hand
Posts: 848
1
Java Linux Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Could you use a stored procedure, do you have a clustered index (on First Name and Last Name). Which database are you using?
 
jay ceebee
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
will there be instances when you will allow duplicates for firstname lastname combination? from what you mentioned, seems like you won't be allowing any duplicated firstname lastname combi. if this is the case, wouldn't it be simple to just add a unique constraint on both firstname and lastname combi, then simply let your program handle the exception thrown when inserting a duplicate entry.
 
Jason Liao
Ranch Hand
Posts: 59
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by jay ceebee:
if this is the case, wouldn't it be simple to just add a unique constraint on both firstname and lastname combi, then simply let your program handle the exception thrown when inserting a duplicate entry.


This is a better solution than stored procedure and works for most RDBMS.

 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34837
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
count(*) is almost always faster than select * as there is less network traffic involved. For one row the difference wouldn't be significant unless you have LOB fields in the table.

Currently we are looking at improving the performance of the application.

Before optimizing this SQL query, do you know that it is the bottleneck. I would think it is more likely that something else is. It's a pretty simple query that tweaking the result type is unlikely to help. If the actual query takes long, look into adding an index on the name fields.
 
Jaikiran Pai
Marshal
Pie
Posts: 10447
227
IntelliJ IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
count(*) is almost always faster than select * as there is less network traffic involved. For one row the difference wouldn't be significant unless you have LOB fields in the table.


We carried out a small test on a sample database and the count(*) proved to be a bit more efficient than select *, even though the difference was not very much. We had around 200 rows in that table.

Before optimizing this SQL query, do you know that it is the bottleneck. I would think it is more likely that something else is. It's a pretty simple query that tweaking the result type is unlikely to help.


The reason why we are trying to do something about this query is that we have the following logic in place:



As can be seen, we have loop which is executed around 60000 times(this is a variable value and 60000 is about the maximum). In this loop the logic implemented is to check for the existence of a row and if the row exists then nothing is done else some operation is carried out.

The result set size most of the times is around 100 and get retrieved as part of the above logic even though they never get used. So we thought it would be better to change the query to something else as we are only interested in knowing the existence of the row.

We have heard about IF EXISTS query in MSSQL but we would like to have it to be as generic as possible so that it does not fail on other databases. Thats the reason we thought about count(*) query.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34837
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jaikiran,
That makes sense. Since it is executed so many times, even a small query can become the bottleneck easily. And yes, it is good to prefer count(*) over isExists.
 
Jaikiran Pai
Marshal
Pie
Posts: 10447
227
IntelliJ IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Jeanne Boyarsky, and all others for sharing your thoughts on this.

-Jaikiran
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic