• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Batching Select Statements in JDBC

 
Ernest Friedman-Hill
author and iconoclast
Marshal
Pie
Posts: 24212
35
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The October issue of the JavaRanch Journal has just been posted, and our own Jeanne Boyarsky's written a a rootin' tootin' new article! If'n you've got somethin' to say about it, this here's the place!
 
Andreas Ecker
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

just curiousity: why is it good to use batch sizes, that are relativley prime?
Why will this give you less server roundtrips?

Andreas
 
Amy Smith
Greenhorn
Posts: 24
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Great algorithm, thanks.

If you have the luxury of knowing values that are out of range for the column in the where, use the out of range values for the extra parameter values. This would let you handle the stragglers in one query.

IE. selecting by an id column that is always a positive number.
select ... from ... where ... in (?,?,?,?,?) and you are looking for "1,2,3". Use -1 for the other 2 parameters.

--Amy
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Andreas Ecker:
Hi,

just curiousity: why is it good to use batch sizes, that are relativley prime?
Why will this give you less server roundtrips?

Andreas

The short answer is mathematically it works out that way. Consider two extreme examples.

1) batch sizes of 1, 4 and 8:
If user wants 8 values, we use 8. - # queries: 1
If user wants 9 values, we use 8/1. - # queries: 2
If user wants 10 values, we use 8/1/1. - # queries: 3
If user wants 11 values, we use 8/1/1/1. - # queries: 4
If user wants 12 values, we use 8/4. - # queries: 2
If user wants 13 values, we use 8/4/1. - # queries: 3

2) batch sizes of 1, 4 and 9:
If user wants 8 values, we use 4/4. - # queries: 2
If user wants 9 values, we use 9. - # queries: 1
If user wants 10 values, we use 9/1. - # queries: 2
If user wants 11 values, we use 9/1/1. - # queries: 3
If user wants 12 values, we use 9/1/1/1. - # queries: 4
If user wants 13 values, we use 9/4. - # queries: 2

If you try this out with a lot of different numbers, it works out that there are more values that result in less queries with relatively prime numbers. This is mainly due to the effect shown for 8 values. We can use 4/4 instead of 8 - which gets us to higher values with less queries.

I'm sure this can be more mathematically explained. I used a combination of math and experimentation to originally figure it out.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Amy Smith:
Great algorithm, thanks.

If you have the luxury of knowing values that are out of range for the column in the where, use the out of range values for the extra parameter values. This would let you handle the stragglers in one query.

IE. selecting by an id column that is always a positive number.
select ... from ... where ... in (?,?,?,?,?) and you are looking for "1,2,3". Use -1 for the other 2 parameters.

--Amy
]
This is a very good and interesting point. You wouldn't actually need to know what values are out of range. You could just fill it in with the first value:
select ... from ... where ... in (1,2,3,1,1)

Just like everything else in the article, you would have to verify that it was indeed faster to do it this way. I suspect it would be though unless the query involved a lot of processing (subqueries and the like.) The column involved in the in clause is highly likely to have an index. So it would be comparing an index scan and disk accesses to the saved network roundtrips. The database is smart enough to only return the rows once. Of course if you did know of values that are out of range, it would be faster because you would just have the index scan.
 
Pradeep bhatt
Ranch Hand
Posts: 8933
Firefox Browser Java Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
stmt.setInt(1);


Is the above a typo ? The method signature should have a parameter index followed by the integer value.
 
Pradeep bhatt
Ranch Hand
Posts: 8933
Firefox Browser Java Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator


I haven't understood this. In the above example there are 3 statements and setxxx methods invoked 3 times. If there 100 id are you suggesting that the "select" statement appear 100 times. May me I am missing something.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Pradip Bhat:
Is the above a typo ? The method signature should have a parameter index followed by the integer value.

Oops. Yes that is a typo. And I seem to have done that in a few places. That's what I get for coding without a compiler around...
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Pradip,
Yes, if there 100 ids, the select statement would appear 100 times. This could also be written as

The idea is that you have the single statement "select id, name from users where id = ?;" that gets sent to the database multiple times in the same server roundtrip. It's only marginally better than doing the 100 statements in separate trips though. And logically more confusing!
 
Pradeep bhatt
Ranch Hand
Posts: 8933
Firefox Browser Java Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Jeanne. Why don't you write an article onn JDBCRowSet et al
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Pradip,
Thomas Paul actually wrote an article about the RowSet. It's an excellent read!
 
Frank Brown
Greenhorn
Posts: 6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Nitpicking really, but... I think you meant:

for (int i=0; i < batchSize; i++) {
if ( firstValue ) {
firstValue = false;
} else {
inClause.append(',');
}
inClause.append('?');
}
... not this:
for (int i=0; i < batchSize; i++) {
inClause.append('?');
if ( firstValue ) {
firstValue = false;
} else {
inClause.append(',');
}
}
( so you get "?,?,?,...?,?" not "??,?,...?,?,")

... but, since you know batchSize, why not do this:
for (int i=0; i < batchSize - 1; i++) { inClause.append('?,') }
inClause.append('?');

But as Utah Phillips would say, "... it's good though!"
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Frank,
Yes. Good catch and thanks for the code review!
 
Reid M. Pinchback
Ranch Hand
Posts: 775
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Interesting technique and worth having in the toolkit. Goes beyond just "here is a neat idea" and has a sound implementation strategy that should work in production. Databases like Oracle cache the execution plans for the statements they've recently processed. I like that this technique sticks with bound parameters instead of literal values, and has a way of minimizing how many different types of statements it creates. The end result is that it looks less likely to cause Oracle to flush out all its good execution plans for a temporary glut of dynamic queries.

There is another technique that would be even faster. Definitely works on Oracle, would require a bit of adaptation on other database. If you create a temporary table, you can do a batch update of all your values (ids in this example) to the temporary table. Then all you need to do is a simple join in your query between the table of interest and the temporary table. End result; one batch of data goes in, one batch of data goes out.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Reid,
The technique was actually developed on db2 and migrated to Oracle. We knew that we would be migrating so cross-database compatibility was essential. And we are using it in production
 
David O'Meara
Rancher
Posts: 13459
Android Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I know it's been a while, but I'm late to the party.

I must admit that I didn't fully understand this article when I first read it, but now I'm writing a routine to migrate table data between databases (it's a long story) and this came to mind.

After a re-read it's just what I want
 
Dan Geck
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi guys i found a simple way to do it I dont know if this is helpful enough.

first create string of SQL
privae String SQL = "SELECT * FROM table WHERE column_name IN ($)";

second create a method that will modify the string SQL and convert the "dollar sign" into strings or ids

private String fillINStatement(String sql, int[] ids){

String IN = "";
String comma = ",";
String newSql;
int i;

for(i=0; i < ids.length; i++){

if((ids.length - 1) == i){
comma = "";
}
IN+=ids[i]+comma;
}
// this will replace $ into string of ids
newSql = sql.replace("$", IN);

return newSql;
}

third call the method and pass the values

int[] ids = {1,2,3,4};
String myNewSql = this.fillINStatement(this.SQL, ids);

so the result of your myNewSql will be something like this.
SELECT * FROM table WHERE column_name IN (1,2,3,4);

i dont know if this make sense. but it works fine for me hope in some ways this script could help.
 
David O'Meara
Rancher
Posts: 13459
Android Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Nearly, but not quite.
You still need to use a PreparedStatement, which means you populate '?'s and then bind your values to them. Each time you call it with the same number of parameters, the database is able to reuse the compiled statement, saving work and time on each run.

Thanks for the code though, I plan on using it
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dan,
That works if you don't mind using up up prepared statement cache entries on the entry. If you only have 1 - 4 values (or a like small number), this isn't a problem. In my applications, I have an arbitrary number between 1 and (potentially) a few thousand. This means the other queries would wind up being booted from the cache. It also means I pass the maximum number of bound variables for some databases.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Upon re-reading your post, I realize the $ is converted into the string literals. Meaning it doesn't use a prepared statement binding variables at all. This opens up the possibility of SQL injection.
 
Campbell Ritchie
Sheriff
Pie
Posts: 49797
69
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
And welcome to JavaRanch, Dan
 
Dan Geck
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks for the warm welcome guys. Well I forgot to include that I'm using jdbcTemplate.query so yeah it's not applicable for PreparedStatement. I'm still a novice in Java though and still exploring things. Thanks for the comment guys..
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Dan Geck:
so yeah it's not applicable for PreparedStatement.

Ah! That was the premise of my article. Which explains the difference!
 
Bruce You
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This is my Groovy implementation about this:

It's so good with Groovy!
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34851
369
Eclipse IDE Java VI Editor
 
Leonardo Carreira
Ranch Hand
Posts: 494
Eclipse IDE Java Postgres Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi Friends..

iam just curious.. sorry if i was Out Of Topic...

How to implement Batching Select Statement in ORM like Hibernate?..

Thanks in advance..


 
Kunal Lakhani
Ranch Hand
Posts: 622
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Using criteria and criterion. In Restrctions. class there is a in method which takes 2 args, string property and the collection.



 
Pierre Drapeau
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Another alternative that I found applicable in my case is to use a batch size containing at least the number of values that I want to pass, up to the maximum large batch size, and filling unset values (the remainder) with dummy values. This way, most of the time only one query runs, unless I exceed the larger batch size. Here I look for at least one returned value. If not, I would simply have to pile up the returned values query.neMaintPartition.

for instance:




And then, filling the query:






 
Douglas Lancy
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
From both a performance perspective and for simplification, wouldn't using a temporary table make more sense? Create the temporary table with the values for the IN clause and then use that as part of your Prepared Statement? Are there any issues or major performance concerns with this approach?

 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Using temporary tables is certainly an option. Performancewise, I'd say it would greatly depend on the target database. Databases that use optimizer might need some prodding to generate optimal plans for these queries. I've chosen this approach in a project where the number of items in the IN clause varied greatly, with no reasonable upper bound on it.

... and welcome to the Ranch!
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic