I need to do the following task, using JDBC (I can't use database triggers or constraints with cascade-delete)
As you can see the result of the first query is used again and again. Is there a way to cache the result and use it in the subsequent statements?
Of course, I can use an ArrayList to store the MESSAGE_IDs returned in (a), but there are 2 problems with the approach
i) The number of rows returned is huge and so it storing it in a ArrayList and retriving it [I]might[/] affect the performance ii) I have to format the ArrayList entries in a way that it can be passed into the subsequent queries. (I have to fetch the individual elements in the ArrayList, append them all with a comma inbetween them...). Looks awkward to me.
Just wondering, is there a way to do it elegantly?
Mani, In this case, the database is likely to be the performance bottleneck, not the ArrayList. I would prefer storing the values in an ArrayList so you can close the connection/statement/resultset sooner. The deletes may take a while and you don't want to consume unnecessary database resources all that time.
Also look into batching those delete statements. That way you have less trips to the database. Further, not that using the in clause you don't get the value of prepared statements. You might want to try doing a fixed number (say 10) in each in clause so the sql looks the same each time.
did you consider to delete data using only one SQL statement or you need those ids retrieved by first SQL statement?
posted 15 years ago
Thanks for the replies
Blake: The suggested approach will delete records one by one. Assuming there are million rows in the table, the executeUpdate will be called 4 million times to delete the records from 4 tables. Won't that be a performance hit?
will perform better than
Jeanne: I understand that the database will be the bottleneck when compared to the ArrayList manipulation. That's the reason why I too prefer the ArrayList over the method suggested by Blake. But just wanted to know whether there is a better way of doing this Batching - yes. I will definitely go for it. And, I'm not clear what you mean by "doing a fixed number in each clause". Could you elaborate on it please. Are you suggesting some thing like this
Bori?a (Sorry...my browser doesn't display your name properly) No. I don't need the retrieved MESSAGE_IDs anymore.
posted 15 years ago
The suggested approach will delete records one by one. Assuming there are million rows in the table, the executeUpdate will be called 4 million times to delete the records from 4 tables. Won't that be a performance hit?
Correct, my approach is appropriate if you are deleting a small number of records. It's hard to imagine a business process whereby you would be deleting a million parent rows, but if that's the case then you need to consider something else. At that point I don't think you even want to do small batches of 10 100,000 times. Either way, your db is doing the same work.
Blake Minghelli<br />SCWCD<br /> <br />"I'd put a quote here but I'm a non-conformist"
Mani, Yes, you have the idea of what I was going for.
If you always used the number of results as returned from the original query, you would have a different number of parameters in the in clause each time. For example, you could have 102, 200 and 201 in clause entries for 3 different queries. The database would recognize this as 3 different queries and would not use the cached execution plan for similar queries.
If you always use a fixed number of parameters in the in clause, the database will use the prepared statement which will reduce some overhead. You could pick a couple of fixed values too. For example, batches for 1, 25 and 200. You can experiment to come up with good sizes for your dataset.