
Update 150k records under load. Any problems?

 
Sergei Prosvirnin
Greenhorn
What problems, in theory, can I run into when incrementing a column by 1 in a MySQL 5.5 InnoDB table (not a cluster) under load (about 150,000 records in the table)? It is important that, while the update runs, the database can still serve data at close to its usual speed. Thanks!

P.S. What I mean is that, ideally, only the row currently being updated should be locked.
 
Tim Holloway
Saloon Keeper
Pfft. Only 150000 rows? And all you want to do is "UPDATE X SET colname = colname + 1"?

Hopefully you've made that operation transactional so that no one attempts to read/update colname while the update is actually going on, and that can potentially delay other users. But 150000 rows on modern hardware and software isn't likely to take all that long. I do much worse routinely.
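In SQL terms, that single statement already behaves transactionally in InnoDB; a sketch, with illustrative table and column names:

```sql
-- The whole update as one atomic statement. InnoDB runs it in a single
-- transaction, and under the default REPEATABLE READ isolation level other
-- sessions keep reading their MVCC snapshot rather than blocking on row locks.
START TRANSACTION;
UPDATE my_table SET counter = counter + 1;
COMMIT;
```

The explicit START TRANSACTION/COMMIT only matters if more statements join the unit of work; a lone UPDATE is atomic on its own under autocommit.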
 
Sergei Prosvirnin
Greenhorn

Tim Holloway wrote:Pfft. Only 150000 rows? And all you want to do is "UPDATE X SET colname = colname + 1"?

Hopefully you've made that operation transactional so that no one attempts to read/update colname while the update is actually going on, and that can potentially delay other users. But 150000 rows on modern hardware and software isn't likely to take all that long. I do much worse routinely.



Thanks for your interest in this matter! But the problem here lies in keeping the database available to many clients during the update. In terms of execution time, such an update takes much longer than you might think (more than 2 hours in the case of the solution below). I came to the conclusion that the best option, for both the load on the database and its availability to many clients, would be to split this task into 150 separate transactions.
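For reference, splitting the work by primary-key range is the usual way to carve up such an update; a minimal Java sketch of the chunking arithmetic (table name, column name, and chunk size are hypothetical, with 1,000 rows per chunk giving the 150 transactions mentioned above):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedUpdate {
    // One UPDATE per contiguous id range keeps each transaction short,
    // so row locks are held only briefly. Assumes a dense integer primary key.
    public static List<String> chunkStatements(int totalRows, int chunkSize) {
        List<String> statements = new ArrayList<>();
        for (int start = 1; start <= totalRows; start += chunkSize) {
            int end = Math.min(start + chunkSize - 1, totalRows);
            statements.add("UPDATE my_table SET counter = counter + 1"
                    + " WHERE id BETWEEN " + start + " AND " + end);
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> stmts = chunkStatements(150_000, 1_000);
        System.out.println(stmts.size());   // 150
        System.out.println(stmts.get(0));   // first chunk covers ids 1..1000
    }
}
```

Each statement would then be executed in its own transaction, so readers are only ever blocked on the thousand rows currently being touched.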
 
Tim Holloway
Saloon Keeper
TWO HOURS!!!???

I think you need to have a serious look at that server and that database. I wouldn't expect it to take more than 10 minutes absolute max. Two minutes more like it. Check the CPU and I/O stats and run an EXPLAIN.

I've done multi-gigabyte database operations in far less than two hours.

It's not that I expect everything I do to operate instantaneously - I used to have a system that, when cranked up, took about 10 hours to run through. But it was doing complex scoring algorithms that required a lot of webapp-side work. A simple "Add 1 to X" can be done entirely within the database server in a single SQL command, and that shouldn't take nearly so long.

I know that the current standard business model encourages brute-force solutions over analysis and tuning, but two hours sounds seriously out of whack to me.
 
Tim Holloway
Saloon Keeper
Also, transactional isn't the same thing as batching. If you break the operation into multiple transactions, you're going to have data integrity concerns just as much (in most cases) as if you hadn't done transactions at all.
 
Sergei Prosvirnin
Greenhorn

Tim Holloway wrote:TWO HOURS!!!???

I think you need to have a serious look at that server and that database. I wouldn't expect it to take more than 10 minutes absolute max. Two minutes more like it. Check the CPU and I/O stats and run an EXPLAIN.

I've done multi-gigabyte database operations in far less than two hours.

It's not that I expect everything I do to operate instantaneously - I used to have a system that, when cranked up, took about 10 hours to run through. But it was doing complex scoring algorithms that required a lot of webapp-side work. A simple "Add 1 to X" can be done entirely within the database server in a single SQL command, and that shouldn't take nearly so long.

I know that the current standard business model encourages brute-force solutions over analysis and tuning, but two hours sounds seriously out of whack to me.



See the same question on StackOverflow - https://stackoverflow.com/a/68397902/10074763
 
Sergei Prosvirnin
Greenhorn

Tim Holloway wrote:Also, transactional isn't the same thing as batching. If you break the operation into multiple transactions, you're going to have data integrity concerns just as much (in most cases) as if you hadn't done transactions at all.



Thanks for the answer! Can you explain why?
 
Tim Holloway
Saloon Keeper
Here's why I worry about your database. I made a simple table of 150K rows. Here's how long it took to add a value to a column:
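(The code block did not survive in the post; the test described would look something like this, with hypothetical table and column names:)

```sql
-- Hypothetical reconstruction: build a simple 150K-row table, then time the update.
CREATE TABLE t (
    id      INT PRIMARY KEY AUTO_INCREMENT,
    counter INT NOT NULL DEFAULT 0
);
-- ... populate t with 150,000 rows ...
UPDATE t SET counter = counter + 1;
```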


It didn't take 2 hours. It didn't take 2 minutes. In fact, it didn't even take 2 seconds! And this isn't some hyped-up supermachine I'm using. It's about 6 years old and the database server is just one of several VMs running inside the physical box.
 
Sergei Prosvirnin
Greenhorn

Tim Holloway wrote:Here's why I worry about your database. I made a simple table of 150K rows. Here's how long it took to add a value to a column:


It didn't take 2 hours. It didn't take 2 minutes. In fact, it didn't even take 2 seconds! And this isn't some hyped-up supermachine I'm using. It's about 6 years old and the database server is just one of several VMs running inside the physical box.



I added a link to an answer from StackOverflow earlier in this thread; it explains that such a long run time is a consequence of network latency. Can you explain how transactional processing differs from batch processing, and how splitting the work into many transactions can damage the database?
 
Sergei Prosvirnin
Greenhorn
My plan was to write a JUnit test that would use JPA/Hibernate to perform 150,000 separate transactions on a remote server.
 
Ranch Foreman

Sergei Prosvirnin wrote:such a long time is a consequence of network latency


That's why Tim suggested this:

Tim Holloway wrote:A simple "Add 1 to X" can be done entirely within the database server in a single SQL command and that shouldn't take nearly so long.


So the reason for that 2-hour run seems to be that you fetch a record from the database, update it, and store it back - one by one. I don't like to phrase it this way, but: "Your approach is wrong" - or at least it seems so. This simple update should be done by the database server itself. Otherwise you have to transfer at least the data you want to update twice over the network - once to fetch it and a second time to store the updated result back. Even with 10 Gbit networking and a fast client, there are still several bottlenecks. Having the database server do the job on its own removes all of them and makes the update considerably more efficient.
 
Tim Holloway
Saloon Keeper
OK, let's see if I can explain.

First, the SQL UPDATE statement I used has no network latency. It is submitted directly to the MySQL (MariaDB) server, which digests the SQL command and turns it into database operations. Everything is done internally by the database server.

On the other hand, if I wrote a Java application using JPA and instead of submitting an SQL update command, I coded a loop like this:
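(The code block was lost from the post; a JPA-like sketch of the loop described, with hypothetical entity and field names:)

```java
// JPA-like sketch, not valid production code: one network round trip per row.
List<MyEntity> rows =
    entityManager.createQuery("SELECT e FROM MyEntity e", MyEntity.class)
                 .getResultList();        // pull every row to the client
for (MyEntity e : rows) {
    e.setCounter(e.getCounter() + 1);     // increment in Java, not in SQL
    entityManager.merge(e);               // push each row back to the server
}
```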

That would potentially incur a great deal of network latency, since each record would have to be pulled from the database, sent over the network to my Java logic, updated, and then sent back over the network to update the database.

Don't mistake my example for valid JPA code. I made it JPA-like, not fully valid JPA, for simplicity's sake. Still, you can see the problem, I think.

If you were adding different values to each record, such as when batch-updating a set of bank account balances, you'd probably have to do it this way. But to add the same value to each record, it's definitely quicker to do it in server-side SQL and not in Java code.

OK. Batching is the process of collecting a bunch of operations and sending them off all at one time. Since each "send" operation has some overhead, a larger batch is often desirable to reduce the cost of that overhead.

Transactions, on the other hand, are to ensure data consistency and integrity. You can wrap single operations or batches of operations in a transaction, but they won't actually apply to the database until the transaction is committed. And if an error causes you to roll back instead of commit, the database doesn't change at all. Thus you can be assured that your database will be consistent and that you won't have problems where people are trying to look at or update a record while you're building up the transaction. They'll see the "before" image of the record, regardless of how many operations you have done, and then when you commit, they'll see the "after" image. JPA, incidentally, has mechanisms to detect when two different users try to modify the database at the same time.
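The JPA mechanism mentioned at the end is optimistic locking; a minimal sketch, assuming a hypothetical Account entity:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {
    @Id
    private Long id;

    private long balance;

    @Version              // JPA bumps this counter on every update; committing
    private long version; // against a stale value throws OptimisticLockException
}
```

If two transactions load the same row and both try to update it, the second commit fails instead of silently overwriting the first.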

 