How can I achieve parallelism for insert operations under same transaction using Spring Transaction  RSS feed

 
Ranch Hand
Posts: 558
2
Hibernate Java Spring
Hello All,

I have a requirement to perform a clean insert (delete + insert) of a huge number of records (close to 100K) per request. For testing purposes, I'm running my code with 10K records. Even with 10K, the operation takes 30 seconds, which is not acceptable. I'm doing some level of batch inserts as provided by Spring Data JPA, but the results are not satisfactory.

My code looks like the following:
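The original code block did not survive in this archive. As a hedged sketch of the kind of chunked save loop described in the post (delete everything, then save in fixed-size chunks; `cleanInsert`, `saveChunk`, and all other names here are assumptions, not the poster's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical reconstruction of the batched clean-insert described above:
// delete the existing rows, then save incoming records in 2000-element chunks.
public class BatchInsertSketch {

    static final int CHUNK_SIZE = 2000;

    // deleteAll stands in for repository.deleteAllInBatch(); saveChunk stands in
    // for repository.saveAll(chunk), both inside one @Transactional service method.
    static int cleanInsert(List<String> records, Runnable deleteAll,
                           Consumer<List<String>> saveChunk) {
        deleteAll.run();  // the "delete" half of delete + insert
        int chunks = 0;
        for (int i = 0; i < records.size(); i += CHUNK_SIZE) {
            List<String> chunk =
                records.subList(i, Math.min(i + CHUNK_SIZE, records.size()));
            saveChunk.accept(new ArrayList<>(chunk));  // e.g. repository.saveAll(...)
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) records.add("person-" + i);
        // 10K records in 2000-element chunks -> 5 calls to saveChunk
        System.out.println(cleanInsert(records, () -> {}, c -> {}));
    }
}
```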


I have also set the following properties:
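The property block was also lost. Given the batch size of 40 mentioned below, the settings in question were presumably something along these lines (a guess at typical Spring Boot / Hibernate batching properties, not the poster's actual file):

```properties
# Hypothetical reconstruction; exact keys and values are assumptions.
spring.jpa.properties.hibernate.jdbc.batch_size=40
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```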


   Looking at the logs, I noticed that the insert operation takes around 3 to 4 seconds to save 2000 records, while the iteration itself takes very little time. So I believe reading through the stream is not the bottleneck; the inserts are. I also checked the logs and confirmed that Spring is batching 40 inserts at a time, as per the property I set.

I'm trying to see if I can improve performance by using multiple threads (say 2) that read from a blocking queue and, once they have accumulated say 2000 records, call save. In theory, this might give better results. But the problem, as I read it, is that Spring manages transactions at the thread level, and a transaction cannot propagate across threads. I need the whole operation (delete + insert) to be atomic. I looked into a few posts about Spring transaction management but could not find the right direction.

Is there a way I can achieve this kind of parallelism using Spring transactions? If Spring transactions are not the answer, are there other techniques I could use?

Thanks
Kumar  
 
Saloon Keeper
Posts: 9703
192
Well, can you first show us what the Person class looks like, and tell us what indexes you maintain on the related tables?
 
Kumar Raja
Ranch Hand
Posts: 558
2
Hibernate Java Spring
I used Person as an example here. The actual entity looks like this:



A unique constraint is defined on PERSON_DT and PERSON_ID, and the ID uses a sequence.
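The entity code itself did not survive here. A hypothetical mapping consistent with the description (sequence-generated surrogate ID, plus PERSON_ID and PERSON_DT under a unique constraint; all class, field, and column names are assumptions) might look like:

```java
import javax.persistence.*;  // jakarta.persistence on newer stacks
import java.time.LocalDate;

// Hypothetical reconstruction; names and types are guesses, not the OP's code.
@Entity
@Table(name = "PERSON",
       uniqueConstraints = @UniqueConstraint(columnNames = {"PERSON_DT", "PERSON_ID"}))
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "person_seq")
    @SequenceGenerator(name = "person_seq", sequenceName = "PERSON_SEQ")
    private Long id;            // surrogate key, sequence-generated

    @Column(name = "PERSON_ID")
    private Long personId;      // business identifier

    @Column(name = "PERSON_DT")
    private LocalDate personDt;

    // getters and setters omitted
}
```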
 
Stephan van Hulst
Saloon Keeper
Posts: 9703
192
Not saying that any of these is necessarily wrong, but I'm just curious:

Why do you declare a sequence generator / generated value, instead of relying on just @Id?

Why do you have both a surrogate ID and a person ID?

Why do you set the batch size to 40 while creating lists of 2000 elements?

Anyway, I don't think multiple threads are going to help you (note that it's very iffy to create new threads in a web application in the first place). Maybe you can use a network analyzer to see whether the individual batches take a long time for the database to process. At any rate, hopefully a database wizard on these boards can help you out.
 
Bartender
Posts: 1002
16
Java Linux Mac OS X Spring VI Editor
Just to add that nothing runs faster than a well-written stored procedure running within the database. I'm not sure how this would fit with your existing code, though.

I've also worked on systems in which information is marked as "deleted", typically with a boolean flag column in the table. This excludes the information from any future queries. At a later date, all rows marked for deletion are then deleted and/or moved into an archive. Of course, all database queries need to consider this flag.
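A hedged illustration of that soft-delete pattern, using Hibernate-specific annotations (table and column names here are assumptions, not from the thread):

```java
import org.hibernate.annotations.SQLDelete;
import org.hibernate.annotations.Where;
import javax.persistence.Entity;
import javax.persistence.Id;

// Hypothetical sketch: Hibernate turns deletes into an UPDATE that sets a
// flag, and the @Where clause hides flagged rows from normal queries.
@Entity
@SQLDelete(sql = "UPDATE person SET deleted = true WHERE id = ?")
@Where(clause = "deleted = false")
public class Person {
    @Id
    private Long id;
    private boolean deleted;  // the soft-delete flag column
}
```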
 
Bartender
Posts: 20307
110
Android Eclipse IDE Linux
I want to caution about stored procedures. They can indeed be very effective, but they do tend to lock you in to a particular vendor's database (and if you think that isn't a problem, consider that Amazon is moving its entire enterprise off Oracle). You can also end up having to "treasure hunt" for code split between apps and the database, and source management for stored procedures historically wasn't as good as for regular code. So use stored procedures wisely.

Most likely the real bottleneck here is that when you add many records at a time, the database server goes crazy indexing them all. Adding the data is fairly fast, but often the indexes are balanced binary trees or something similar and the DB server ends up doing massive amounts of I/O rebalancing the trees with each record add.

I learned this the hard way almost the first day of my professional career and have been re-reminded occasionally ever since on a variety of platforms.

Common industry practice is to delete whatever indexes you have, load the table, then re-create the indexes. That way instead of death by 1000 cuts, you get one big hit whose duration is very much shorter than the sum of all the little indexing operations.

Of course, these days, a database might be continuously online and performance for queries would suffer while the indexes were removed, but traditionally you didn't dump that much data in while online systems were running.

It's also worth noting that almost all DBMS's have a vendor-supplied utility that can do batch loading. It may be faster than attempting the operation in Java code.
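The drop-load-recreate pattern described above can be sketched in plain SQL (the index and table names here are assumptions for illustration):

```sql
-- Hypothetical sketch: drop the index, bulk load, then rebuild once.
DROP INDEX ux_person_dt_id;

-- bulk load the rows here, e.g. with the vendor's batch loader utility

CREATE UNIQUE INDEX ux_person_dt_id ON person (person_dt, person_id);
```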
 
Peter Rooke
Bartender
Posts: 1002
16
Java Linux Mac OS X Spring VI Editor
During a Spring web presentation I heard about yet another new Spring project, Spring Data R2DBC - it seems to be what you were asking for.

I don't know much about this new project, and the "reactive" [buzz]word was new to me, but from the docs:

Part of the answer why R2DBC was created is the need for a non-blocking application stack to handle concurrency with a small number of threads and scale with fewer hardware resources. This need cannot be satisfied by reusing standardized relational database access APIs - namely JDBC - as JDBC is a fully blocking API. Attempts to compensate for blocking behavior with a ThreadPool are of limited use.


R2DBC Spring Docs
 