• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Ron McLeod
  • Paul Clapham
  • Jeanne Boyarsky
  • Bear Bibeault
Sheriffs:
  • Rob Spoor
  • Henry Wong
  • Liutauras Vilda
Saloon Keepers:
  • Tim Moores
  • Carey Brown
  • Stephan van Hulst
  • Tim Holloway
  • Piet Souris
Bartenders:
  • Frits Walraven
  • Himai Minh
  • Jj Roberts

subquery in spring batch item reader and multithreading problems

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Let's say I have a person table. A person can have one to three products and it looks like this.
idperson_idproductprice
1person1product31
2person1product22
3person1product110
4person2product111
5person2product214

I should get all products for each person, then make aggregation logic with prices (not possible to do on sql level) and as a result I should get only one aggregation value for each person. Then I write aggregated data to the final table, which contains only two fields (person_id and aggregation_value, where person_id is a key)
person_idaggregation_value
person1100
person2111
person393

How I solved my problem by now:
I use JdbcPagingItemReader to read all data from the person table in pages of n-size.
I query all distinct persons from the table like this:


Then in a custom ItemProcessor I get all data for each customer with JpaRepository


make some aggregation logic and write aggregated data with JdbcBatchItemWriter to the final table.
Problems:
1. Two database queries: first - with JdbcPagingItemReader, where I get all distinct persons and
second - in the processor with JpaRepository, where I get all products for one person.
Is it possible to make only one query in the JdbcPagingItemReader ? (maybe with subquery)
How ?
2. Multithreading: My batch (step) runs in a multithreaded fashion, but while writing data to the final table I get an error from a database, that a record is already there... (aggregation value was calculated for this user already and was written to the table) I can't understand it, since I query only distinct persons in the reader. A possible solution would be creating a custom writer and wrap writing to db in try-catch block and simply ignore error messages from the db. But it is an awful solution... Why and where I get this concurrency problem and how I can fix it ?
3. Is the JdbcPagingItemReader a right solution in this case, maybe a better solution would be with JdbcCursorItemReader ?
4. My batch should run monthly and I should rewrite all data from the final table. And while doing this(rewriting), the old data should be accessible. Each user can have only one aggregation value in the final table, cause this table is used by different users and I can't have inconsistency of data here. Ideally I should do everything in one session: 1 - delete all data from the final table, 2 - insert all data to this table and then commit my changes. If something goes wrong, everything will be rolled back. **How can I do it here ?**
I have two solutions: The first solution is to create a temporary final table to which I can write new data with JdbcBatchItemWriter and then just replace all data in the final table with temp table. The second solution is to use a tasklet for this, for example:
I can do it in one session (firstly delete all rows from the final table, secondly write new data to the final table and if something goes wrong, all changes will be rolled back). But in my case I have 20 millions records... Is it a good idea to store them in memory (execution context) between steps ? (step1: reader -> processor (will write person objects to a list in execution context) -> NoOpItemWriter -> step2: tasklet (will read a list from the previous step), deletes old data from the final table and wirtes all data from the list to the database)
Which solution is better ? Any better ideas how I can do it ?
5. I don't like my solutions at all ... this batch works, but it is not what I want to see as a final result. Any ideas how can I solve this problem better than now ?
 
Saloon Keeper
Posts: 23409
159
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'd have to sit down and draw a map to understand all that, but just from superficial reading, I'd say that you probably don't know about database joins. An SQL join (usually an INNER JOIN) allows you to do things like receive a result set where the rows returned consist of the parent (outer) table row associated with foreign matches to the inner (child) table.

As for multi-threading, that can usually be controlled by using database transactions.
 
catch it before it slithers away! Oh wait, it's a tiny ad:
SKIP - a book about connecting industrious people with elderly land owners
https://coderanch.com/t/skip-book
reply
    Bookmark Topic Watch Topic
  • New Topic