Let's say I have a person table. Each person can have one to three products, and the table looks like this.
I need to get all products for each person, apply some aggregation logic to the prices (which cannot be done at the SQL level), and end up with exactly one aggregated value per person. I then write the aggregated data to a final table, which contains only two fields (person_id and aggregation_value, where person_id is the key).
How I have solved the problem so far:
I use a JdbcPagingItemReader to read the person table in pages of size n.
I query all distinct persons from the table like this:
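A simplified version of my reader configuration (bean names and columns are slightly adjusted here):

```java
@Bean
public JdbcPagingItemReader<Long> personReader(DataSource dataSource) {
    JdbcPagingItemReader<Long> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setPageSize(1000); // the "n-size" of my pages

    // paging query: one row per distinct person
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setDataSource(dataSource);
    provider.setSelectClause("SELECT DISTINCT person_id");
    provider.setFromClause("FROM person");
    provider.setSortKey("person_id"); // paging needs a unique sort key
    try {
        reader.setQueryProvider(provider.getObject());
    } catch (Exception e) {
        throw new IllegalStateException("could not build paging query provider", e);
    }

    reader.setRowMapper((rs, rowNum) -> rs.getLong("person_id"));
    return reader;
}
```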
Then, in a custom ItemProcessor, I fetch all data for each person with a JpaRepository,
apply the aggregation logic, and write the aggregated data to the final table with a JdbcBatchItemWriter.
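The aggregation step itself boils down to "all products of one person in, one value out". The real logic is more involved and cannot be pushed into SQL; the sum below is just a stand-in to show the shape:

```java
import java.math.BigDecimal;
import java.util.List;

// Stand-in for my aggregation logic: the real rules are different,
// but the contract is the same (all prices of one person -> one value).
public class PersonAggregator {

    // Placeholder aggregation: sum of prices
    public static BigDecimal aggregate(List<BigDecimal> prices) {
        return prices.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        BigDecimal value = aggregate(List.of(new BigDecimal("19.99"),
                                             new BigDecimal("5.01")));
        System.out.println(value); // 25.00
    }
}
```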
1. Two database queries: the first in the JdbcPagingItemReader, where I get all distinct persons, and
the second in the processor with the JpaRepository, where I get all products for one person.
Is it possible to do this with only one query in the JdbcPagingItemReader (maybe with a subquery)?
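What I have in mind is roughly a join like this (column names are assumptions), where the reader returns person+price rows and the processor groups consecutive rows by person_id before aggregating:

```java
public class SingleQueryIdea {
    // Rough shape of the single query I mean; ORDER BY keeps each
    // person's product rows together so they can be grouped downstream.
    static final String SQL =
        "SELECT p.person_id, pr.price " +
        "FROM person p JOIN product pr ON pr.person_id = p.person_id " +
        "ORDER BY p.person_id";
}
```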
2. Multithreading: my step runs multithreaded, but while writing data to the final table I get a database error saying that a record already exists (i.e., the aggregation value for this person has already been calculated and written to the table). I can't understand this, since I query only distinct persons in the reader. A possible workaround would be a custom writer that wraps the insert in a try-catch block and simply ignores the database errors, but that is an awful solution... Why and where do I get this concurrency problem, and how can I fix it?
3. Is the JdbcPagingItemReader the right choice here, or would a JdbcCursorItemReader be a better fit?
4. My batch runs monthly and must rewrite all data in the final table. While it is rewriting, the old data must remain accessible. Each person can have only one aggregation value in the final table, because this table is used by different users and I can't allow inconsistent data here. Ideally I would do everything in one transaction: 1) delete all data from the final table, 2) insert all new data, and then commit. If something goes wrong, everything is rolled back. **How can I do that here?**
I see two solutions. The first is to write the new data to a temporary table with the JdbcBatchItemWriter and then replace all data in the final table with the temp table's contents. The second is to use a tasklet, for example:
With a tasklet I can do it in one transaction (first delete all rows from the final table, then write the new data; if something goes wrong, all changes are rolled back). But in my case there are 20 million records... Is it a good idea to keep them in memory (in the execution context) between steps? (step 1: reader -> processor, which collects person objects into a list in the execution context -> NoOpItemWriter; step 2: tasklet, which reads the list from the previous step, deletes the old data from the final table, and writes the new data to the database)
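Rough shape of the tasklet I mean (names are placeholders; AggregationSource stands in for wherever the aggregates come from):

```java
// A Tasklet's execute() runs inside the step's transaction, so the
// delete and all inserts either commit together or roll back together.
public class ReplaceFinalTableTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;
    private final AggregationSource aggregationSource; // hypothetical source of the aggregates

    public ReplaceFinalTableTasklet(JdbcTemplate jdbcTemplate,
                                    AggregationSource aggregationSource) {
        this.jdbcTemplate = jdbcTemplate;
        this.aggregationSource = aggregationSource;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        jdbcTemplate.update("DELETE FROM final_table");
        for (PersonAggregate aggregate : aggregationSource.readAll()) {
            jdbcTemplate.update(
                "INSERT INTO final_table (person_id, aggregation_value) VALUES (?, ?)",
                aggregate.personId(), aggregate.value());
        }
        return RepeatStatus.FINISHED;
    }
}
```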
Which solution is better? Any better ideas for how to do this?
5. I don't like my solutions at all... The batch works, but it is not what I want to see as a final result. Any ideas on how to solve this problem better?