Let's say I have a person table. A person can have one to three products and it looks like this.
I should get all products for each person, then make aggregation logic with prices (not possible to do on sql level) and as a result I should get only one aggregation value for each person. Then I write aggregated data to the final table, which contains only two fields (person_id and aggregation_value, where person_id is a key)
How I solved my problem by now:
I use JdbcPagingItemReader to read all data from the person table in pages of n-size.
I query all distinct persons from the table like this:
Then in a custom ItemProcessor I get all data for each customer with JpaRepository
make some aggregation logic and write aggregated data with JdbcBatchItemWriter to the final table.
1. Two database queries: first - with JdbcPagingItemReader, where I get all distinct persons and
second - in the processor with JpaRepository, where I get all products for one person.
Is it possible to make only one query in the JdbcPagingItemReader ? (maybe with subquery)
2. Multithreading: My batch (step) runs in a multithreaded fashion, but while writing data to the final table I get an error from a database, that a record is already there... (aggregation value was calculated for this user already and was written to the table) I can't understand it, since I query only distinct persons in the reader. A possible solution would be creating a custom writer and wrap writing to db in try-catch block and simply ignore error messages from the db. But it is an awful solution... Why and where I get this concurrency problem and how I can fix it ?
3. Is the JdbcPagingItemReader a right solution in this case, maybe a better solution would be with JdbcCursorItemReader ?
4. My batch should run monthly and I should rewrite all data from the final table. And while doing this(rewriting), the old data should be accessible. Each user can have only one aggregation value in the final table, cause this table is used by different users and I can't have inconsistency of data here. Ideally I should do everything in one session: 1 - delete all data from the final table, 2 - insert all data to this table and then commit my changes. If something goes wrong, everything will be rolled back. **How can I do it here ?**
I have two solutions: The first solution is to create a temporary final table to which I can write new data with JdbcBatchItemWriter and then just replace all data in the final table with temp table. The second solution is to use a tasklet for this, for example:
I can do it in one session (firstly delete all rows from the final table, secondly write new data to the final table and if something goes wrong, all changes will be rolled back). But in my case I have 20 millions records... Is it a good idea to store them in memory (execution context) between steps ? (step1: reader -> processor (will write person objects to a list in execution context) -> NoOpItemWriter -> step2: tasklet (will read a list from the previous step), deletes old data from the final table and wirtes all data from the list to the database)
Which solution is better ? Any better ideas how I can do it ?
5. I don't like my solutions at all ... this batch works, but it is not what I want to see as a final result. Any ideas how can I solve this problem better than now ?
I'd have to sit down and draw a map to understand all that, but just from superficial reading, I'd say that you probably don't know about database joins. An SQL join (usually an INNER JOIN) allows you to do things like receive a result set where the rows returned consist of the parent (outer) table row associated with foreign matches to the inner (child) table.
As for multi-threading, that can usually be controlled by using database transactions.
Some people, when well-known sources tell them that fire will burn them, don't put their hands in the fire.
Some people, being skeptical, will put their hands in the fire, get burned, and learn not to put their hands in the fire.
And some people, believing that they know better than well-known sources, will claim it's a lie, put their hands in the fire, and continue to scream it's a lie even as their hands burn down to charred stumps.
catch it before it slithers away! Oh wait, it's a tiny ad:
SKIP - a book about connecting industrious people with elderly land owners