Spring batch - 1 million rows, one column - too much time

 
Mustafa Garhi
Ranch Hand
Posts: 111
Hi,

I am fairly new to Spring Batch; over the past couple of days I have done some reading.

We have a Spring Batch process that is supposed to read close to a million rows, each with only one column, from a table and write them to a text file.

It points to an Oracle DB, and the batch read (fetch) size is the default (10).

Driver used is oracle.jdbc.driver.OracleDriver.

ItemReader we have used is Spring's JdbcCursorItemReader and we have implemented ItemWriter, ItemStream, StepExecutionListener, InitializingBean for the writer.

We have a row mapper.

The whole read/write process takes 5 hours every day, which is not acceptable to the client.

When I check the logs, I see that for each record only 1 or 2 milliseconds elapse between the row being mapped and it being written.

However, the next row is mapped only after 15 milliseconds. What is Spring doing in between? We want to optimize this.

Where should I start?

Please suggest; I would be grateful!
 
Saifuddin Merchant
Ranch Hand
Posts: 608
What's your commit-interval? Why don't you post your config?
It shouldn't take so much time, I've seen batches which write close to a million records to the database in less than 30 mins. Read operation should be a lot faster.
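For later readers: the commit interval is declared on the chunk element of the step configuration. A minimal sketch (the bean ids here are placeholders, not taken from the original config):

```xml
<step id="exportStep" xmlns="http://www.springframework.org/schema/batch">
	<tasklet>
		<chunk reader="jdbcCursorItemReader" writer="flatFileItemWriter"
			commit-interval="1000" />
	</tasklet>
</step>
```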
 
Mustafa Garhi
Ranch Hand
Posts: 111
Sam, I am sorry but I wouldn't be able to post the config.

However, I realized that my understanding of commitInterval was completely wrong, as I thought it was related to DB commits.

After your post I tried the commitInterval, and it has definitely made the process faster. How much faster, I am still measuring; I will let you know.

I am assuming that if I set the commitInterval too high, it will keep more items in memory, which could mean heap issues. Right?

Thanks for that great help.
 
Saifuddin Merchant
Ranch Hand
Posts: 608
A commit interval in the range of 100-1000 should be good.
Going as high as 10,000 should get you great performance without any issues.
 
Mustafa Garhi
Ranch Hand
Posts: 111
All right Sam. Thanks again.

One more question: if I am fetching only one column from the DB, do I still need a RowMapper?

I think I can save some time if I can get rid of the mapper somehow.

I have tried removing it, and Tomcat throws an error saying the mapper is mandatory.

I am digging into this, but I would appreciate any ideas you have.
 
Saifuddin Merchant
Ranch Hand
Posts: 608
Hard to say without really knowing what you are doing.
I normally advise against micro-optimization.

Personally, I don't think you would get better performance by doing away with the row mapper. Framework classes usually have optimized settings (e.g. database buffer size).
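One note for single-column reads: JdbcCursorItemReader does require a row mapper, but it need not be hand-written; spring-jdbc ships a generic one that can be wired in. A sketch (the surrounding reader bean is assumed):

```xml
<property name="rowMapper">
	<bean class="org.springframework.jdbc.core.SingleColumnRowMapper" />
</property>
```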
 
Mark Spritzler
ranger
Posts: 17347
You can implement your own ItemReader and just use the JdbcTemplate, calling one of the queryFor methods: if an int is returned, call queryForInt; if it is a String, call queryForObject with String.class.

If you do that, you don't need the RowMapper. While I doubt you will see a great increase in performance, it is still quick and easy to write.

Mustafa, as for the commitInterval: Spring Batch uses what is called chunk processing. Within a step, the read and process phases run item by item; the write phase then receives a collection whose size is the commit interval (the chunk size). If I set that to 1000, it will read 1000 rows one by one, passing each row to the RowMapper; after each read/process, that item is placed into the collection. When the collection reaches 1000 items, it is sent to the writer, which writes the 1000 in one batch and then commits the transaction. Then reading/processing starts again for the next 1000 items. The reader does some caching of data until the transaction is committed. So with your Oracle batch read size of 10 and a commit interval of 1000, the reader would contact the database 100 times per chunk. I would make the chunk size and the database batch read size the same number; 1000 would be good, and 10000 would be fine too.
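The chunk cycle described here can be sketched in plain Java (this is an illustration of the read-buffer-write rhythm, not the actual Spring Batch classes):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of chunk-oriented processing: items are read and
// mapped one at a time, buffered until the commit interval is reached,
// then handed to the writer as one batch and committed.
public class ChunkLoopSketch {

    // Returns the number of writer calls (i.e. transaction commits).
    static int process(Iterator<String> reader, int commitInterval) {
        List<String> buffer = new ArrayList<>();
        int commits = 0;
        while (reader.hasNext()) {
            buffer.add(reader.next());          // read + map a single item
            if (buffer.size() == commitInterval) {
                commits++;                      // writer gets the whole chunk
                buffer.clear();                 // next chunk starts empty
            }
        }
        if (!buffer.isEmpty()) {
            commits++;                          // final partial chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add("row" + i);
        // 2500 rows at a commit interval of 1000 -> 3 writer calls
        System.out.println(process(rows.iterator(), 1000)); // prints 3
    }
}
```

With a commit interval of 1, the writer and the per-chunk bookkeeping run once per row, which is where the per-row overhead described earlier in the thread comes from.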

Mark
 
Saifuddin Merchant
Ranch Hand
Posts: 608
I'd like to add a little to what Mark said.

The reader is buffered, and the number of records read from the database in one go is most likely independent of the commit count. It depends on the driver, the ResultSet settings, and other JDBC-related settings.

So why does a commit interval of one take longer? There are two reasons:

a) As Mark explained above, Spring follows a chunk-oriented approach. At the end of every commit interval it saves the resource being written to (saves the file, commits the database). This takes some time.

b) Spring also saves metadata at the end of every commit interval, namely the STEP_EXECUTION_CONTEXT and the number of rows read and processed. It needs this for restartability of the batch job. This takes the major chunk of time in your case.

Commit intervals up to 1,000 are safe. Beyond that you hardly notice any performance gain. The Spring folks recommend a commit interval in the range of 100.
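To make the driver-side point concrete: JdbcCursorItemReader exposes a fetchSize property that controls how many rows the JDBC driver pulls per round trip, and it can be aligned with the commit interval. A sketch (bean ids, datasource, and SQL are placeholders):

```xml
<bean id="jdbcCursorItemReader"
	class="org.springframework.batch.item.database.JdbcCursorItemReader">
	<property name="dataSource" ref="dataSource" />
	<property name="sql" value="SELECT the_column FROM the_table" />
	<!-- match the chunk/commit-interval so one fetch serves one chunk -->
	<property name="fetchSize" value="1000" />
	<property name="rowMapper" ref="rowMapper" />
</bean>
```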
 
Ranch Hand
Posts: 2108
Hi, I just want to add a possible reason.

Maybe you can check whether the SQL query itself is optimal. Try executing it in Toad and see how long it takes.

Some SQL clauses, like ORDER BY, and worse, functions, can make your query crawl.
 
Mustafa Garhi
Ranch Hand
Posts: 111
Thanks everyone. The posts have really helped me understand batch better.

I got commitInterval added to the code.

Looking at our Tomcat's memory settings (1 GB max) and the fact that this process runs every night at 2:00 AM, when the load on the server is low, I deduced that I did not need to worry much about a high commitInterval value overflowing the memory, and set it to 10000.

The process completed in 44 mins.

Should I lower this value? (I am sorry, I didn't read Sam's latest post about the ideal commitInterval value carefully.)

Thanks again mates.


 
Mark Spritzler
ranger
Posts: 17347
You can always test it out. Can you set up a staging-like server with the same settings, talking to a temp database, and try different intervals to see how they affect things?

Mark
 
Saifuddin Merchant
Ranch Hand
Posts: 608

Mark Spritzler wrote:You can always test it out. Can you set up a staging-like server with the same settings, talking to a temp database, and try different intervals to see how they affect things?

Mark



Been there, done that! 1000 or 10,000 makes no big difference to the time. Anything over 10,000 might slow things down (memory tradeoff).
 
Mustafa Garhi
Ranch Hand
Posts: 111
For one million records I found only a difference of a few seconds between 1,000 and 10,000.

We decided to stick to 10,000.

CommitInterval has done the trick.

Thanks everybody !!
 
Mark Spritzler
ranger
Posts: 17347

Mustafa Garhi wrote:For one million records I found only a difference of a few seconds between 1,000 and 10,000.

We decided to stick to 10,000.

CommitInterval has done the trick.

Thanks everybody !!



Cool. Also consider this, if it has any effect: if a step fails and you want to restart, the job will restart on the last chunk that caused the failure and redo that entire chunk. That would be 10K or 1000 rows, depending on your chunk size. So there is a trade-off between higher and lower values; just pick the happy medium between successful completion time and restarts.

Mark
 
Ranch Hand
Posts: 62
Hi Mark,

Can you please tell me how much time it took to process 1 million records?

I am going to propose this framework for our project.

Waiting for your valuable reply.

Thanks in advance,
Surendra
 
Greenhorn
Posts: 10
Hi All,

I am trying to generate reports in my project using Spring Batch. I have more than 4 million records in my database. Earlier I had set commit-interval to 1, but after reading your responses I set commit-interval to 100 and page-size to 100; still, it is taking more than 2 days to pull the reports. It takes 3 to 4 minutes to fetch and process each batch of records and write it to my CSV file. I also tried a commit-interval of 1000, but then it took more than 20 minutes to fetch and process the records and write them to my CSV file.

Please help me, friends. Maybe I am doing something wrong.

Jobs-Context.xml

<step id="slave" xmlns="http://www.springframework.org/schema/batch">
	<tasklet>
		<chunk reader="pagingItemReader" writer="flatFileItemWriter"
			processor="itemProcessor" commit-interval="100" />
	</tasklet>
</step>

ItemReader bean


<bean id="pagingItemReader"
	class="org.springframework.batch.item.database.JdbcPagingItemReader"
	scope="step">
	<property name="dataSource" ref="gemsDataSource" />
	<property name="queryProvider">
		<bean
			class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
			<property name="dataSource" ref="gemsDataSource" />
			<property name="selectClause" value="SELECT * " />
			<property name="fromClause" value="***QUERY***" />
			<property name="whereClause" value="where rn between :fromId and :toId" />
			<property name="sortKey" value="rn" />
		</bean>
	</property>
	<!-- Inject via the ExecutionContext in rangePartitioner -->
	<property name="parameterValues">
		<map>
			<entry key="fromId" value="#{stepExecutionContext[fromId]}" />
			<entry key="toId" value="#{stepExecutionContext[toId]}" />
		</map>
	</property>
	<property name="pageSize" value="100" />
	<property name="rowMapper">
		<bean class="com.hello.ItemRowMapper" />
	</property>
</bean>

Please let me know if there is any issue with my code.

 