
How to improve Spring batch performance working on 8 million records.

Hi All,

I am trying to generate reports in my project using Spring Batch. I have more than 8 million records in my database. Earlier I had set the commit-interval to 1; after reading some articles I set the commit-interval to 10000 and the page-size to 10000, but it is still taking more than 44 hours to generate the report. In every iteration it takes 3 to 4 minutes to fetch the records, process them, and write them to my CSV file.
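A quick back-of-envelope check (a sketch using only the numbers stated above) shows the observed wall-clock time is consistent with the pages being processed one after another:

```java
// Rough serial-throughput estimate from the figures in the post:
// 8M rows, page-size 10000, roughly 3-4 minutes per page.
// All numbers here come from the post; the helper itself is hypothetical.
public class ThroughputEstimate {

    /** Hours needed if every page is handled serially. */
    public static double hoursFor(long rows, int pageSize, double minutesPerPage) {
        long pages = (rows + pageSize - 1) / pageSize; // ceiling division
        return pages * minutesPerPage / 60.0;
    }

    public static void main(String[] args) {
        // 800 pages at ~3.5 min each is roughly 47 hours,
        // in the same ballpark as the reported 44 hours.
        System.out.println(hoursFor(8_000_000, 10_000, 3.5));
    }
}
```

At ~3.5 minutes per 10000-row page, 800 pages serially comes out near the reported 44 hours, which is one reason to verify whether the 10 partitions are actually running in parallel.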

Please help me, friends. Maybe I am doing something wrong.


<job id="reportJob" xmlns="http://www.springframework.org/schema/batch">

	<!-- master step, 10 threads (grid-size) -->
	<step id="masterStep">
		<partition step="slave" partitioner="rangePartitioner">
			<handler grid-size="10" task-executor="taskExecutor" />
		</partition>
		<next on="*" to="combineStep" />
	</step>

	<step id="combineStep">
		<tasklet>
			<chunk reader="multiResourceReader" writer="combineFlatFileItemWriter"
				commit-interval="1" />
		</tasklet>
		<next on="*" to="deleteFiles" />
	</step>

	<step id="deleteFiles">
		<tasklet ref="debitfileDeletingTasklet" />
	</step>
</job>

<!-- each thread will run this step, with different stepExecutionContext values -->
<step id="slave" xmlns="http://www.springframework.org/schema/batch">
	<tasklet>
		<chunk reader="pagingItemReader" writer="flatFileItemWriter"
			processor="itemProcessor" commit-interval="10000" />
	</tasklet>
</step>

<bean id="rangePartitioner" class="com.test.RangePartitioner" />

<bean id="debittaskExecutor"
	class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
	<property name="corePoolSize" value="10" />
	<property name="maxPoolSize" value="10" />
	<property name="allowCoreThreadTimeOut" value="true" />
</bean>

The item reader bean:

<!-- The class attributes on this bean and its queryProvider were missing
     from the post; JdbcPagingItemReader / SqlPagingQueryProviderFactoryBean
     and scope="step" are assumed from the properties used. -->
<bean id="pagingItemReader" scope="step"
	class="org.springframework.batch.item.database.JdbcPagingItemReader">
	<property name="dataSource" ref="gemsDataSource" />
	<property name="queryProvider">
		<bean
			class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean">
			<property name="dataSource" ref="gemsDataSource" />
			<property name="selectClause" value="SELECT * " />
			<property name="fromClause" value="***QUERY***" />
			<property name="whereClause" value="where rn between :fromId and :toId" />
			<property name="sortKey" value="rn" />
		</bean>
	</property>
	<!-- Injected via the ExecutionContext in rangePartitioner -->
	<property name="parameterValues">
		<map>
			<entry key="fromId" value="#{stepExecutionContext[fromId]}" />
			<entry key="toId" value="#{stepExecutionContext[toId]}" />
		</map>
	</property>
	<property name="pageSize" value="10000" />
	<property name="rowMapper">
		<bean class="com.hello.ItemRowMapper" />
	</property>
</bean>
Please let me know if there is any issue with my code.
Posts: 1361
IBM DB2 Netbeans IDE Spring Java
I would suggest you analyze each phase separately: fetching the data, processing it, and writing out the CSV file. First of all, you need to find the bottleneck. It would also help if you could share more details of what your code does; otherwise we can only guess.
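One way to do that measurement, a minimal sketch rather than Spring Batch API (a real job could gather the same numbers from a `ChunkListener` or by decorating the reader, processor, and writer), is a small accumulator that records how long each phase takes:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal per-phase timing accumulator (a sketch, not Spring Batch API).
// Wrap each read/process/write call to see where the 3-4 minutes per
// chunk actually go.
public class PhaseTimer {
    public enum Phase { READ, PROCESS, WRITE }

    private final Map<Phase, Long> totals = new EnumMap<>(Phase.class);

    /** Runs the given work, charging its elapsed time to the phase. */
    public <T> T time(Phase phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totals.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    public long totalMillis(Phase phase) {
        return totals.getOrDefault(phase, 0L) / 1_000_000;
    }
}
```

Comparing the three totals after a few chunks tells you whether to tune the SQL paging query, the processor, or the CSV writer first.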