To Spring Batch or ETL

 
Ranch Hand
Posts: 662
Eclipse IDE Spring Java
We are in the solutioning phase of a project in which daily incoming files have to be parsed, validated, and loaded into the database.
The file size could be anywhere from 20 MB to 200 MB. We wanted to use Spring Batch, but considering the file size, we are also thinking about an ETL tool to do the job.
I am curious (partly to rule out the ETL option): what is the cap on the file size that can be processed via Spring Batch? Is there any limit?
The SLA for loading the files varies from 1 hour to 5 hours (the SLA is not tied to file size, so let's take the worst case: a 200 MB file with a 1-hour SLA).
Has anyone used such a combination in a project? What is your advice? If you have suggestions, or ran into problems while loading bulk files, please let me know.
 
ranger
Posts: 17347
11
Mac IntelliJ IDE Spring
There is no cap on the file size.

You can also use techniques such as multi-threading, partitioning, and remoting jobs to increase performance if the out-of-the-box single-threaded chunk reading is too slow. But you would be surprised at the speed.
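To illustrate why file size is not a hard limit with chunk-oriented reading, here is a minimal plain-Java sketch. It is not the Spring Batch API; the class name `ChunkedLoadSketch`, the method `processInChunks`, and the chunk size are all made up for illustration. The point it tries to show is that only one chunk of lines is held in memory at a time, whether the file is 20 MB or 200 MB.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration of chunk-oriented processing; NOT the Spring Batch API.
class ChunkedLoadSketch {

    // Reads lines one at a time and hands them to chunkWriter in groups of chunkSize.
    // Returns the number of lines read. Memory use is bounded by chunkSize, not by file size.
    static long processInChunks(Reader source, int chunkSize, Consumer<List<String>> chunkWriter) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long count = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                count++;
                if (chunk.size() == chunkSize) {
                    // In a real job this could be one database transaction per chunk.
                    chunkWriter.accept(new ArrayList<>(chunk));
                    chunk.clear();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!chunk.isEmpty()) {
            chunkWriter.accept(new ArrayList<>(chunk)); // final, partially filled chunk
        }
        return count;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the daily file; a FileReader would work the same way.
        Reader demo = new StringReader("a\nb\nc\nd\ne\n");
        long n = processInChunks(demo, 2, chunk -> System.out.println("writing " + chunk));
        System.out.println(n + " lines processed");
    }
}
```

Multi-threading or partitioning would build on the same idea, for example by giving each worker its own file or range of lines, but that is beyond this sketch.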

I highly recommend Spring Batch over an ETL tool, simply because I think Spring Batch is easier to use, set up, and code against. It is also very powerful, with a database that stores job executions and statistics.

Hope that helps

Mark
 
Ranch Hand
Posts: 859
IBM DB2 Chrome Java
Depending on your needs ...

Spring Batch will not do the parsing for you. You will need to receive the files, process them, validate them, etc.
Also look into Mule ESB for automatic triggering when files arrive in certain folders/directories.

Also, for ETL, look at Talend; I believe it's open source and can transform all sorts of files.

WP
 
Arun Kumarr
Ranch Hand
Posts: 662
Eclipse IDE Spring Java
Thank you, Mark and William. We did consider Talend and Kettle for the ETL job.
Would it be a fair assumption to think of ETL tools as parsers + a Spring Batch-like framework + a UI?
Also, I believe ETL tools allow field mappings to be changed through run-time configuration, which is tough in Spring Batch (code change, compile, and deploy).
So when I have to make the call, I'll check whether the expected changes to fields and field mappings are extensive. If they are, we would suggest going with the ETL tool; otherwise we would prefer Spring Batch (my personal preference too).
 
Mark Spritzler
ranger
Posts: 17347
11
Mac IntelliJ IDE Spring

William P O'Sullivan wrote: Depending on your needs ...

Spring Batch will not do the parsing for you. You will need to receive the files, process them, validate them, etc.
Also look into Mule ESB for automatic triggering when files arrive in certain folders/directories.

Also, for ETL, look at Talend; I believe it's open source and can transform all sorts of files.

WP



That's partly true, but not entirely correct. Spring Batch has built-in mechanisms for reading and parsing files. You still have to write some code to map each line of the file to the domain object that represents it, but to me, doing that mapping through a simple callback interface is a huge gain: it takes away the pain points of writing parsing code.
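As a rough picture of the callback-style mapping described above, here is a plain-Java stand-in. It deliberately does not use Spring Batch classes; `FieldMapper`, `parseLine`, `Payment`, and the `account,amount` file layout are all assumptions invented for this sketch. In Spring Batch itself, the callback role is played by interfaces such as `LineMapper` and `FieldSetMapper`.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

// Plain-Java stand-in for a line-mapping callback; NOT the Spring Batch API.
class LineMappingSketch {

    // Hypothetical domain object representing one line of the daily file.
    static final class Payment {
        final String account;
        final BigDecimal amount;

        Payment(String account, BigDecimal amount) {
            this.account = account;
            this.amount = amount;
        }
    }

    // The callback: generic code splits the line, the caller only maps the fields.
    interface FieldMapper<T> {
        T map(String[] fields);
    }

    // Generic part: split on a literal delimiter, trim each field, delegate to the callback.
    static <T> T parseLine(String line, String delimiter, FieldMapper<T> mapper) {
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return mapper.map(fields);
    }

    // Application part: validation plus mapping for the assumed "account,amount" layout.
    static Payment toPayment(String[] fields) {
        if (fields.length != 2 || fields[0].isEmpty()) {
            throw new IllegalArgumentException("expected: account,amount");
        }
        // BigDecimal rejects non-numeric text with a NumberFormatException.
        return new Payment(fields[0], new BigDecimal(fields[1]));
    }

    public static void main(String[] args) {
        Payment p = parseLine("ACC-1, 12.50", ",", LineMappingSketch::toPayment);
        System.out.println(p.account + " -> " + p.amount);
    }
}
```

The only application-specific code here is `toPayment`; the splitting and trimming is the part a framework would normally provide.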

Also, combining Spring Integration with Spring Batch can add transformations and much more.

Mark
 