Does anyone know of a rules-based ETL framework that can handle transformation of large amounts of data? The idea is that the transformation logic is specified in a DSL, and the ETL tool reads the rules in the DSL and performs the transformation. We would prefer the DSL to be maintainable by business users, but it is fine if developers maintain it too. The goal is to keep the rules separated from the Java code, so we can update the rules without doing a complete release.
The Pentaho Business Intelligence suite contains an ETL tool named "Kettle" (also known as Pentaho Data Integration). The ETL rules are stored in an XML file and can be edited by non-programmers via a GUI editor app named "Spoon".
Spoon is basically a drag/drop/drool UI where you place sources, destinations, and processing steps onto a work area, configure them, and wire them together to build the transformation ruleset.
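To give a feel for how the rules live outside the Java code, here is a rough, hand-simplified sketch of what a Kettle transformation (.ktr) file looks like. The real files Spoon writes contain many more elements and attributes per step, and the step names, types, and paths below are illustrative, not copied from an actual export:

```xml
<!-- Hypothetical, simplified sketch of a Kettle .ktr file.
     Real files generated by Spoon are far more verbose. -->
<transformation>
  <info>
    <name>customer_load</name>
  </info>
  <!-- A source step: read rows from a CSV file -->
  <step>
    <name>Read customers</name>
    <type>CsvInput</type>
    <filename>/data/in/customers.csv</filename>
  </step>
  <!-- A destination step: write rows to a database table -->
  <step>
    <name>Write customers</name>
    <type>TableOutput</type>
    <table>customers</table>
  </step>
  <!-- Hops wire the steps together into a flow -->
  <order>
    <hop>
      <from>Read customers</from>
      <to>Write customers</to>
      <enabled>Y</enabled>
    </hop>
  </order>
</transformation>
```

Because the transformation is just a file like this, you can edit and redeploy it independently of your compiled Java code, which is exactly the separation the question asks for.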
It is very performant. I have used it to populate databases with hundreds of millions of records in a single run, and that was just basic operation, without exploiting its ability to work with parallelized databases.
The one thing I don't like about it is that some of the steps are fairly non-intuitive. One of them, in fact, was the Excel input step; I got so fed up with it that I modified the source code, and those modifications have since become a permanent part of the Kettle system.
An IDE is no substitute for an Intelligent Developer.