I am having requirement to read data(some of the tables containing millions of records) from multiple tables(around 10 tables) in oracle then do some transformation and load this data onto cassandra. What would be the best way to do this.
Following is the approach that i can think of
1 Use apache camel for connecting to oracle, have separate camel routes to read data from each of the tables(if 10 tables then 10 routes will be defined). then drop these messages into kafka topics(data from each route will be dropped into separate kafka topic). Now there will be routes defined to read data from each of these kafka topics(10 consumers to consume from 10 kafka topics), do necessary transformation in batches using logic in processor(if required will execute processor in multi threaded mode) then connect to cassandra's keyspace and insert the data read.
Can you suggest me if this approach is a good way to perform data migration task between two databases when you have huge data. Will above approach have any scalability issues.
If there are any tweaks to be made to above approach or alternative frameworks best suited for this requirement please suggest.
Note: Source database can vary. In some cases it can be DB2
For any sizable extraction/transformation/load task I would recommend to use an ETL tool that is designed to do just that, rather than try to roll your own. Searching for oracle cassandra ETL tool will give you some starting pointers on that.
Doe, a deer, a female deer. Ray, a pockeful of sun. Me, a name, I call my tiny ad ...