
Data conversion in multiple threads

 
Walter Andresen
Greenhorn
Posts: 29
Hi there!

I need to implement a converter that loads users from a DB and then invokes a vendor's web service for each user. There are around 100K users, each web service call takes about 1.5 seconds, and I have 1 hour to convert all of them.

Is there any way to do this?

Are there any approaches or design patterns to implement a converter like this?

I believe I need to split the data into chunks and process them in multiple threads. What is the best way to do this?

Also, I ran into OOM issues when I spawned 15 threads in a similar conversion before. Is there any way to avoid OOM?

Thank you
 
Tushar Goel
Ranch Hand
Posts: 931
Why do you want to split the data? In my opinion, it would be easier to handle each user's request in a separate thread.
 
Walter Andresen
Greenhorn
Posts: 29
Tushar Goel wrote: Why do you want to split the data? In my opinion, it would be easier to handle each user's request in a separate thread.


I think you didn't get what I need. I have a table with some entities (users, patients or employees), and for each record in this table I need to invoke a web service to convert the data to a different system. Doing it in one thread would take too much time. Obviously, I need to split the data in this table into chunks and convert each chunk in a separate thread.

So I was wondering whether there are any design approaches or patterns for this. What is the best way to split the data into chunks so that each chunk runs in a separate thread?
 
Anayonkar Shivalkar
Bartender
Posts: 1558
Hi Walter,

I think what Tushar suggested should work. You can build a set of the table's primary keys and then run multiple web service calls in parallel (using a set of primary keys ensures that no two threads work on the same record). Of course, you can use any other mechanism to ensure this.

I hope this helps.
 
Henry Wong
author
Sheriff
Posts: 22530

I am not convinced that it will be that easy. After all, we are talking about going from 2,400 transactions an hour (one call every 1.5 seconds) to 100,000 transactions an hour. That is a roughly 42-fold increase in load!

In my opinion, at best, the latency of the web service will rise above 1.5 seconds per call (which may mean the load has to be increased again to compensate). At worst, the web service will stop answering (responding with an error, or not responding at all).
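
To make the arithmetic explicit, here is a rough sketch in Java. The class and method names are made up for illustration, and it assumes the 1.5 seconds per call stays constant under load, which is exactly the part I doubt.

```java
/** Back-of-the-envelope sizing for the conversion job (illustrative names only). */
final class ThroughputEstimate {

    /** Smallest number of concurrent calls that finishes all records within the window. */
    static long minConcurrentCalls(long records, double secondsPerCall, double windowSeconds) {
        return (long) Math.ceil(records * secondsPerCall / windowSeconds);
    }

    /** Wall-clock hours needed when the given number of calls run in parallel. */
    static double hoursNeeded(long records, double secondsPerCall, int threads) {
        return records * secondsPerCall / threads / 3600.0;
    }

    public static void main(String[] args) {
        long records = 100_000;       // from the original question
        double secondsPerCall = 1.5;  // assumed to stay constant under load
        double windowSeconds = 3600;  // the one-hour limit

        System.out.println("Calls per hour with one thread: " + (long) (windowSeconds / secondsPerCall));
        System.out.println("Minimum concurrent calls: "
                + minConcurrentCalls(records, secondsPerCall, windowSeconds));
        System.out.printf("Hours needed with 15 threads: %.1f%n",
                hoursNeeded(records, secondsPerCall, 15));
    }
}
```

With these numbers the job needs at least 42 calls in flight at all times, and any slowdown of the service pushes that number higher.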

Henry
 
Walter Andresen
Greenhorn
Posts: 29
OK, are there any other options? If I create a set of primary keys and process them in separate threads, what is the best way to do this? Can you elaborate on the design a little? For example, would I create a class that implements the Runnable interface, call the web service inside its run method, and then run it on a pool of threads?

Then how do I split the primary keys into sets? And what should I do if I get an error and need to re-run the processing?

Thank you for your advice
 
Anayonkar Shivalkar
Bartender
Posts: 1558
Hi Walter,

You could do something like this:
1) Create a DO class for the primary key (if your primary key is a single column, there is no need for this).
2) Create a set of primary keys.
3) Create an implementation of Callable that calls the web service for its input (i.e. a primary key) and returns the output.
4) Create an executor service to which you can submit multiple tasks (each task executes a single web service call).

With the help of Callable, you can access the return value and use it to check whether there was an error (and whether that specific task needs to be executed again).
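
The steps above could be sketched roughly as follows. This is only an illustration: `UserService`, `Result` and `convertAll` are hypothetical names, the real vendor call would replace the lambda in `main`, and the pool size is something you would have to tune.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelConverter {

    /** Outcome of one call; the shape is an assumption made for this sketch. */
    static final class Result {
        final long userId;
        final boolean success;
        final String error; // null on success

        Result(long userId, boolean success, String error) {
            this.userId = userId;
            this.success = success;
            this.error = error;
        }
    }

    /** Hypothetical stand-in for the vendor web service; throws on failure. */
    interface UserService {
        void convert(long userId) throws Exception;
    }

    /** Runs one Callable per primary key on a fixed pool and collects the results in input order. */
    static List<Result> convertAll(List<Long> userIds, UserService service, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Result>> tasks = new ArrayList<>();
            for (long id : userIds) {
                tasks.add(() -> {
                    try {
                        service.convert(id);
                        return new Result(id, true, null);
                    } catch (Exception e) {
                        return new Result(id, false, e.getMessage());
                    }
                });
            }
            List<Result> results = new ArrayList<>();
            for (Future<Result> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while converting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake service for the demo: every ID divisible by 4 fails.
        UserService fake = id -> {
            if (id % 4 == 0) throw new Exception("simulated failure");
        };
        for (Result r : convertAll(Arrays.asList(1L, 2L, 3L, 4L, 5L), fake, 3)) {
            System.out.println(r.userId + " -> " + (r.success ? "ok" : "failed: " + r.error));
        }
    }
}
```

The failed IDs can then be fed into `convertAll` again for a second pass.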

I hope this helps.
 
Manish Sridharan
Ranch Hand
Posts: 65
Hi Walter,

A question for you: is the one-hour limit there because the target system only accepts converted data during a one-hour window,
or because the source system only allows data to be picked up for one hour,
or do you want to do the whole thing within a one-hour window?


If the first is the case, you can split the work into two parts.
Part 1: create multiple threads to read the data (use some logic to split the data between threads) and, once the conversion is done, keep the converted data in a DB or on the file system.

Part 2: when the target system's one-hour window opens, start sending the converted data to the target system. You can send the data in batches, split it between multiple HTTP threads, or combine both.
I assume the target system is able to cope with multiple concurrent HTTP requests.

If the second is the case, you can also split the work into two parts.
Part 1: pick up all the data from the source system using multiple threads. Of course, you need to split the data between the threads. However, do not convert the data at this stage. Just read it and dump it into a DB table or files.
Part 2: read the data from the dump table using multiple threads, convert it, and then send it to the target system.


If the third is the case, you can still use the first solution; the difference is that the number of threads will be higher.

What kind of web service are you using? Is it XML-based or JSON-based? JSON-based services tend to be a bit faster.

Is this web service for sending data to the target system or for reading data from the source system? In my opinion, 1.5 seconds per call is rather slow.

How much data can the web service accept in one call?
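
The reason I ask about the payload size: if the service accepts several users per call, the number of calls drops sharply. A small sketch of the arithmetic, under the big assumption that a batched call still takes about 1.5 seconds (the real figure would have to be measured, and the names here are made up):

```java
/** Rough estimate of how batching changes the number of calls (illustrative only). */
final class BatchEstimate {

    /** Number of calls needed when each call carries up to batchSize records. */
    static long callsNeeded(long records, int batchSize) {
        return (records + batchSize - 1) / batchSize; // integer ceiling
    }

    /** Minutes needed to make all calls one after another in a single thread. */
    static double sequentialMinutes(long records, int batchSize, double secondsPerCall) {
        return callsNeeded(records, batchSize) * secondsPerCall / 60.0;
    }

    public static void main(String[] args) {
        long records = 100_000;      // from the original question
        double secondsPerCall = 1.5; // assumed to be unchanged for a batched call
        for (int batchSize : new int[] {1, 10, 50}) {
            System.out.printf("batch=%d calls=%d minutes=%.0f%n",
                    batchSize,
                    callsNeeded(records, batchSize),
                    sequentialMinutes(records, batchSize, secondsPerCall));
        }
    }
}
```

Under that assumption, 50 users per call would mean 2,000 calls, which is about 50 minutes even in a single thread.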


 
Walter Andresen
Greenhorn
Posts: 29
Thank you for your answers!

I am thinking of splitting the data into chunks in the DB. I will create a new "chunk" table with the following columns:
ID of a user
Chunk Number (e.g. 1, 2, 3, 4 ... N)
Conversion Status (success | failure | not processed yet)
Error
Timestamp

So I will have chunk 1 - ID: 0 to 1000, chunk 2 - ID: 1001 to 2001, etc. In the first step I will calculate the chunks and populate this table (e.g. using a stored proc).

Then I will create a Chunk class that implements the Runnable or Callable interface and takes one parameter, the chunk number.
Then I will instantiate N Chunks and execute them on a pool of threads (ExecutorService).
Each Chunk will read its IDs from the "chunk" table based on the chunk number, invoke the web service in a loop for each ID, and update the status in the table after each call.

This way I will be able to re-run the conversion for only the failed records after the first pass is complete.
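
A minimal sketch of what I have in mind, with an in-memory map standing in for the "chunk" table. The names `ChunkPlan`, `split` and `idsToRetry` are just placeholders, and in the real thing the status would live in the DB. The sketch splits by position in the ID list rather than by ID range, so chunks stay the same size even if the IDs have gaps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ChunkPlan {

    /** Mirrors the "Conversion Status" column of the proposed chunk table. */
    enum Status { NOT_PROCESSED, SUCCESS, FAILURE }

    /** Splits the IDs into consecutive chunks of at most chunkSize elements. */
    static List<List<Long>> split(List<Long> ids, int chunkSize) {
        List<List<Long>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    /** IDs whose status is anything other than SUCCESS: the input for a second pass. */
    static List<Long> idsToRetry(Map<Long, Status> statusById) {
        List<Long> pending = new ArrayList<>();
        for (Map.Entry<Long, Status> entry : statusById.entrySet()) {
            if (entry.getValue() != Status.SUCCESS) {
                pending.add(entry.getKey());
            }
        }
        Collections.sort(pending);
        return pending;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 2500; id++) {
            ids.add(id);
        }
        System.out.println("Chunks: " + split(ids, 1000).size());

        // In-memory stand-in for the chunk table (an assumption of this sketch).
        Map<Long, Status> statusById = new ConcurrentHashMap<>();
        for (Long id : ids) {
            statusById.put(id, Status.SUCCESS);
        }
        statusById.put(42L, Status.FAILURE);
        statusById.put(2500L, Status.NOT_PROCESSED);
        System.out.println("Retry on second pass: " + idsToRetry(statusById));
    }
}
```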

I still have a problem with the 1-hour time frame. I don't want to spawn more than 5 threads, because otherwise I may run into OOM and similar problems.
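
One option I am considering for the memory side is to bound the work queue, so that the pool never holds more than a fixed number of pending tasks no matter how many IDs there are. I don't know yet whether queued work was the actual cause of my earlier OOM, so treat this as a sketch: `BoundedPool` and its parameters are made-up names, while `ThreadPoolExecutor`, `ArrayBlockingQueue` and `CallerRunsPolicy` are the standard java.util.concurrent classes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPool {

    /**
     * Fixed-size pool with a bounded queue. When the queue is full, CallerRunsPolicy
     * makes the submitting thread run the task itself, which slows down submission
     * instead of letting pending tasks pile up in memory.
     */
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits taskCount copies of the work and returns how many of them completed. */
    static int runAll(int taskCount, int threads, int queueCapacity, Runnable work) {
        ThreadPoolExecutor pool = create(threads, queueCapacity);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                work.run();
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Timed out waiting for tasks");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        // The empty Runnable is a placeholder for one web service call.
        int completed = runAll(10_000, 5, 100, () -> { });
        System.out.println("Completed tasks: " + completed);
    }
}
```

In the real converter the IDs would be read from the DB in pages and submitted as they arrive, rather than loaded all at once.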






 