
Efficient way to calculate the number of threads

 
Mohammad Norouzi
Ranch Hand
Posts: 71
Hi all
We have a framework in which a number of batch jobs are created, and each batch job may process a number of records.
In the config file we can set the maximum number of jobs (threads), and there is an interface in which we specify the number of jobs and determine the records for each job.
Now I want to know how to calculate these values efficiently.

Suppose the maximum number of threads is 100 and we have 100 records. I don't want the batch controller to create 100 threads and assign 1 record to each. I also thought we could give a suggested number of records per thread, say 20: with 100 records we would then calculate 5 threads with 20 records each. But what if we have 30 records? With this kind of calculation we get 2 threads, one with 20 records and the other with 10, but I think it's better to have two threads with 15 records each.
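The balanced split described above (two threads of 15 rather than 20 + 10) can be obtained by using the suggested size only to pick the thread count, rounding up, and then dividing the records evenly over those threads. A minimal Java sketch under that assumption; `BatchSizing`, `plan`, `targetPerThread` and `maxThreads` are hypothetical names, not part of the framework:

```java
import java.util.Arrays;

public class BatchSizing {

    /**
     * Hypothetical helper: returns the number of records for each thread.
     * targetPerThread is the "suggested records per thread" (e.g. 20) and
     * maxThreads is the configured maximum (e.g. 100).
     */
    public static int[] plan(int totalRecords, int targetPerThread, int maxThreads) {
        if (targetPerThread <= 0 || maxThreads <= 0) {
            throw new IllegalArgumentException("targetPerThread and maxThreads must be positive");
        }
        if (totalRecords <= 0) {
            return new int[0];
        }
        // Round up, so 30 records with a target of 20 gives 2 threads, not 1.
        int threads = (totalRecords + targetPerThread - 1) / targetPerThread;
        // Respect the configured maximum; threads then simply get more records each.
        threads = Math.min(threads, maxThreads);

        // Spread the records evenly: the first (totalRecords % threads) threads
        // get one extra record, so no two threads differ by more than one.
        int base = totalRecords / threads;
        int extra = totalRecords % threads;
        int[] sizes = new int[threads];
        for (int i = 0; i < threads; i++) {
            sizes[i] = base + (i < extra ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(plan(30, 20, 100)));  // [15, 15]
        System.out.println(Arrays.toString(plan(100, 20, 100))); // [20, 20, 20, 20, 20]
        System.out.println(Arrays.toString(plan(50, 20, 100)));  // [17, 17, 16]
    }
}
```

With a target of 20, 50 records come out as 17/17/16 rather than 20/20/10.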

I am looking for a formula to compute the number of threads and the number of records per thread efficiently. It would be a great help if you could share your experience or suggest any documentation or website on this.

Thanks
 
Ulf Dittmer
Rancher
Posts: 42968
How about "maximum number of threads divided by number of jobs"? That should distribute them evenly. I'd suggest a number much smaller than 100, though; maybe 10.
 
Mohammad Norouzi
Ranch Hand
Posts: 71
Ulf Dittmer wrote: How about "maximum number of threads divided by number of jobs"? That should distribute them evenly. I'd suggest a number much smaller than 100, though; maybe 10.


Well, sorry, I didn't explain that the number of threads is the same as the number of jobs. Each job runs in a separate thread, and each thread (job) processes its records sequentially. This is how we implemented our batch processing framework, and the configuration values are ours to change.

Thanks, Ulf
 
Ulf Dittmer
Rancher
Posts: 42968
In that case, substitute "jobs" by "records".
 
Mohammad Norouzi
Ranch Hand
Posts: 71
Ulf Dittmer wrote: In that case, substitute "jobs" by "records".


Well, this is the simplest solution and the first one that comes to mind, but I'm looking for a more efficient formula. In this case, with 100 records we get one job, and with 200 records we get 2 jobs of 100 records each, but I think having, for example, 5 jobs with 40 records each is much better.

 
Ulf Dittmer
Rancher
Posts: 42968
What I suggested was to distribute the records evenly over a fixed number of threads. How is what you propose different from that? Depending on the nature of the jobs, having a few threads/jobs (say, 5 or 10) work in parallel may improve concurrency. Having more than that likely won't improve matters, and at some point will start to degrade performance.
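A rough sketch of the fixed-thread-count approach, assuming the records are already loaded into a `List` and each job processes its slice sequentially in its own thread, as described earlier in the thread. The pool size of 5 and the `processRecord` method are placeholders, not framework API; `ExecutorService` and `Executors` are the standard `java.util.concurrent` classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FixedPoolBatch {

    // Hypothetical placeholder for the real per-record work.
    static void processRecord(Integer record) {
        // e.g. transform and persist the record
    }

    /**
     * Runs the records on a fixed pool of threadCount threads, one contiguous
     * slice per job, and returns how many records each job processed.
     */
    public static List<Integer> run(List<Integer> records, int threadCount)
            throws InterruptedException, ExecutionException {
        List<Integer> processedPerJob = new ArrayList<>();
        int n = records.size();
        if (n == 0) {
            return processedPerJob;
        }
        // Never start more threads than there are records.
        int jobs = Math.min(threadCount, n);
        // Ceiling division, so every record lands in some slice.
        int chunk = (n + jobs - 1) / jobs;

        ExecutorService pool = Executors.newFixedThreadPool(jobs);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int start = 0; start < n; start += chunk) {
                List<Integer> slice = records.subList(start, Math.min(start + chunk, n));
                // Each job processes its slice sequentially in its own thread.
                futures.add(pool.submit(() -> {
                    for (Integer r : slice) {
                        processRecord(r);
                    }
                    return slice.size();
                }));
            }
            // get() waits for the job and rethrows anything it threw.
            for (Future<Integer> f : futures) {
                processedPerJob.add(f.get());
            }
        } finally {
            pool.shutdown();
        }
        return processedPerJob;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> records = IntStream.range(0, 200).boxed().collect(Collectors.toList());
        System.out.println(run(records, 5)); // [40, 40, 40, 40, 40]
    }
}
```

Because the slice size is rounded up, the last job can get fewer records (30 records over 4 threads gives 8/8/8/6); if that matters, hand out the remainder one record per job instead.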
 
Alan Mehio
Ranch Hand
Posts: 73
Mohammad Norouzi wrote:
Ulf Dittmer wrote: In that case, substitute "jobs" by "records".


Well, this is the simplest solution and the first one that comes to mind, but I'm looking for a more efficient formula. In this case, with 100 records we get one job, and with 200 records we get 2 jobs of 100 records each, but I think having, for example, 5 jobs with 40 records each is much better.



What about the underlying operating system? How many processors do you have? You want the best concurrency possible, right?
 
Matt Elliott
Greenhorn
Posts: 14
If I'm understanding the question correctly, you want the system to autoconfigure itself for the best number of threads.
You should set the number of threads based on some multiple of Runtime.getRuntime().availableProcessors().
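A minimal sketch of deriving the thread count from the processor count. The multiplier and the clamp to the configured maximum are assumptions to tune per workload: CPU-bound jobs usually gain little from more than about one thread per core, while jobs that spend most of their time waiting on I/O (a database, for instance) can use more. `ThreadCount` and `threadsFor` are hypothetical names.

```java
public class ThreadCount {

    /**
     * Hypothetical helper: threads = cores * multiplier, clamped to [1, configuredMax].
     */
    public static int threadsFor(int cores, int multiplier, int configuredMax) {
        // Multiply as long so a large multiplier cannot overflow int.
        long wanted = (long) cores * multiplier;
        return (int) Math.max(1, Math.min(wanted, configuredMax));
    }

    public static void main(String[] args) {
        // availableProcessors() is an instance method, reached via Runtime.getRuntime().
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + ", threads=" + threadsFor(cores, 2, 100));
    }
}
```

Passing the core count in as a parameter keeps the sizing rule testable independently of the machine it runs on.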

Also, you don't have to statically define the number of jobs per thread.
Simply divide the number of jobs by the number of threads,
take that number, and give each thread that many jobs.
Then take the remainder and loop around the threads, adding one to each until it is used up.
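The divide-then-spread-the-remainder steps above can be sketched in Java as follows. The items are generic, so they can equally be the records from the original question; `RemainderSplit` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class RemainderSplit {

    /**
     * Splits items over threadCount lists: each list first gets
     * items.size() / threadCount items, then the remainder is handed out
     * one per list, going around the threads in order.
     */
    public static <T> List<List<T>> split(List<T> items, int threadCount) {
        if (threadCount <= 0) {
            throw new IllegalArgumentException("threadCount must be positive");
        }
        int base = items.size() / threadCount;
        int remainder = items.size() % threadCount;

        // Step 1: every thread gets the base share.
        int[] sizes = new int[threadCount];
        for (int i = 0; i < threadCount; i++) {
            sizes[i] = base;
        }
        // Step 2: add one to each thread in turn until the remainder is used up
        // (remainder < threadCount, so a single pass is enough).
        for (int i = 0; i < remainder; i++) {
            sizes[i]++;
        }
        // Step 3: cut consecutive slices of those sizes.
        List<List<T>> buckets = new ArrayList<>();
        int start = 0;
        for (int size : sizes) {
            buckets.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 11; i++) {
            records.add(i);
        }
        System.out.println(split(records, 4));
        // [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11]]
    }
}
```

If there are fewer items than threads, some lists come back empty, so callers may want to cap the thread count at the number of items first.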

You're doing it wrong if you statically define things.


 