How to execute join() as child threads end?

 
Fernando Sanchez
Ranch Hand
Posts: 43
I guess this is a basic question.

Just for learning purposes, I'm trying to start 30 threads that count the number of characters in 30 text files, and then check how long it takes for all of them to finish (instead of counting the files sequentially from a single thread).

The program seems to work, but I'm not convinced that calling join() in the same order in which the threads were started is a good idea. It seems better to call join() on each thread when it has really finished, but I have no clue how to do that.

This is the program:


And when I try it I get results like this:
con hilos
Fich09.txt --> cuenta: 53410800
Fich23.txt --> cuenta: 53335200
Fich10.txt --> cuenta: 53381400
Fich25.txt --> cuenta: 53335200
Fich22.txt --> cuenta: 53335200
Fich16.txt --> cuenta: 56655000
Fich04.txt --> cuenta: 70978400
Fich21.txt --> cuenta: 53335200
Fich30.txt --> cuenta: 56676000
Fich15.txt --> cuenta: 56655000
Fich27.txt --> cuenta: 56676000
Fich14.txt --> cuenta: 53360400
Fich03.txt --> cuenta: 53497950
Fich19.txt --> cuenta: 56676000
Fich28.txt --> cuenta: 106670400
Fich02.txt --> cuenta: 57833400
Fich06.txt --> cuenta: 70978400
Fich18.txt --> cuenta: 56650800
Fich11.txt --> cuenta: 53340805
Fich05.txt --> cuenta: 83310000
Fich08.txt --> cuenta: 53335200
Fich26.txt --> cuenta: 53335200
Fich17.txt --> cuenta: 56659200
Fich12.txt --> cuenta: 106792200
Fich24.txt --> cuenta: 53335200
Fich20.txt --> cuenta: 109986000
Fich07.txt --> cuenta: 106681610
Fich01.txt --> cuenta: 110044815
Fich13.txt --> cuenta: 110061600
Fich29.txt --> cuenta: 109996850
Tiempo: 1664 ms



Sometimes the time is noticeably shorter than without threads, but other times it is longer. I think the problem is that the time spent waiting in the join() calls can be random.
Is there a way for the threads to let the main thread know that all of them have finished, so that it can check when they ended? Or is there a way to make the join() calls in the order in which the threads really end?
 
Paul Clapham
Sheriff
Posts: 28365
No, none of that is of any importance. The code you have will terminate after the longest-running thread finishes and no sooner. Think about it -- if Thread 0 is actually the last to finish, then your waiting loop will wait for it to finish and then quickly notice (in essentially zero time) that the others have already finished. This is true regardless of the finishing order of the threads. And the code can't terminate before the longest-running thread does.
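
To make that argument concrete, here is a minimal sketch. The original program is not shown in the thread, so this is not the poster's code: the class name JoinOrderDemo is made up, and Thread.sleep() stands in for the file-counting work. It starts the threads, joins them in start order, and reports the total elapsed time for two different finishing orders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; sleep() is an assumed stand-in for counting a file.
class JoinOrderDemo {

    // Starts one thread per entry in sleepMillis, joins them in start order,
    // and returns the total elapsed wall-clock time in milliseconds.
    static long runAndJoin(long[] sleepMillis) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (long ms : sleepMillis) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(ms);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // returns immediately if t has already finished
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Same work, once with the slowest thread joined first and once last.
        System.out.println("slowest first: " + runAndJoin(new long[] {300, 100, 50}) + " ms");
        System.out.println("slowest last:  " + runAndJoin(new long[] {50, 100, 300}) + " ms");
    }
}
```

Both calls should report roughly the duration of the slowest thread, since a join() on a thread that has already finished costs next to nothing.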
 
Fernando Sanchez
Ranch Hand
Posts: 43

Paul Clapham wrote:No, none of that is of any importance. The code you have will terminate after the longest-running thread finishes and no sooner. Think about it -- if Thread 0 is actually the last to finish, then your waiting loop will wait for it to finish and then quickly notice (in essentially zero time) that the others have already finished. This is true regardless of the finishing order of the threads. And the code can't terminate before the longest-running thread does.



Thank you for the explanation. So one conclusion I should draw is that the multithreaded alternative does not mean better performance in this case. Am I wrong?

When I try the single-threaded program, it always takes between about 2.5 and 2.9 seconds to finish.
The multithreaded alternative, on the other hand, is very unpredictable, ranging from 0.6 to 57.3 seconds. I don't understand why the results can be so different.
 
Paul Clapham
Sheriff
Posts: 28365

Fernando Sanchez wrote: Thank you for the explanation. So one conclusion I should draw is that the multithreaded alternative does not mean better performance in this case. Am I wrong?

When I try the single-threaded program, it always takes between about 2.5 and 2.9 seconds to finish.
The multithreaded alternative, on the other hand, is very unpredictable, ranging from 0.6 to 57.3 seconds. I don't understand why the results can be so different.



You shouldn't assume that running a process in multiple threads will automatically make it run faster. The threads may, for example, interfere with each other in some way, or cause some problem like running out of resources. I notice that your multithreaded version does sometimes run faster than the single-threaded version, but I have no idea why it sometimes takes much longer. These things can take a long time to investigate.

So far you only output the number of characters processed by each thread. You might find the time each thread takes to run an interesting piece of information.
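
One possible way to collect that per-thread time is sketched below. The original program is not shown, so everything here is an assumption: the class name TimedCounter, its accessor methods, and the choice of UTF-8 are all made up for illustration. Each worker notes System.nanoTime() at the start and end of its own run() method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical worker: counts the characters in one file and times itself.
class TimedCounter implements Runnable {
    private final Path file;
    private volatile long count = -1;         // stays -1 if reading fails
    private volatile long elapsedMillis = -1; // stays -1 until run() ends

    TimedCounter(Path file) {
        this.file = file;
    }

    @Override
    public void run() {
        long start = System.nanoTime();
        long n = 0;
        // UTF-8 is an assumption; use whatever charset the files really have.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int read;
            while ((read = in.read(buf)) != -1) {
                n += read;
            }
            count = n;
        } catch (IOException e) {
            System.err.println(file + " failed: " + e.getMessage());
        }
        elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    }

    long count() {
        return count;
    }

    long elapsedMillis() {
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        // A small temporary file stands in for one of the real input files.
        Path tmp = Files.createTempFile("Fich", ".txt");
        Files.write(tmp, "hola mundo".getBytes(StandardCharsets.UTF_8));
        TimedCounter counter = new TimedCounter(tmp);
        Thread t = new Thread(counter);
        t.start();
        t.join();
        System.out.println(tmp.getFileName() + " --> count: " + counter.count()
                + " time (ms): " + counter.elapsedMillis());
        Files.delete(tmp);
    }
}
```

Measuring inside run() gives the time each thread spent on its own file, which can then be compared with the total wall-clock time measured in the main thread.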
 
Fernando Sanchez
Ranch Hand
Posts: 43

Paul Clapham wrote:
You shouldn't assume that running a process in multiple threads will automatically make it run faster. The threads may, for example, interfere with each other in some way, or cause some problem like running out of resources. I notice that your multithreaded version does sometimes run faster than the single-threaded version, but I have no idea why it sometimes takes much longer. These things can take a long time to investigate.

So far you only output the number of characters processed by each thread. You might find the time each thread takes to run an interesting piece of information.



Thank you again.

Here are two very different executions after adding time traces.

One really fast:

con hilos
Fich22.txt --> cuenta: 53335200 Tiempo (ms): 303
Fich08.txt --> cuenta: 53335200 Tiempo (ms): 384
Fich21.txt --> cuenta: 53335200 Tiempo (ms): 362
Fich11.txt --> cuenta: 53340805 Tiempo (ms): 399
Fich30.txt --> cuenta: 56676000 Tiempo (ms): 382
Fich03.txt --> cuenta: 53497950 Tiempo (ms): 429
Fich23.txt --> cuenta: 53335200 Tiempo (ms): 367
Fich10.txt --> cuenta: 53381400 Tiempo (ms): 448
Fich06.txt --> cuenta: 70978400 Tiempo (ms): 454
Fich27.txt --> cuenta: 56676000 Tiempo (ms): 426
Fich19.txt --> cuenta: 56676000 Tiempo (ms): 419
Fich14.txt --> cuenta: 53360400 Tiempo (ms): 461
Fich24.txt --> cuenta: 53335200 Tiempo (ms): 434
Fich17.txt --> cuenta: 56659200 Tiempo (ms): 439
Fich25.txt --> cuenta: 53335200 Tiempo (ms): 416
Fich18.txt --> cuenta: 56650800 Tiempo (ms): 453
Fich04.txt --> cuenta: 70978400 Tiempo (ms): 493
Fich15.txt --> cuenta: 56655000 Tiempo (ms): 475
Fich26.txt --> cuenta: 53335200 Tiempo (ms): 450
Fich09.txt --> cuenta: 53410800 Tiempo (ms): 499
Fich16.txt --> cuenta: 56655000 Tiempo (ms): 477
Fich02.txt --> cuenta: 57833400 Tiempo (ms): 516
Fich05.txt --> cuenta: 83310000 Tiempo (ms): 532
Fich29.txt --> cuenta: 109996850 Tiempo (ms): 544
Fich12.txt --> cuenta: 106792200 Tiempo (ms): 580
Fich28.txt --> cuenta: 106670400 Tiempo (ms): 549
Fich20.txt --> cuenta: 109986000 Tiempo (ms): 538
Fich13.txt --> cuenta: 110061600 Tiempo (ms): 584
Fich07.txt --> cuenta: 106681610 Tiempo (ms): 596
Fich01.txt --> cuenta: 110044815 Tiempo (ms): 596
Tiempo: 596 ms



This one is really slow:

con hilos
Fich25.txt --> cuenta: 53335200 Tiempo (ms): 316
Fich11.txt --> cuenta: 53340805 Tiempo (ms): 928
Fich08.txt --> cuenta: 53335200 Tiempo (ms): 966
Fich19.txt --> cuenta: 56676000 Tiempo (ms): 1221
Fich02.txt --> cuenta: 57833400 Tiempo (ms): 3130
Fich30.txt --> cuenta: 56676000 Tiempo (ms): 3786
Fich24.txt --> cuenta: 53335200 Tiempo (ms): 9536
Fich26.txt --> cuenta: 53335200 Tiempo (ms): 9684
Fich22.txt --> cuenta: 53335200 Tiempo (ms): 10461
Fich14.txt --> cuenta: 53360400 Tiempo (ms): 11606
Fich21.txt --> cuenta: 53335200 Tiempo (ms): 13373
Fich18.txt --> cuenta: 56650800 Tiempo (ms): 21601
Fich10.txt --> cuenta: 53381400 Tiempo (ms): 24965
Fich17.txt --> cuenta: 56659200 Tiempo (ms): 25545
Fich15.txt --> cuenta: 56655000 Tiempo (ms): 26892
Fich23.txt --> cuenta: 53335200 Tiempo (ms): 27036
Fich16.txt --> cuenta: 56655000 Tiempo (ms): 27591
Fich04.txt --> cuenta: 70978400 Tiempo (ms): 30947
Fich09.txt --> cuenta: 53410800 Tiempo (ms): 30951
Fich03.txt --> cuenta: 53497950 Tiempo (ms): 34843
Fich06.txt --> cuenta: 70978400 Tiempo (ms): 38185
Fich27.txt --> cuenta: 56676000 Tiempo (ms): 39614
Fich05.txt --> cuenta: 83310000 Tiempo (ms): 48469
Fich01.txt --> cuenta: 110044815 Tiempo (ms): 48973
Fich07.txt --> cuenta: 106681610 Tiempo (ms): 49377
Fich13.txt --> cuenta: 110061600 Tiempo (ms): 52618
Fich20.txt --> cuenta: 109986000 Tiempo (ms): 53745
Fich28.txt --> cuenta: 106670400 Tiempo (ms): 54569
Fich12.txt --> cuenta: 106792200 Tiempo (ms): 55007
Fich29.txt --> cuenta: 109996850 Tiempo (ms): 55014
Tiempo: 55078 ms


 
Paul Clapham
Sheriff
Posts: 28365
That's interesting. It looks like the threads are running roughly one at a time, as if they were taking turns using some shared resource. I have no idea what that might be, but reading a file from the file system involves a lot of machinery, including the operating system's file handling and the internal Java classes that work with it.

Anyway your Thread code seems to be sound. However, people haven't been using Thread objects to write new multithreaded programs for a long time now. The new and improved tool is called ExecutorService and it's a lot more flexible. Thread is fine to learn how multithreading works, though.
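
As a rough illustration of the ExecutorService approach, the sketch below pairs it with ExecutorCompletionService, which hands results back in the order the tasks actually finish (the thing the original question asked about). The class name CompletionOrderDemo and the sleeping tasks are assumptions made for the example; in the real program each task would count one file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; sleeping tasks stand in for the file counters.
class CompletionOrderDemo {

    // Submits one sleeping task per entry (at least one entry is required)
    // and returns the task labels in the order the tasks completed.
    static List<String> runInCompletionOrder(long[] sleepMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(sleepMillis.length);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 0; i < sleepMillis.length; i++) {
                final int id = i;
                final long ms = sleepMillis[i];
                done.submit(() -> {
                    Thread.sleep(ms);
                    return "task-" + id;
                });
            }
            List<String> finished = new ArrayList<>();
            for (int i = 0; i < sleepMillis.length; i++) {
                // take() blocks until the next task, whichever it is, has finished
                finished.add(done.take().get());
            }
            return finished;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInCompletionOrder(new long[] {600, 50, 300}));
    }
}
```

As explained earlier in the thread, handling results in completion order does not shorten the total time. It only lets the main thread report each result as soon as it is available.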
 
Bartender
Posts: 15737
I too suspect it has to do with the way the operating system handles files.

Reading a file with more than 50 million characters in 300 ms looks very suspicious to me. I believe the file contents may still be in cache, which is why you get very fast execution times. In that case the threads also don't have to wait on each other, because the file cache can be accessed concurrently.

On the other hand, when the files aren't in cache, the threads might have to wait on each other because depending on your OS and hardware configuration, read operations might not be handled concurrently.

So, while multithreading CAN improve file reading performance, you must never rely on it.

I bet you can replicate your fast times consistently by replacing the file reading operation with an in-memory operation, such as sorting a large array.
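
A minimal sketch of such an experiment follows. It is an illustration under assumptions, not a rigorous benchmark: the class name InMemoryBenchmark, the array size, and the number of repetitions are all made up. Each thread sorts its own random array, so no file system access is involved.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical benchmark class: in-memory work instead of file reading.
class InMemoryBenchmark {

    // Sorts every row of data in place, one thread per row,
    // and returns the elapsed wall-clock time in milliseconds.
    static long sortInParallel(int[][] data) throws InterruptedException {
        Thread[] workers = new Thread[data.length];
        long start = System.nanoTime();
        for (int i = 0; i < data.length; i++) {
            final int[] row = data[i];
            workers[i] = new Thread(() -> Arrays.sort(row));
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Builds the given number of rows, each filled with random ints.
    static int[][] randomData(int rows, int size) {
        int[][] data = new int[rows][];
        for (int i = 0; i < rows; i++) {
            data[i] = new Random(i).ints(size).toArray();
        }
        return data;
    }

    public static void main(String[] args) throws InterruptedException {
        // 30 threads, as in the original question; repeat to see the spread.
        for (int run = 1; run <= 5; run++) {
            long ms = sortInParallel(randomData(30, 500_000));
            System.out.println("run " + run + ": " + ms + " ms");
        }
    }
}
```

If the run-to-run times of this version stay close together while the file-reading version still swings widely, that would point toward file I/O rather than the threading code as the source of the variation.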
 