
Create a new thread for each entity in a collection.

 
Jiafan Zhou
Ranch Hand
Posts: 193
I need some advice from thread experts on whether my algorithm for concurrent threads is feasible. I am a newbie in concurrent Java programming. The algorithm is outlined as follows:

1. A Java Collection is defined containing various entities.
2. The current thread iterates through all the entities in this Collection and creates a new thread for each entity.
(Basically, I want all the entities in this Collection to be processed concurrently.)
3. Wait for all the threads created while iterating through the Collection to finish.
4. Continue the current thread.

It seems feasible to me.

Thanks.
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
This is pretty simple to do; to wait for a group of threads you can loop over them, calling "join()" on each one. join() doesn't return until the thread has stopped, so your loop will complete as soon as all the threads have finished. What part of this do you need help with?
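As a minimal sketch of the join() loop described above: the class name `JoinAllExample`, the method `processAll`, and the `work` callback are illustrative names, not anything from the original posts, and the lambda syntax assumes Java 8 or later. The threads are kept in a second list so that the calling thread can join each one in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class JoinAllExample {

    // Starts one thread per entity, then blocks until every thread has finished.
    // Only sensible for a small number of entities (hypothetical helper).
    static <T> void processAll(Collection<T> entities, Consumer<T> work)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (T entity : entities) {
            Thread t = new Thread(() -> work.accept(entity));
            threads.add(t);   // remember the thread so it can be joined later
            t.start();
        }
        for (Thread t : threads) {
            t.join();         // returns once this particular thread has stopped
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger total = new AtomicInteger();
        processAll(Arrays.asList(1, 2, 3), total::addAndGet);
        System.out.println("sum = " + total.get()); // prints "sum = 6"
    }
}
```

The order in which the threads are joined does not matter: the second loop ends only when the slowest thread has finished.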
 
Jiafan Zhou
Ranch Hand
Posts: 193
Hi Ernest,

- Does that mean I need to create a second Collection (e.g. an ArrayList) to store all the threads that I have created? (i.e. whenever a new thread is created, it is added to the thread Collection.)
- If yes, do I then need to iterate through all the threads in this thread Collection and invoke join() on each of them?

Thanks,
Jiafan
 
Chris Hurst
Ranch Hand
Posts: 443
Make sure your threads are pooled; Java 5 can help here.

If you don't want to maintain a list of threads, you could have each worker call back to a central thread controller on completion; you would register the controller with the worker threads on creation. (I'm not saying that maintaining a thread list is bad, just that there are other options if you need them.) Obviously, in this case you would need a count or something similar to determine when they have all finished. I would then use the completion callback to return the thread to the pool or start a new worker allocated from your collection.

I'd first have a good look at the new Java 5/6 classes such as CyclicBarrier to see if they can help.
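One way the "count or something" idea could look is sketched below. Note that it uses CountDownLatch, a sibling of CyclicBarrier in java.util.concurrent, rather than CyclicBarrier itself, because a latch maps directly onto "count down once per finished worker". The names `LatchExample`, `processAll`, and `poolSize` are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class LatchExample {

    // Runs the work for every entity on a pooled thread and waits until all are done.
    static <T> void processAll(Collection<T> entities, int poolSize, Consumer<T> work)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // The latch plays the role of the central "count" of unfinished workers.
        CountDownLatch done = new CountDownLatch(entities.size());
        try {
            for (T entity : entities) {
                pool.execute(() -> {
                    try {
                        work.accept(entity);
                    } finally {
                        done.countDown(); // report completion even if the work throws
                    }
                });
            }
            done.await(); // blocks until the count reaches zero
        } finally {
            pool.shutdown(); // let the pooled threads exit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger total = new AtomicInteger();
        processAll(Arrays.asList(1, 2, 3, 4, 5), 2, total::addAndGet);
        System.out.println("sum = " + total.get());
    }
}
```

With this approach no list of threads is needed; the pool reuses its workers and the latch alone tells the caller when everything has finished.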
 
Jiafan Zhou
Ranch Hand
Posts: 193
Originally posted by Chris Hurst:
Make sure your threads are pooled; Java 5 can help here.

If you don't want to maintain a list of threads, you could have each worker call back to a central thread controller on completion; you would register the controller with the worker threads on creation. (I'm not saying that maintaining a thread list is bad, just that there are other options if you need them.) Obviously, in this case you would need a count or something similar to determine when they have all finished. I would then use the completion callback to return the thread to the pool or start a new worker allocated from your collection.

I'd first have a good look at the new Java 5/6 classes such as CyclicBarrier to see if they can help.

Currently, I am not sure whether a thread pool is the right solution for me, because I want every entity in the Collection to run concurrently regardless of how many entities the Collection holds, e.g. 1,000, 1,000,000 or 1,000,000,000. Once it hits a performance bottleneck, I can decide whether to set up a validation rule for the maximum number of entities allowed.

That is, unless somebody tells me there is a huge performance problem with creating a new thread for each entity in the Collection (including creating a second Collection to maintain the list of threads), or a huge improvement from using a thread pool.

Thanks.
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
Well, you can't create a billion threads, or even a million threads, or even ten thousand threads (on most platforms, anyway.) Windows limits you to about 4000, I think, and Linux somewhere in that ballpark. If there are going to be more items, then a pool is definitely needed.
 
Jiafan Zhou
Ranch Hand
Posts: 193
Windows only allows around 4000 concurrent threads???
- If that is true, then I think I have no choice but to use a thread pool.
(My initial assumption was that I would run at least 100,000 entities concurrently.)

I have implemented some dummy code as follows (not sure whether it meets my requirement):


The code looks quite straightforward, and I hope I haven't made any major mistakes. I have some questions about this code snippet.

- CachedThreadPool is defined to create one thread per task. A CachedThreadPool will generally create as many threads as it needs during the execution of a program and will then stop creating new threads as it recycles the old ones.
As we discussed above, what will happen if the maximum thread limit on Windows/Linux is reached (i.e. around 4000 on Windows)? I guess the cached thread pool will block and wait for existing threads in the pool to finish. Is that correct?

- How does a FixedThreadPool compare with a CachedThreadPool?

- Given that I need to wait for all the threads to finish, does the code above do that correctly?
- If it is correct: I chose to wait up to 10 seconds for all threads to complete. Is that too long?

(By the way, I studied threads in Java 1.4, and the concurrency API seems to have changed completely since 1.5. Is it only me who finds that strange?)

Thanks
 
Jiafan Zhou
Ranch Hand
Posts: 193
Another question; maybe I should start a separate topic for it, if the moderators think so. Anyway...

What is the maximum number of processes (n.b. not threads) that an operating system can normally handle concurrently? (I understand Ernest mentioned that the JVM can handle around 4,000 threads concurrently, but what about processes?)

The reason I ask is that I want to iterate through the Collection I mentioned before, create a new thread for each entity, and use the Runtime.getRuntime().exec() method to create a separate process for each.

While reading the Javadoc for Process, I found something that really worries me:
"The methods that create processes may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts. The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (getOutputStream(), getInputStream(), getErrorStream())."

[ July 31, 2008: Message edited by: Jiafan Zhou ]
 
Carey Evans
Ranch Hand
Posts: 225
As far as I can tell, you'll get some kind of Error, maybe OutOfMemoryError, when the OS thread creation fails. For this reason, a fixed thread pool with a reasonable limit would be a very good idea.

It doesn't matter much what timeout you specify, as awaitTermination() will return as soon as the thread pool has finished. You could make it Integer.MAX_VALUE if you wanted to.
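A rough sketch of the fixed-pool approach follows. Since the code from the earlier post is not shown, this is not a reconstruction of it; `FixedPoolExample`, `processAll`, and `maxThreads` are hypothetical names. Two details are worth noting: awaitTermination() only waits for completion after shutdown() has been called, and Executors.newCachedThreadPool() puts no upper bound on the number of threads, so it does not block when the OS limit is near, which is why a fixed pool is used here.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class FixedPoolExample {

    // Processes every entity using at most maxThreads threads.
    // Returns true if all tasks finished before the timeout elapsed.
    static <T> boolean processAll(Collection<T> entities, int maxThreads, Consumer<T> work,
                                  long timeout, TimeUnit unit) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (T entity : entities) {
            // Tasks beyond maxThreads wait in the pool's queue instead of getting a new thread.
            pool.execute(() -> work.accept(entity));
        }
        pool.shutdown();                             // accept no new tasks, finish queued ones
        return pool.awaitTermination(timeout, unit); // returns early once the pool is done
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        boolean finished = processAll(Collections.nCopies(100_000, 1), 8,
                e -> counter.incrementAndGet(), 10, TimeUnit.SECONDS);
        System.out.println("finished=" + finished + ", processed=" + counter.get());
    }
}
```

The timeout is only an upper bound, so a generous value costs nothing when the tasks finish quickly.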

The Java 1.4 APIs are all still there, but the new Java 5 APIs make it really easy to correctly implement a custom thread pool.

For your question about processes: my Linux installation here is configured for a maximum of 32767 processes. After that, Runtime.exec() and ProcessBuilder will get an IOException when you try to start another one. However, I think you'll have problems long before you start that many subprocesses at once.
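To illustrate the failure mode mentioned above, here is a small sketch of starting a subprocess with ProcessBuilder and handling the IOException thrown when the process cannot be started. `StartProcessExample`, `tryStart`, and the command name in `main` are invented for the example; the same exception type is also what you get when the command simply does not exist.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class StartProcessExample {

    // Tries to start the command; returns an empty Optional if the OS cannot start it
    // (for example, the command does not exist or a process limit has been reached).
    static Optional<Process> tryStart(List<String> command) {
        try {
            return Optional.of(new ProcessBuilder(command).start());
        } catch (IOException e) {
            System.err.println("Could not start " + command + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A deliberately non-existent command (hypothetical name) to show the failure path.
        Optional<Process> p = tryStart(Arrays.asList("no-such-command-for-this-sketch"));
        System.out.println("started: " + p.isPresent());
    }
}
```

A caller that launches many subprocesses could use the empty result as a signal to back off and retry later rather than failing outright.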
 
Pat Farrell
Rancher
Posts: 4686
I can't think of a reasonable problem that needs more than 100 threads. If you think you need hundreds of thousands, I think you need to explore the problem space again. Parallel programming is hard. Getting twice the throughput with two threads over one thread is nearly impossible. If you are lucky, you will get an 80% improvement for each thread you add, and you quickly hit diminishing returns.
 
Jiafan Zhou
Ranch Hand
Posts: 193
Originally posted by Pat Farrell:
I can't think of a reasonable problem that needs more than 100 threads. If you think you need hundreds of thousands, I think you need to explore the problem space again. Parallel programming is hard. Getting twice the throughput with two threads over one thread is nearly impossible. If you are lucky, you will get an 80% improvement for each thread you add, and you quickly hit diminishing returns.


I have described my problem domain at the following link:
domain problem
However, I still think I need more threads/processes to deal with the problem.
 
Carey Evans
Ranch Hand
Posts: 225
If you create far too many threads, you will spend most of your time switching between threads, and very little time getting any work done. This is even more true if you have thousands of processes on each end of thousands of network connections, especially given that opening and closing a TCP connection is relatively slow, as is setting up the encryption each time.
[ August 02, 2008: Message edited by: Carey Evans ]
 