Is my implementation of threads correct for this application?

 
Benjamin Scabbia
Ranch Hand
Posts: 34
Eclipse IDE Python Ubuntu
Good evening! I have this program which I'm currently working on that desperately needs some thread support.

The program fetches the specified URL and scans the page for links and email addresses. It also follows every link found on the homepage (in theory, reaching all other pages of the website) and scans those pages for links in turn.
(Note, not full code below + using an HTML parser called JSoup)


The program works perfectly but it's fairly slow. Can someone explain to me how to implement threads for this particular program? I assume that (depending on the number of processors) I can have x threads running, and can therefore work through x sites at once?

This is what I have come up with so far - although I have never used threads so no idea if my implementation is correct.



Then in the main application I could do something like:


Is this implementation correct? I understand that the code is not complete, but I'm just wondering whether what I'm doing is right. Theoretically, I could then use a large array of websites and let the processors iterate through them, and as one finishes... what happens then? How would I make the thread that has finished parsing a website start on the next element in the array?

Thanks guys
 
Paul Clapham
Sheriff
Posts: 22489
43
Eclipse IDE Firefox Browser MySQL Database
You'd only need to concern yourself with the number of processors if your tasks were actually using processors. But accessing a web site is far from being compute-bound. To the contrary, it spends almost all of its time waiting for network traffic to complete. So the number of processors isn't relevant to the number of threads you should use.
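To make that point concrete, here is a minimal sketch in which Thread.sleep stands in for waiting on the network. The class name IoBoundDemo, the fakeFetch helper, and the example.com URLs are made up for illustration and are not from the original program. Because the tasks only wait, a pool with more threads than cores still finishes sooner than a single thread would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoBoundDemo {
    // Hypothetical stand-in for a page fetch: the thread only waits,
    // as it would while the network request is in flight.
    static String fakeFetch(String url) throws InterruptedException {
        Thread.sleep(200);
        return "contents of " + url;
    }

    // Runs `tasks` simulated fetches on a pool of `threads` threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeFetches(int tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String url = "http://example.com/page" + i;
                Callable<String> task = () -> fakeFetch(url);
                results.add(pool.submit(task));
            }
            for (Future<String> result : results) {
                result.get(); // wait for each simulated fetch to complete
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cores:      " + Runtime.getRuntime().availableProcessors());
        System.out.println("1 thread:   " + timeFetches(16, 1) + " ms");
        System.out.println("16 threads: " + timeFetches(16, 16) + " ms");
    }
}
```

Real fetches will not scale this cleanly, since the remote server and the local network stack impose their own limits, so the pool size is still something to measure rather than assume.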

Anyway, yes, the ExecutorService is the way to go. Your Processor class would find more links, so it would have to create new Processor objects for them and give them to your ExecutorService. You should experiment to see if the number of threads in the pool makes a difference; it may be that your operating system's TCP/IP stack throttles the number of threads it handles at once, for example.
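As a rough sketch of the "each task submits more tasks" idea, assuming Java 9 or later: the FAKE_SITE map below is a hypothetical stand-in for fetching a page and extracting its links with JSoup, and CrawlSketch, crawl, and submit are illustrative names rather than the original poster's Processor class. A concurrent set keeps a page from being processed twice, and a Phaser is one possible way to learn when no work is left.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

public class CrawlSketch {
    // Hypothetical link graph standing in for "download the page and parse its links".
    static final Map<String, List<String>> FAKE_SITE = Map.of(
            "/", List.of("/about", "/contact"),
            "/about", List.of("/", "/team"),
            "/contact", List.of("/"),
            "/team", List.of("/about"));

    // Crawls from `start` using a fixed pool and returns every page reached.
    static Set<String> crawl(String start, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Set<String> visited = ConcurrentHashMap.newKeySet();
        Phaser pending = new Phaser(1); // one party for the calling thread
        submit(start, pool, visited, pending);
        pending.arriveAndAwaitAdvance(); // returns once every submitted task has finished
        pool.shutdown();
        return visited;
    }

    // Queues one page for processing unless it has already been seen.
    static void submit(String url, ExecutorService pool, Set<String> visited, Phaser pending) {
        if (!visited.add(url)) {
            return; // add() is atomic, so only the first caller gets past this check
        }
        pending.register(); // one more outstanding task
        pool.execute(() -> {
            try {
                // A task that finds links hands each one back to the same pool.
                for (String link : FAKE_SITE.getOrDefault(url, List.of())) {
                    submit(link, pool, visited, pending);
                }
            } finally {
                pending.arriveAndDeregister(); // this task is done
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(crawl("/", 4));
    }
}
```

Child tasks are registered before their parent signals completion, so the waiting thread cannot wake up while links are still queued.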

 
Benjamin Scabbia
Ranch Hand
Posts: 34
Eclipse IDE Python Ubuntu
Great, I'm glad that I'm at least on the right path. And yes, it makes sense that the app is not compute-bound, although tests suggest a thread pool is a necessity!

So if I had an array of websites, how do I implement this into the program? I get I need to create a new Processor for each site, but the only obvious way to do it is like this:


Would the above work, or would I end up creating 50 different processors? Or, because I set the executor's thread pool size to 4, would that mean that once one has finished, the pool would start on the next Processor, so that even though I have 50 processors, only 4 are active at once?

Thanks for your reply, I really appreciate it!
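For what it's worth, the behaviour asked about here can be checked with a small experiment. The sketch below is not the code from this thread: PoolLimitDemo and maxConcurrent are hypothetical names, and Thread.sleep stands in for parsing a site. It queues many tasks on a fixed pool and records the highest number that were running at the same moment. With Executors.newFixedThreadPool(4), all 50 task objects are created up front, but the extra ones wait in the pool's queue, and each worker thread picks up the next queued task as soon as it finishes its current one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolLimitDemo {
    // Submits `tasks` short jobs to a pool of `threads` threads and returns
    // the largest number of jobs observed running at the same time.
    static int maxConcurrent(int tasks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest count seen
                try {
                    Thread.sleep(20); // stand-in for parsing one site
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
        }
        pool.shutdown();                            // accept no new tasks, finish queued ones
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the queue to drain
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency: " + maxConcurrent(50, 4));
    }
}
```

The reported peak never exceeds the pool size, whatever the number of queued tasks.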