
Web Crawler - why does it need a thread?

 
marlajee Borstone
Ranch Hand
Posts: 35
Hello friends, how do you do?

These days I am working on a web crawler. However, I am quite weak at threading.

Here is the code:



I understood all aspects of this code except the use of Thread.
Why is a Thread used here? I can see that the main task a web crawler needs to do is written inside the run() method.
Is this because the run() method will handle multiple threads concurrently?

If so, can somebody show me how to write a separate Java file that runs multiple threads on the run() method of Webcrawler2, so that the class actually makes use of its thread?

I would highly appreciate any help. Come on, friends, help me out.

- Dhansumaal
[ September 24, 2008: Message edited by: marlajee Borstone ]
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24212
Please don't do this. Seriously.
 
marlajee Borstone
Ranch Hand
Posts: 35
Sorry Ernest, I did not understand what you meant.

-Dhansumaal
[ September 23, 2008: Message edited by: marlajee Borstone ]
 
Henry Wong
author
Marshal
Posts: 21504
Why is a Thread used here? I can see that the main task a web crawler needs to do is written inside the run() method.
Is this because the run() method will handle multiple threads concurrently?


Actually, no. This run() method will be called by a new thread, which enables it to run concurrently with the thread that started it. It is the Thread class that deals with the threads, not the run() method. The run() method is just the code that the thread will run.

As for how this is done, you need to examine the code that instantiates and starts the class whose code you have shown.
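
To illustrate the distinction, here is a minimal sketch. It is not the crawler code from this thread (which is not shown); CrawlTask, runOnNewThread, the thread name "crawl-worker" and the localhost URL are made-up names for illustration only. It shows that calling run() directly executes it on the calling thread, while Thread.start() makes a new thread call run().

```java
import java.util.concurrent.atomic.AtomicReference;

public class RunMethodDemo {

    // The task only describes the work. It does not create or manage threads.
    static class CrawlTask implements Runnable {
        private final String startUrl; // hypothetical field, for illustration
        final AtomicReference<String> ranOn = new AtomicReference<>();

        CrawlTask(String startUrl) {
            this.startUrl = startUrl;
        }

        @Override
        public void run() {
            // Record the name of whichever thread is executing this code.
            ranOn.set(Thread.currentThread().getName());
            System.out.println("crawling " + startUrl + " on " + ranOn.get());
        }
    }

    // Runs the task on a new thread, waits for it to finish, and returns
    // the name of the thread that executed run().
    static String runOnNewThread(CrawlTask task, String threadName) {
        Thread worker = new Thread(task, threadName);
        worker.start(); // the Thread object starts a new thread, which calls task.run()
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return task.ranOn.get();
    }

    public static void main(String[] args) {
        CrawlTask task = new CrawlTask("http://localhost:8080/");
        task.run(); // plain method call: runs on the main thread, no concurrency
        System.out.println("direct call ran on: " + task.ranOn.get());
        System.out.println("start() ran on: " + runOnNewThread(task, "crawl-worker"));
    }
}
```

The same run() body is used both times; only the way it is invoked decides which thread executes it.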

Henry
 
marlajee Borstone
Ranch Hand
Posts: 35
Hi Henry, thank you very much for your quick and helpful response. But it seems I am still some way from the exact answer I am looking for. Sorry to bother you again, but I really need help with this.

This run() method will be called by a new thread, which enables it to run concurrently with the thread that started it. It is the Thread class that deals with the threads, not the run() method. The run() method is just the code that the thread will run.


Yes, I can understand this from my code above as well. There is a 'searchThread' there, which is responsible for executing the run() method.

But my code above is just a utility, as it does not have a main method. So when I write a class like:

I can see that it starts Wcrawler2 and crawls all the URLs starting from my Tomcat server's index page.
But it is just a single run, without making use of threads. However, I want to utilize the thread that is used in Wcrawler2, so that when two concurrent search requests come in with two different URLs, it can handle them concurrently.
For that, please suggest how I can modify my Test class above.
Looking forward to your suggestions.

~Dhansumal
 
Henry Wong
author
Marshal
Posts: 21504
I can see that it starts Wcrawler2 and crawls all the URLs starting from my Tomcat server's index page.
But it is just a single run, without making use of threads. However, I want to utilize the thread that is used in Wcrawler2, so that when two concurrent search requests come in with two different URLs, it can handle them concurrently.
For that, please suggest how I can modify my Test class above.


First of all, I really think you should start by learning threads. Threading is not something you can just add to your application without a clear understanding of how it works. You can start with the Sun threads tutorial:

http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html

Now to answer your question: if you look at your test class (for example, by putting a print statement after the startSearch() method call), you will see that the search is actually done concurrently, because the main() method returns from the startSearch() call and exits while the search is still running.

This means that the main thread (running the main() method) can create another instance and start another search. It is just that your test code stops after the first one.
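
A rough sketch of that idea follows. The original Wcrawler2 source is not shown in this thread, so DemoCrawler is a hypothetical stand-in: it assumes that startSearch() creates and starts a searchThread, and it uses a CountDownLatch to simulate a long-running search instead of fetching any pages. The isSearching() and awaitFinish() helpers and the URLs are also invented for the example.

```java
import java.util.concurrent.CountDownLatch;

public class TwoSearchesDemo {

    // Hypothetical stand-in for the crawler class discussed in this thread.
    static class DemoCrawler implements Runnable {
        private final String startUrl;
        private final CountDownLatch release; // demo only: keeps the "search" busy
        private Thread searchThread;

        DemoCrawler(String startUrl, CountDownLatch release) {
            this.startUrl = startUrl;
            this.release = release;
        }

        // Returns immediately; the search itself runs on searchThread.
        void startSearch() {
            searchThread = new Thread(this, "search-" + startUrl);
            searchThread.start();
        }

        boolean isSearching() {
            return searchThread != null && searchThread.isAlive();
        }

        // Blocks the caller until this crawler's search thread has finished.
        void awaitFinish() {
            if (searchThread == null) {
                return;
            }
            try {
                searchThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        @Override
        public void run() {
            try {
                release.await(); // simulated work; a real crawler would fetch pages here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("finished " + startUrl);
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        DemoCrawler first = new DemoCrawler("http://localhost:8080/a", release);
        DemoCrawler second = new DemoCrawler("http://localhost:8080/b", release);

        first.startSearch();
        System.out.println("startSearch() returned; the main thread is free again");
        second.startSearch(); // a second instance, searching at the same time

        System.out.println("both searching: " + (first.isSearching() && second.isSearching()));

        release.countDown(); // let both simulated searches finish
        first.awaitFinish();
        second.awaitFinish();
    }
}
```

Each instance owns its own thread, so handling two URLs concurrently is just a matter of creating two instances and calling startSearch() on each.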

Henry
 
Henry Wong
author
Marshal
Posts: 21504
Originally posted by marlajee Borstone:
Sorry Ernest, I did not understand what you meant.

-Dhansumaal


I think what EFH is trying to say, and I totally agree, is that web crawling causes huge problems for websites and should be left to the professionals.

Even Google can overwhelm a site when it is crawling, which has much the same effect as a DoS attack.

If you are not careful, you can find yourself being blocked from the sites that you are trying to scrape data from.
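
One common way to be careful is to enforce a minimum delay between requests to the same host. The sketch below is only an illustration under assumptions: PoliteDelay, reserve() and the one-second delay are invented for this example and do not come from any library or from the code discussed in this thread. The current time is passed in as a parameter so the logic can be checked without actually waiting.

```java
import java.util.HashMap;
import java.util.Map;

public class PoliteDelay {
    private final long minDelayMillis;
    // For each host, the earliest time (in ms) at which the next request may be sent.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PoliteDelay(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Reserves the next request slot for the host and returns how many
    // milliseconds the caller should wait before sending that request.
    // Synchronized so that several crawler threads can share one instance.
    synchronized long reserve(String host, long nowMillis) {
        long earliest = Math.max(nowMillis, nextAllowed.getOrDefault(host, nowMillis));
        nextAllowed.put(host, earliest + minDelayMillis);
        return earliest - nowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        PoliteDelay delay = new PoliteDelay(1000); // assumed: at most one request per second per host
        for (int i = 0; i < 3; i++) {
            long wait = delay.reserve("localhost", System.currentTimeMillis());
            Thread.sleep(wait);
            System.out.println("request " + i + " to localhost (a real crawler would fetch here)");
        }
    }
}
```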

Henry
 
marlajee Borstone
Ranch Hand
Posts: 35
Thank you, Henry, for this clarification as well as for the previous reply.
It is really very helpful.

~Dhansumaal
 