
Spiders

 
Divyajot Ahluwalia
Ranch Hand
Posts: 46
Dear all,
I want to develop an application that traverses from one web page to another and collects information based on some business rules, something like the spiders used by search engines.
What are the most suitable technologies?
Looking forward to a response.
------------------
 
David Harrigan
Ranch Hand
Posts: 52
Java.
It's got very good, ahem, excellent, support for networking...quite easy too...or so I've heard.
David.
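
To make the networking point concrete, here is a minimal sketch of downloading one page with nothing but the standard java.net classes. The class name PageFetcher, the 5-second timeouts, and the "ExampleSpider/0.1" user agent are made up for illustration, and the code simply assumes the page is UTF-8, which a real spider would have to check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class PageFetcher {

    // Reads the whole stream into a String.
    // Assumption: the content is UTF-8; real pages declare their own charset.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Opens a connection to the given absolute URL and returns the body as text.
    static String fetch(String address) {
        try {
            URLConnection conn = URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(5000); // illustrative timeout values
            conn.setReadTimeout(5000);
            conn.setRequestProperty("User-Agent", "ExampleSpider/0.1"); // made-up name
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PageFetcher.fetch with an absolute http or https address should return the page source as a String, or throw an UncheckedIOException if the host cannot be reached.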
 
Divyajot Ahluwalia
Ranch Hand
Posts: 46
Thanks David,
but how do I go about doing it? How do I create threads that traverse the web and can each decide for themselves which links to follow?
Divyajot

 
mohit joshi
Ranch Hand
Posts: 243
Look at the book "Java Server Programming" from Wrox Press. It has a chapter dedicated to spiders. That spider uses a single-threaded model, but it is not difficult to convert it to a multithreaded one.
Search for 'web robots pages' on the net for more information.
Also look at "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Larry Page. By the way, according to that paper, Google's crawlers were written in Python.
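
As a rough idea of what the multithreaded version could look like, here is a sketch that is not taken from the book: a fixed pool of worker threads shares a set of visited URLs, and each worker asks a "business rule" predicate whether a link is worth following. The class name MiniSpider, the fetcher and shouldFollow parameters, and the regex-based link extraction are all assumptions made for illustration. A real spider would use a proper HTML parser and honor robots.txt, which this sketch does not do.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a multithreaded spider; all names are illustrative.
class MiniSpider {
    // Crude href matcher; ignores fragment-only links. A real HTML parser is more robust.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    private final Function<String, String> fetcher;   // URL -> page source (null if missing)
    private final Predicate<String> shouldFollow;      // the business rule for links
    private final Set<String> visited = ConcurrentHashMap.newKeySet();

    MiniSpider(Function<String, String> fetcher, Predicate<String> shouldFollow) {
        this.fetcher = fetcher;
        this.shouldFollow = shouldFollow;
    }

    // Returns the links found in html, resolved against the page's own URL.
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(URI.create(baseUrl).resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it.
            }
        }
        return links;
    }

    // Crawls from startUrl with the given number of threads; returns every URL attempted.
    Set<String> crawl(String startUrl, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Phaser pending = new Phaser(1); // party 1 is the calling thread
        submit(pool, pending, startUrl);
        pending.arriveAndAwaitAdvance(); // blocks until every submitted page is done
        pool.shutdown();
        return visited;
    }

    private void submit(ExecutorService pool, Phaser pending, String url) {
        if (!visited.add(url)) return; // another thread already claimed this URL
        pending.register();            // one outstanding page
        pool.execute(() -> {
            try {
                String html = fetcher.apply(url);
                if (html == null) return;
                for (String link : extractLinks(url, html)) {
                    if (shouldFollow.test(link)) submit(pool, pending, link);
                }
            } catch (RuntimeException e) {
                // A failed page should not stop the rest of the crawl.
            } finally {
                pending.arriveAndDeregister(); // this page is finished
            }
        });
    }
}
```

The fetcher is deliberately just a function from URL to page source, so the spider can be tried against an in-memory map of pages before pointing it at the network. The predicate is where the decision about which link to follow lives, for example "stay on the same host" or "only follow URLs containing a keyword".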
 