Spiders

 
Ranch Hand
Posts: 46
Dear all,
I want to develop an application that traverses from one web page to another and collects information based on some business rules, something like the spiders used by search engines.
What are the most suitable technologies?
Looking forward to a response.
------------------
 
Ranch Hand
Posts: 52
Java.
It's got very good, ahem, excellent, support for networking...quite easy too...or so I've heard.
David.
 
Divyajot Ahluwalia
Ranch Hand
Posts: 46
Thanks David,
but how do I go about doing it? How do I create threads that traverse the web and decide for themselves which link to follow?
Divyajot

Originally posted by David Harrigan:
Java.
It's got very good, ahem, excellent, support for networking...quite easy too...or so I've heard.
David.


 
Ranch Hand
Posts: 243
Look at the book "Java Server Programming" from Wrox Press; it has a chapter dedicated to spiders. It uses a single-threaded model, but converting it to a multithreaded one is not difficult.
Search the net for "web robots pages" for more information.
Also look at "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Larry Page. By the way, according to that paper, Google's crawlers were written in Python.
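
To make the single-threaded versus multithreaded point a bit more concrete, here is a rough sketch of a multithreaded spider. It is not the code from the Wrox book, just one possible shape under some assumptions: the class name `SpiderSketch`, the `fetcher` function and the `followRule` predicate (standing in for the "business rule" that decides which links to follow) are all made-up names, and the link extraction is a naive regex that only sees absolute `href="http..."` links. A real spider would use a proper HTML parser, respect robots.txt and throttle its requests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpiderSketch {
    // Naive matcher for absolute links only (an assumption made for brevity).
    private static final Pattern LINK = Pattern.compile("href=\"(http[^\"#]+)\"");

    // Returns every absolute link found in the page text, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // One possible fetcher: downloads the page body and decodes it as UTF-8.
    static String fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Crawls from the seed with a fixed pool of worker threads and returns
    // every URL that was scheduled for fetching (seed included).
    static Set<String> crawl(String seed, Function<String, String> fetcher,
                             Predicate<String> followRule, int threads) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Phaser pending = new Phaser(1); // party 1 is the calling thread
        seen.add(seed);
        submit(seed, fetcher, followRule, seen, pool, pending);
        pending.arriveAndAwaitAdvance(); // blocks until every task has finished
        pool.shutdown();
        return seen;
    }

    private static void submit(String url, Function<String, String> fetcher,
                               Predicate<String> followRule, Set<String> seen,
                               ExecutorService pool, Phaser pending) {
        pending.register(); // one party per outstanding page
        pool.execute(() -> {
            try {
                String html = fetcher.apply(url);
                if (html == null) {
                    return; // nothing to parse
                }
                for (String link : extractLinks(html)) {
                    // seen.add is atomic, so each URL is scheduled only once.
                    if (followRule.test(link) && seen.add(link)) {
                        submit(link, fetcher, followRule, seen, pool, pending);
                    }
                }
            } catch (RuntimeException e) {
                // A page that fails to download is skipped in this sketch.
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }
}
```

Something like `SpiderSketch.crawl("http://example.com/", SpiderSketch::fetch, url -> url.startsWith("http://example.com/"), 4)` would then stay on one site using four threads. The decision about which link to follow lives entirely in the predicate, so each thread "takes its own decision" simply by applying the same rule to the links it finds.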
 