
Creating a Google-like bot...

 
Chris Stewart
Ranch Hand
Posts: 184
I'm interested in building a bot like the ones Google uses. I've got an object model created and a good idea of how I want everything to work (flow control, logging, bot object instances, etc.), but I'm not sure how to determine where the bot should actually go. Does anyone have an idea of how I can come up with a list of URLs for the bot to visit? I'm looking for something as dynamic as possible.
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
Do you mean like a spider? I think they go to a page, find all the URLs referenced on that page, go to those pages, and so on.
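The spider idea described above can be sketched roughly as a breadth-first walk over links. This is only an illustration under some assumptions, not anyone's actual crawler: the class name `SimpleSpider`, the `fetcher` function (anything that maps a URL to its HTML, or null on failure), and the `maxPages` cap are all made up for the example, and the regex link matching is a shortcut where a real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleSpider {
    // Rough match for href="..." or href='...'; not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute http(s) links found in html, resolved against baseUrl. */
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) href = href.substring(0, hash); // drop #fragment
            if (href.isEmpty()) continue;
            try {
                String resolved = URI.create(baseUrl).resolve(href).toString();
                if (resolved.startsWith("http://") || resolved.startsWith("https://")) {
                    links.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    /** Breadth-first crawl from seed, visiting at most maxPages URLs. */
    static Set<String> crawl(String seed, Function<String, String> fetcher, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            if (!visited.add(url)) continue;   // already seen
            String html = fetcher.apply(url);
            if (html == null) continue;        // fetch failed
            for (String link : extractLinks(url, html)) {
                if (!visited.contains(link)) frontier.add(link);
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the traversal separate from the networking, so the same loop can be tried against an in-memory map of pages before pointing it at real sites.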
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Here's a very nice book on the topic:
Programming Spiders, Bots, and Aggregators in Java by Jeff Heaton, ISBN 0782140408, Sybex © 2002 (516 pages)
I'd recommend his chapters on being legal and polite. Following links and downloading every page of a web site may not make you a welcome visitor.
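One widely used politeness convention is a site's robots.txt file, which lists paths that crawlers are asked to stay out of. Whether the book handles it this way isn't stated here, so treat the following as a rough sketch under assumptions: the class name `RobotsRules` is hypothetical, only `Disallow` lines under `User-agent: *` are honored, and things like `Allow`, wildcards, and crawl delays are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class RobotsRules {
    private final List<String> disallowed = new ArrayList<>();

    /** Parses robots.txt text, keeping Disallow prefixes from "User-agent: *" groups only. */
    static RobotsRules parse(String robotsTxt) {
        RobotsRules rules = new RobotsRules();
        boolean appliesToUs = false;   // are we inside a group for "*"?
        boolean inAgentLines = false;  // was the previous field a User-agent line?
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // strip comments
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!inAgentLines) appliesToUs = false; // a new group starts
                appliesToUs |= value.equals("*");
                inAgentLines = true;
            } else {
                inAgentLines = false;
                if (field.equals("disallow") && appliesToUs && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    /** True if no recorded Disallow prefix matches the start of the path. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

A bot would fetch `/robots.txt` once per host, parse it, and check `isAllowed` before requesting each path. Pausing between requests to the same host is another common courtesy.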
 
Chris Stewart
Ranch Hand
Posts: 184
I wouldn't go nearly as far as Google does. All I want to do is hit each webpage, log the URL, IP, and title from the HTML tags, then move to the next.
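For what it's worth, the per-page logging could look something like the sketch below. The names `PageRecord`, `extractTitle`, `resolveIp`, and `logLine` are invented for illustration, the tab-separated output is just one possible format, and the title regex is a shortcut that assumes reasonably well-formed HTML.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageRecord {
    // Matches the first <title>...</title> pair, across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the page title with whitespace collapsed, or "" if none is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim().replaceAll("\\s+", " ") : "";
    }

    /** Returns the IP address for the URL's host, or "unknown" if it cannot be resolved. */
    static String resolveIp(String url) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) return "unknown";
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException | IllegalArgumentException e) {
            return "unknown";
        }
    }

    /** Builds one tab-separated log line: URL, IP, title. */
    static String logLine(String url, String html) {
        return url + "\t" + resolveIp(url) + "\t" + extractTitle(html);
    }
}
```

Each fetched page would produce one line, which can go to whatever logging the bot already has.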
I figured out my original problem: I can just go through IPs one at a time.
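Stepping through IPs one at a time might be sketched as below, assuming IPv4 only. The class name `Ipv4Range` and its methods are hypothetical helpers, and the addresses in the usage note come from a range reserved for documentation.

```java
import java.util.ArrayList;
import java.util.List;

final class Ipv4Range {
    /** Converts a dotted-quad string to its 32-bit value, held in a long to avoid sign issues. */
    static long toLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not IPv4: " + dotted);
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + part);
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Converts a 32-bit value back to dotted-quad form. */
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    /** Lists every address from first to last, inclusive. */
    static List<String> between(String first, String last) {
        long end = toLong(last);
        List<String> out = new ArrayList<>();
        for (long ip = toLong(first); ip <= end; ip++) {
            out.add(toDotted(ip));
        }
        return out;
    }
}
```

For example, `Ipv4Range.between("192.0.2.254", "192.0.3.1")` yields four addresses, rolling over from `192.0.2.255` to `192.0.3.0`. One thing to keep in mind is that many sites can share a single IP through name-based virtual hosting, so a request by bare IP generally returns only that server's default site.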
 