
JSP Crawler

 
Shashank Agarwal
Ranch Hand
Posts: 105
Hey everyone. I am trying to build a search-engine-style crawler that crawls web pages and creates reports from them. The crawler will be a JSP. The problem is how to make it follow a link. If the link is href="http://www.javaranch.com", that's fine: I can take the substring between the two double quotes. However, if the link points to an internal page, most pages write it as href="page2.html" or href="../page2.html". In that case, how do I make the crawler go to page2.html?

I hope I'm able to put across my problem.

Thanks in advance.
 
Pritam Barhate
Greenhorn
Posts: 15
See the article "Create intelligent Web spiders" at JavaWorld.
 
Sonny Gill
Ranch Hand
Posts: 1211
There is also a chapter (or two) on it in the book The Art of Java.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
There is no real reason to make this a JSP, since you are going to end up with huge amounts of computation and data. Why not work on the guts of the crawler as a stand-alone application until you get it working right?

Trying to do it in JSP code will just be confusing and hard to debug.
Bill
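
For the relative-link part of the question, here is a minimal stand-alone sketch, not a full crawler. It assumes you already know the URL of the page you fetched and have its HTML as a String; java.net.URI.resolve then turns "page2.html" or "../page2.html" into an absolute URL. The class name LinkResolver, the method resolveLinks, and the example.com address are made up for illustration, and the regular expression only handles double-quoted href attributes, as in the original post. A real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: resolves href values against the URL of the page they came from.
public class LinkResolver {

    // Matches href="..." with double quotes only (an assumption taken from the question).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> resolveLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Absolute links come back unchanged; relative ones are
                // resolved against the URL of the current page.
                URI resolved = base.resolve(m.group(1).trim()).normalize();
                links.add(resolved.toString());
            } catch (IllegalArgumentException e) {
                // The href is not a syntactically valid URI; skip it.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.javaranch.com\">A</a>"
                + "<a href=\"page2.html\">B</a>"
                + "<a href=\"../page2.html\">C</a>";
        // The page URL below is a placeholder for whatever page was just crawled.
        for (String link : resolveLinks("http://www.example.com/docs/guide/page1.html", html)) {
            System.out.println(link);
        }
        // prints:
        // http://www.javaranch.com
        // http://www.example.com/docs/guide/page2.html
        // http://www.example.com/docs/page2.html
    }
}
```

Keeping this as a plain class with a main method means you can test it from the command line first and call it from a JSP or servlet later if you still want a web front end.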
 