
Java Site Scraper

 
Shailesh Kulkarni
Greenhorn
Posts: 11
Hello Everybody,

I want to write a scraping engine in Java/Struts. The main requirement is that my code must be able to scrape a site and navigate through it.
For example, suppose I want to scrape www.rediffmail.com: if I provide a valid user name and password, my own code must be able to log in to that site. Does anybody have any idea about scraping, or about writing code that accesses another site's data?
 
Julia Reynolds
Ranch Hand
Posts: 123
Shailesh,

I wrote a web scraper using the Apache httpclient to handle the connections.
http://jakarta.apache.org/commons/httpclient/
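For the login part of the question, here is a minimal sketch of the fetch step. It uses the JDK's built-in java.net.http.HttpClient (Java 11+) rather than the Commons HttpClient linked above, so treat it as a swapped-in alternative, not the code described in this post. The idea is to POST the login form, let a CookieManager keep the session cookie, and reuse the same client for later pages. The URLs and the form field names ("username", "password") are hypothetical; the real ones come from the site's login `<form>` (its `action` attribute and `<input>` names). Sites that add hidden tokens, JavaScript, or CAPTCHAs to their login will need more than this.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class LoginFetcher {

    // Encodes form fields as application/x-www-form-urlencoded ("a=1&b=2").
    public static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // One client with a CookieManager: cookies set by the login response
    // are stored and sent automatically on later requests to the same site.
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // POSTs the login form, then GETs a page that requires the session.
    // loginUrl, pageUrl and the field names are hypothetical placeholders.
    public static String loginAndFetch(HttpClient client, String loginUrl,
                                       Map<String, String> fields, String pageUrl)
            throws IOException, InterruptedException {
        HttpRequest login = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
        client.send(login, HttpResponse.BodyHandlers.discarding());

        HttpRequest page = HttpRequest.newBuilder(URI.create(pageUrl)).GET().build();
        return client.send(page, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Hypothetical field names; no network call is made here.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "me");
        fields.put("password", "secret");
        System.out.println(formBody(fields)); // username=me&password=secret
    }
}
```

Keeping a single client (and therefore a single cookie store) across requests is what lets the code "stay logged in" while it navigates the site.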

I fetched the text of the web page with httpclient and then wrote an html parser for the data tables on the page. The parser uses regex patterns to recognize groups of html table/row/column tags.
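A rough sketch of what a regex-based table parser along these lines might look like (not the actual parser from this post): one pattern finds each `<table>`, a second finds the `<tr>` rows inside it, and a third finds the `<td>`/`<th>` cells inside each row, with any remaining inner tags stripped. The class and method names are hypothetical. Lazy matching like this breaks on nested tables and on cells whose closing `</td>` is omitted, and it does not decode entities such as `&amp;`; a real HTML parser is more robust if the pages are messy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableScraper {
    // (?is): case-insensitive, and '.' also matches newlines.
    private static final Pattern TABLE = Pattern.compile("(?is)<table(?:\\s[^>]*)?>(.*?)</table>");
    private static final Pattern ROW   = Pattern.compile("(?is)<tr(?:\\s[^>]*)?>(.*?)</tr>");
    private static final Pattern CELL  = Pattern.compile("(?is)<t[dh](?:\\s[^>]*)?>(.*?)</t[dh]>");
    private static final Pattern TAG   = Pattern.compile("(?s)<[^>]+>");

    // Returns tables -> rows -> cell texts, in document order.
    public static List<List<List<String>>> parseTables(String html) {
        List<List<List<String>>> tables = new ArrayList<>();
        Matcher t = TABLE.matcher(html);
        while (t.find()) {
            List<List<String>> rows = new ArrayList<>();
            Matcher r = ROW.matcher(t.group(1));
            while (r.find()) {
                List<String> cells = new ArrayList<>();
                Matcher c = CELL.matcher(r.group(1));
                while (c.find()) {
                    // Drop inner markup such as <b> or <a href=...>, keep the text.
                    cells.add(TAG.matcher(c.group(1)).replaceAll("").trim());
                }
                rows.add(cells);
            }
            tables.add(rows);
        }
        return tables;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Name</th><th>Qty</th></tr>"
                + "<tr><td><b>Apple</b></td><td>3</td></tr></table>";
        System.out.println(parseTables(html)); // [[[Name, Qty], [Apple, 3]]]
    }
}
```

In practice you would feed it the page body returned by the HTTP client and then pick the table and columns you care about.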

I hope this helps.

Julia
 
Shailesh Kulkarni
Greenhorn
Posts: 11
Hi Julia,

Could you please elaborate on this? I haven't understood what you mean.

Thanks in advance.
 