
Reading web page from servlet

 
William EGreen
Greenhorn
Posts: 2
How do I get the HTML text of a given web page from a servlet? (I need to do some data mining.) Note that the web page in question may require a cookie. I have access to the cookie and can send it to the servlet.
Thanks,
Bill Green
 
Jessica Sant
Sheriff
Posts: 4313
You could use a Java program that accesses the website, makes a request, and writes the response to a file (thus saving the resulting HTML).
You might be able to adapt the code from HttpUnit to do just that. It's meant to be a unit-testing suite for web sites, but you could use it to store the data in the page rather than validate it.
It's an open source project available here:
http://httpunit.sourceforge.net/
Hope that helps.
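For comparison, here is a minimal sketch of the standalone "request the page and write the response to a file" approach using only the JDK (java.net.URLConnection) instead of HttpUnit. The class name PageSaver, the method savePage, and the 10-second timeouts are illustrative assumptions, not part of HttpUnit or any existing tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PageSaver {

    /**
     * Requests pageUrl and writes the raw response body to target,
     * overwriting any existing file. Returns the number of bytes written.
     */
    public static long savePage(String pageUrl, Path target) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();
        // Assumed timeouts so a slow site cannot hang the program forever.
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            // Bytes are copied unchanged, so the page's own encoding is preserved.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Usage: java PageSaver http://example.com/ page.html
    public static void main(String[] args) throws IOException {
        long bytes = savePage(args[0], Path.of(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Copying raw bytes instead of decoding to a String keeps the sketch simple and leaves the original character encoding intact for whatever mining step reads the file later.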
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 65220
Check out java.net.URLConnection.
hth,
bear
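A minimal sketch of the URLConnection approach that also covers the cookie from the original question. It assumes the servlet already has the cookie as a "name=value" string and simply forwards it as the Cookie request header. The names PageFetcher, fetchHtml, and charsetOf are hypothetical, and falling back to UTF-8 when the server sends no charset is an assumption, not something the HTTP specification guarantees.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PageFetcher {

    /**
     * Fetches pageUrl and returns the response body as a String.
     * cookieHeader is sent verbatim as the Cookie request header,
     * e.g. "JSESSIONID=abc123; theme=dark"; pass null to send no cookie.
     */
    public static String fetchHtml(String pageUrl, String cookieHeader) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();
        conn.setConnectTimeout(10_000); // assumed timeouts
        conn.setReadTimeout(10_000);
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            // Request headers must be set before the connection is opened.
            conn.setRequestProperty("Cookie", cookieHeader);
        }
        try (InputStream in = conn.getInputStream()) {
            byte[] body = in.readAllBytes(); // Java 9+
            return new String(body, charsetOf(conn.getContentType()));
        }
    }

    /** Picks the charset from a Content-Type value, falling back to UTF-8 (an assumption). */
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Inside a servlet, the cookieHeader string could be assembled from the Cookie objects returned by HttpServletRequest.getCookies() (name + "=" + value, joined with "; "), assuming the browser sends the needed cookie to your servlet in the first place.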
 
Kripal Singh
Ranch Hand
Posts: 254
Try using the following code
 