
Reading Web pages from Java. RSS feed

 
Bala Gangadhar
Ranch Hand
Posts: 119
1
Hi,
I am using Jsoup to read some content from web pages.
I just want to know: can the web site owner tell that the pages are being read by an automated program?
What client details (browser version, etc.) are captured from Jsoup HTTP requests? Can we configure Jsoup with browser-like settings so that the HTTP request looks as if it came from a web browser?

Thanks in Advance...
 
Ulf Dittmer
Rancher
Posts: 42972
73
You will never know exactly what the web site tracks. Whether they look kindly on automated requests depends on the site. In any case, be sure not to overwhelm it with requests, so you avoid the appearance of a DoS attack, and make sure that whatever you do with the scraped content is in keeping with their copyright. Whatever client you use, you may want to set the User-Agent header to that of the browser you're trying to impersonate: http://stackoverflow.com/questions/6581655/jsoup-useragent-how-to-set-it-right
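
For reference, here is a minimal sketch of setting the User-Agent via Jsoup's Connection API. The URL, User-Agent string, and referrer below are placeholder values; substitute the real target and the browser you want to mimic:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchPage {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and User-Agent string; replace with your own values.
        Document doc = Jsoup.connect("https://example.com/feed")
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")
                .referrer("https://www.google.com") // optional: some sites also check the Referer header
                .timeout(10_000)                    // 10-second timeout; keep requests slow and infrequent
                .get();
        System.out.println(doc.title());
    }
}

Jsoup.connect(), userAgent(), referrer(), timeout(), and get() are all part of the org.jsoup.Connection interface, so this sends the request with browser-like headers instead of whatever default the library and JVM would otherwise use.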
 