
Method to get the content of the page HEAD - Crawler

 
Isaac Ferguson
Ranch Hand
Posts: 1044
3
Java Netbeans IDE Spring
Hi

When I crawl a site, I need to check whether the HEAD contains a specific symbol or piece of code.

I have tried some methods, such as:

page.getFetchResponseHeaders();
page.getContentEncoding();
page.getContentType();
page.getParseData();

but they give me the content, not the code.

Does anyone know how I could get the code that is in the head of every page I visit?

Regards


 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
I have no idea what your "page" is, but if you make an HttpURLConnection (java.net package) there are methods for retrieval of all header fields.

Note that if you are only interested in header fields, you can try a "HEAD" request (see setRequestMethod()); the default method for HttpURLConnection is GET.

Bill
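
For illustration, here is a minimal sketch of the HEAD request suggested above. The class name HeadRequestSketch and the helper fetchHeaderFields are made up for this example, and the timeout values are arbitrary assumptions. Note that this returns the HTTP response header fields, which are not the same thing as the <head> element of the HTML document.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch, not a definitive implementation.
class HeadRequestSketch {

    /** Sends a HEAD request to the given address and returns the response header fields. */
    static Map<String, List<String>> fetchHeaderFields(String address) throws IOException {
        // HttpURLConnection is abstract, so it cannot be created with "new";
        // it is obtained by casting the result of URL.openConnection().
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            connection.setRequestMethod("HEAD"); // the default method would be GET
            connection.setConnectTimeout(5000);  // assumed timeout, in milliseconds
            connection.setReadTimeout(5000);     // assumed timeout, in milliseconds
            // The status line is stored under the null key of the returned map.
            return connection.getHeaderFields();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeadRequestSketch <url>");
            return;
        }
        // Print every header field name with its list of values.
        fetchHeaderFields(args[0])
                .forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

Because a HEAD response carries no body, this approach cannot show what is inside the page markup; it is only useful when the information of interest is in the HTTP headers.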
 
Isaac Ferguson
Ranch Hand
Posts: 1044
3
Java Netbeans IDE Spring
I'm using crawler4j, and Page is a type from that library.
 
Isaac Ferguson
Ranch Hand
Posts: 1044
3
Java Netbeans IDE Spring
What I need is to check whether the Google Analytics script is installed inside the HEAD.

So I need a method that lets me compare the content of those lines with, for example, the UA of the script.
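
One possible way to do the check described above is to take the raw HTML of the page, cut out the <head> element, and search it for a tracking ID. The sketch below works on a plain String so it does not depend on any crawler. The class AnalyticsHeadCheck, its method names, and the ID pattern (UA-, digits, dash, digits) are assumptions made for this example. With crawler4j, the HTML would presumably come from HtmlParseData.getHtml() after checking that page.getParseData() is an HtmlParseData.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch under the assumptions stated above.
class AnalyticsHeadCheck {

    // Captures everything between <head ...> and </head>, ignoring case and line breaks.
    private static final Pattern HEAD_SECTION = Pattern.compile(
            "<head\\b[^>]*>(.*?)</head\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumed shape of a Universal Analytics ID, for example UA-12345678-1.
    private static final Pattern TRACKING_ID = Pattern.compile("\\bUA-\\d{4,10}-\\d{1,4}\\b");

    /** Returns the inner markup of the first <head> element, if the document has one. */
    static Optional<String> extractHead(String html) {
        Matcher matcher = HEAD_SECTION.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** Returns the first tracking ID found inside the <head> element, if any. */
    static Optional<String> findTrackingId(String html) {
        Optional<String> head = extractHead(html);
        if (!head.isPresent()) {
            return Optional.empty();
        }
        Matcher matcher = TRACKING_ID.matcher(head.get());
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    /** Returns true when the ID found in the <head> element equals the expected one. */
    static boolean hasTrackingId(String html, String expectedId) {
        return findTrackingId(html).map(expectedId::equals).orElse(false);
    }

    public static void main(String[] args) {
        String html = "<html><head><script>ga('create', 'UA-12345678-1', 'auto');</script></head>"
                + "<body>Hello</body></html>";
        System.out.println(findTrackingId(html).orElse("no tracking ID in head"));
    }
}
```

A regular expression is enough for a quick presence check, but it is not a full HTML parser; pages with unusual markup, or scripts that load the ID from an external file, would need a different approach.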


It is not possible to instantiate HttpURLConnection httpURLConnection, because

 