
Method to get the content of the page HEAD - Crawler

 
Isaac Ferguson
Ranch Hand
Posts: 1063
3
Java Netbeans IDE Spring
Hi

When I crawl a site, I need to check whether the HEAD contains a specific symbol or piece of code.

I have tried some methods like:

page.getFetchResponseHeaders();
page.getContentEncoding();
page.getContentType();
page.getParseData();

but they give me the content, not the code...

Does anyone know how I could get the code that is in the head of every page I visit?

Regards


 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
I have no idea what your "page" is, but if you make an HttpURLConnection (java.net package) there are methods for retrieval of all header fields.

Note that if you are only interested in header fields you can try a "HEAD" request - see setRequestMethod() - the default method for HttpURLConnection is GET.
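A minimal sketch of that approach, in case it helps. The class name HeadRequestSketch, the helper openHead and the example.com address are made up for illustration. Note that this prints the HTTP response header fields, which are not the same thing as the HTML <head> element of the page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;
import java.util.Map;

public class HeadRequestSketch {

    // HttpURLConnection is abstract, so it is obtained from URL.openConnection()
    // rather than created with "new". Nothing is sent over the network here.
    static HttpURLConnection openHead(String address) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setRequestMethod("HEAD"); // the default method is GET
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address (an assumption); pass a real URL as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        HttpURLConnection conn = openHead(address);
        try {
            // getResponseCode() is the call that actually sends the request.
            System.out.println("Status: " + conn.getResponseCode());
            // The status line is stored under a null key in this map.
            for (Map.Entry<String, List<String>> field : conn.getHeaderFields().entrySet()) {
                System.out.println(field.getKey() + ": " + field.getValue());
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A HEAD response carries no body, so this is only useful when the header fields alone are enough.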

Bill
 
Isaac Ferguson
Ranch Hand
Posts: 1063
3
Java Netbeans IDE Spring
I'm using crawler4j, and Page is a type from that library.
 
Isaac Ferguson
Ranch Hand
Posts: 1063
3
Java Netbeans IDE Spring
What I need is to check whether the Google Analytics script is installed inside the HEAD or not.

Then I need to find a method to compare the content of those lines with, for example, the UA of the script.
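One way this check could look, as a rough sketch only. It works on the raw HTML as a String; with crawler4j that string would presumably come from ((HtmlParseData) page.getParseData()).getHtml(), but that part is not shown here. The class name AnalyticsHeadCheck, the method findTrackingId and the UA-digits-digits pattern are assumptions for illustration, and a regular expression is not a full HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnalyticsHeadCheck {

    // Captures everything between <head ...> and </head>; "\\b" keeps <header> from matching.
    private static final Pattern HEAD = Pattern.compile(
            "<head\\b[^>]*>(.*?)</head>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumed shape of a classic Google Analytics property ID, e.g. UA-12345-6.
    private static final Pattern UA_ID = Pattern.compile("\\bUA-\\d{4,10}-\\d{1,4}\\b");

    // Returns the first UA-style ID found inside the <head> section, if any.
    static Optional<String> findTrackingId(String html) {
        Matcher head = HEAD.matcher(html);
        if (!head.find()) {
            return Optional.empty(); // no <head> section at all
        }
        Matcher id = UA_ID.matcher(head.group(1));
        return id.find() ? Optional.of(id.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-written sample page, not real crawl output.
        String html = "<html><head><script>ga('create', 'UA-12345-6', 'auto');</script></head>"
                + "<body></body></html>";
        System.out.println(findTrackingId(html).orElse("no tracking ID found in <head>"));
    }
}
```

An ID that appears only in the body is ignored on purpose, since the question is about the HEAD.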


It is not possible to instantiate HttpURLConnection httpURLConnection, because

 