
Count external hyperlinks on each page of the web-site

 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java
First of all, I'm new here and I'm not sure whether this belongs in Beginning Java; I didn't know where to post.

MY GOAL: I need to count external hyperlinks to other web-sites on each page of the web-site.

I've found this code here on stackoverflow:



For those who understand the technologies involved: how can I use this code? I don't know how to use it. It contains methods that are not included in standard Java, and I didn't find a solution on the internet.

I've also found other working code that uses only standard Java, but it counts the links on a single web page. I haven't yet worked out how to go through all the pages of a web-site.

Any ideas?
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
Kirill Varivoda wrote:MY GOAL: I need to count external hyperlinks to other web-sites on each page of the web-site.
...
I've also found other working code that uses only standard Java, but it counts the links on a single web page. I haven't yet worked out how to go through all the pages of a web-site.

Well, from your question, wouldn't that just be the sum of all counts for individual pages?

You also haven't said whether you want a count of external links, or of all links. They will be slightly different, and it might get a bit involved - especially with IPv6 addresses - unless you assume that a link referred to by IP address (as opposed to domain name) is "internal".
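To make the internal/external distinction concrete, here is a minimal sketch that compares host names using only `java.net.URI`. The class name `LinkClassifier`, the method names, and the rule that `www.example.com` and `example.com` are the same site are assumptions for illustration, not something from the code discussed in this thread.

```java
import java.net.URI;
import java.util.Locale;

class LinkClassifier {

    // Returns true when the link names a host that differs from the site's host.
    // Links without a host (relative paths, mailto:, fragments) are treated as
    // internal here; that is an assumption, not a rule.
    static boolean isExternal(String siteHost, String href) {
        URI uri;
        try {
            uri = URI.create(href.trim());
        } catch (IllegalArgumentException e) {
            return false; // unparsable link: not counted as external (assumption)
        }
        String host = uri.getHost();
        if (host == null) {
            return false;
        }
        return !normalize(host).equals(normalize(siteHost));
    }

    // Lower-cases the host and drops a leading "www." so both spellings
    // of the same site compare equal (assumption).
    static String normalize(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }

    public static void main(String[] args) {
        System.out.println(isExternal("example.com", "https://www.example.com/about")); // false
        System.out.println(isExternal("example.com", "/contact.html"));                 // false
        System.out.println(isExternal("example.com", "http://other.org/page"));         // true
    }
}
```

Note that this sketch compares IP-literal hosts as plain strings, so a link written with an IP address counts as external unless the site itself is addressed the same way. If you prefer the "IP address means internal" assumption, that would need an extra check.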

However, I suspect that your first task - if, indeed, you need an "all pages" solution - will be to create a WebSiteTree class.

And that alone is going to be fun.

Winston
 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java
Winston Gutkowski wrote:
Well, from your question, wouldn't that just be the sum of all counts for individual pages?

You also haven't said whether you want a count of external links, or of all links. They will be slightly different, and it might get a bit involved - especially with IPv6 addresses - unless you assume that a link referred to by IP address (as opposed to domain name) is "internal".

However, I suspect that your first task - if, indeed, you need an "all pages" solution - will be to create a WebSiteTree class.

And that alone is going to be fun.

Winston


I need to count only links to other domains on each page.

I didn't mention it, but in addition I need to know how many clicks it takes to get from the main page to each inner page.
So when the program has finished, the output should look like this:
/main_page.html - number of links to other domains, 0 clicks
/main_page/first_inner_page.html - number of links to other domains, 1 click
/first_inner_page/the_inner_page.html - number of links to other domains, 2 clicks
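
One way to read "number of clicks" is the fewest clicks needed to reach a page from the main page, which is what a breadth-first search gives. The sketch below assumes that reading. It works on a hard-coded, hypothetical link map instead of downloaded pages, and the class and method names (`ClickDepth`, `clicksFrom`) are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class ClickDepth {

    // Breadth-first search: the first time a page is reached is also the
    // fewest clicks needed to reach it from the start page.
    static Map<String, Integer> clicksFrom(String start, Map<String, List<String>> links) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String next : links.getOrDefault(page, Collections.emptyList())) {
                if (!depth.containsKey(next)) { // not seen before
                    depth.put(next, depth.get(page) + 1);
                    queue.add(next);
                }
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        // Hypothetical site structure: page -> pages it links to.
        Map<String, List<String>> links = new LinkedHashMap<>();
        links.put("/main_page.html",
                Arrays.asList("/main_page/first_inner_page.html"));
        links.put("/main_page/first_inner_page.html",
                Arrays.asList("/first_inner_page/the_inner_page.html", "/main_page.html"));

        clicksFrom("/main_page.html", links)
                .forEach((page, clicks) -> System.out.println(page + " - " + clicks + " clicks"));
    }
}
```

The link back to the main page is ignored because that page already has a depth, so cycles between pages do not cause an endless loop.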
 
Ulf Dittmer
Rancher
Posts: 42972
73
Let's take a step back: What is the purpose of this?
 
Tim Cooke
Marshal
Posts: 4041
239
Clojure IntelliJ IDE Java
Hello Kirill, and welcome to the Ranch!

I have a question. Does the website for which you want to find out this information belong to you? If so, then I am quite sure that the likes of Google Analytics will collate this kind of information for you.
 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java
Ulf Dittmer wrote:Let's take a step back: What is the purpose of this?


It's an exercise that a local company sent me for practice. It's for the company's Java courses. Kind of hard for beginners, isn't it?
 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java
Tim Cooke wrote:Hello Kirill, and welcome to the Ranch!

I have a question. Does the website for which you want to find out this information belong to you? If so, then I am quite sure that the likes of Google Analytics will collate this kind of information for you.


I know about it, but I need to build a simple application that can check any web-site.
 
Ulf Dittmer
Rancher
Posts: 42972
73
There are any number of ways of doing this. If you're not supposed to use 3rd party libraries, then the code you found is not a bad start. Now you need to keep track of all linked pages (the code already extracts the URLs, so that part is done) in two sets of strings, "processed URLs" and "unprocessed URLs", so that you don't process the same URL repeatedly. You'll also need to look at the domain, so that you don't start going off-site.
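
A minimal sketch of the two-set bookkeeping described above, under some loud assumptions: the page download and URL extraction are hidden behind a `linksOf` function (a stand-in for the code you already have), all links are absolute URLs, and "same site" means "same host". The names `SiteWalker` and `countExternalLinks` are hypothetical.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class SiteWalker {

    // Visits every on-site page reachable from startUrl exactly once and
    // returns, per page, how many of its links point to a different host.
    static Map<String, Integer> countExternalLinks(
            String startUrl, Function<String, Collection<String>> linksOf) {
        String siteHost = URI.create(startUrl).getHost();
        Set<String> processed = new HashSet<>();
        Set<String> unprocessed = new LinkedHashSet<>();
        Map<String, Integer> externalPerPage = new LinkedHashMap<>();
        unprocessed.add(startUrl);

        while (!unprocessed.isEmpty()) {
            // Move one URL from "unprocessed" to "processed".
            Iterator<String> it = unprocessed.iterator();
            String page = it.next();
            it.remove();
            processed.add(page);

            int external = 0;
            for (String link : linksOf.apply(page)) {
                String host = hostOf(link);
                if (host == null) {
                    continue; // not an absolute URL with a host: skipped in this sketch
                }
                if (host.equalsIgnoreCase(siteHost)) {
                    if (!processed.contains(link)) {
                        unprocessed.add(link); // on-site and not yet visited
                    }
                } else {
                    external++; // off-site: counted but never followed
                }
            }
            externalPerPage.put(page, external);
        }
        return externalPerPage;
    }

    static String hostOf(String url) {
        try {
            return URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical two-page site, hard-coded instead of downloaded.
        Map<String, List<String>> fake = new HashMap<>();
        fake.put("http://example.com/",
                Arrays.asList("http://example.com/a", "http://other.org/"));
        fake.put("http://example.com/a",
                Arrays.asList("http://example.com/", "http://other.org/x", "http://third.net/"));
        System.out.println(countExternalLinks("http://example.com/",
                u -> fake.getOrDefault(u, Collections.emptyList())));
    }
}
```

On a real site, relative links would first have to be resolved against the page's own URL (for example with `URI.resolve`) before the host comparison makes sense.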
 