Count external hyperlinks on each page of the web-site

 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java
First of all, I'm new here, and I'm not sure whether this belongs in Beginning Java; I didn't know where to post.

MY GOAL: I need to count external hyperlinks to other web-sites on each page of the web-site.

I've found this code on Stack Overflow:



For those who understand the technology: how can I use this code? I don't know how to use it. It calls methods that are not part of standard Java, and I couldn't find a solution on the internet.

Also, I've found other working code that uses only standard Java, but it counts links on just one web page. I haven't yet worked out how to walk through all the pages of a web-site.

Any ideas?
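
For the single-page case, here is a minimal sketch using only standard Java. It is not the code referred to in the post; the class name ExternalLinkCounter and the regular expression are assumptions made for illustration. It treats a link as external when its host differs from the host of the page it appears on, so www.example.com and example.com would count as different hosts. A regular expression is a naive way to read HTML and can miss unusual markup.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the code from the original post.
class ExternalLinkCounter {
    // Naive pattern for href="..." or href='...' inside <a> tags.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Counts links in html whose host differs from the host of pageUrl. */
    static int countExternalLinks(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        int count = 0;
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links against the page URL first.
                URI target = base.resolve(m.group(1).trim());
                String host = target.getHost();
                // A null host means mailto:, javascript: and similar; skip those.
                if (host != null && !host.equalsIgnoreCase(base.getHost())) {
                    count++;
                }
            } catch (IllegalArgumentException e) {
                // The href is not a valid URI; ignore it.
            }
        }
        return count;
    }
}
```

Calling ExternalLinkCounter.countExternalLinks(html, "http://example.com/index.html") with the page source as html gives the count for that one page.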
 
Winston Gutkowski
Bartender
Posts: 10777
71
Hibernate Eclipse IDE Ubuntu

Kirill Varivoda wrote:MY GOAL: I need to count external hyperlinks to other web-sites on each page of the web-site.
...
Also, I've found other working code that uses only standard Java, but it counts links on just one web page. I haven't yet worked out how to walk through all the pages of a web-site.


Well, from your question, wouldn't that just be the sum of all counts for individual pages?

You also haven't said whether you want a count of external links, or of all links. They will be slightly different, and it might get a bit involved - especially with IPv6 addresses - unless you assume that a link referred to by IP address (as opposed to domain name) is "internal".
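
One way to encode that assumption is sketched below. The class LinkClassifier and its method names are hypothetical, and the rule "an IP-address link is internal" is just the simplification mentioned above, not a general truth. It relies on java.net.URI returning IPv6 hosts wrapped in square brackets.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper; the "IP literal means internal" rule is an assumption.
class LinkClassifier {
    // Loose dotted-quad check; it does not validate that each part is <= 255.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static boolean isIpLiteral(String host) {
        // URI.getHost() returns IPv6 literals enclosed in square brackets.
        return host.startsWith("[") || IPV4.matcher(host).matches();
    }

    static boolean isExternal(URI link, String siteHost) {
        String host = link.getHost();
        if (host == null) {
            return false; // relative link, mailto:, and so on
        }
        if (isIpLiteral(host)) {
            return false; // assumed internal, as discussed above
        }
        return !host.equalsIgnoreCase(siteHost);
    }
}
```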

However, I suspect that your first task - if, indeed, you need an "all pages" solution - will be to create a WebSiteTree class.

And that alone is going to be fun.

Winston
 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java

Winston Gutkowski wrote:

Kirill Varivoda wrote:MY GOAL: I need to count external hyperlinks to other web-sites on each page of the web-site.
...
Also, I've found other working code that uses only standard Java, but it counts links on just one web page. I haven't yet worked out how to walk through all the pages of a web-site.


Well, from your question, wouldn't that just be the sum of all counts for individual pages?

You also haven't said whether you want a count of external links, or of all links. They will be slightly different, and it might get a bit involved - especially with IPv6 addresses - unless you assume that a link referred to by IP address (as opposed to domain name) is "internal".

However, I suspect that your first task - if, indeed, you need an "all pages" solution - will be to create a WebSiteTree class.

And that alone is going to be fun.

Winston



I need to count only links to other domains on each page.

I didn't mention it, but in addition I need to know the number of clicks it takes to get from the main page to each inner page.
So when the program has finished, the output should look like this:
/main_page.html - number of links to other domains, 0 clicks
/main_page/first_inner_page.html - number of links to other domains, 1 click
/first_inner_page/the_inner_page.html - number of links to other domains, 2 clicks
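
The "number of clicks" requirement reads like a shortest-path depth, which a breadth-first search gives directly. The sketch below assumes the internal links of each page have already been collected into a map; the class ClickDepth and that map are hypothetical stand-ins for a real crawl, and List.of needs Java 9 or later.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper: breadth-first search over an in-memory link map.
class ClickDepth {
    /** Maps each reachable page to the minimum number of clicks from start. */
    static Map<String, Integer> clicksFrom(String start,
                                           Map<String, List<String>> internalLinks) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String next : internalLinks.getOrDefault(page, List.of())) {
                // The first visit in a breadth-first search is the shortest route.
                if (!depth.containsKey(next)) {
                    depth.put(next, depth.get(page) + 1);
                    queue.add(next);
                }
            }
        }
        return depth;
    }
}
```

Because pages are visited level by level, a page reachable both directly and through a longer chain is recorded with the smaller click count.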
 
Ulf Dittmer
Rancher
Posts: 43011
76
Let's take a step back: What is the purpose of this?
 
Tim Cooke
Sheriff
Posts: 4779
310
IntelliJ IDE Python Java Linux
Hello Kirill, and welcome to the Ranch!

I have a question. Does the website for which you want to find out this information belong to you? If so, then I am quite sure that the likes of Google Analytics will collate this kind of information for you.
 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java

Ulf Dittmer wrote:Let's take a step back: What is the purpose of this?



It's an exercise that a local company sent me for practice. It's for the company's Java courses. Kind of hard for beginners, isn't it?
 
Kirill Varivoda
Greenhorn
Posts: 20
Eclipse IDE Java

Tim Cooke wrote:Hello Kirill, and welcome to the Ranch!

I have a question. Does the website for which you want to find out this information belong to you? If so, then I am quite sure that the likes of Google Analytics will collate this kind of information for you.



I know about it, but I need to build a simple application that can check any web-site.
 
Ulf Dittmer
Rancher
Posts: 43011
76
There are any number of ways of doing this. If you're not supposed to use third-party libraries, then the code you found is not a bad start. It already extracts the URLs, so now you need to keep track of all linked pages in two sets of strings, "processed URLs" and "unprocessed URLs", so that you don't process the same URL repeatedly. You'll also need to look at the domain, so that you don't start going off-site.
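
A rough sketch of that bookkeeping follows, under a few assumptions. SiteCrawler and the linksOf callback are hypothetical names; linksOf stands in for whatever code downloads a page and returns the absolute, well-formed URLs found on it. The "unprocessed" collection is a queue rather than a set here, and the duplicate check happens when a URL is taken off the queue.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical crawler skeleton; fetching and parsing are left to linksOf.
class SiteCrawler {
    /** Returns, for each same-host page reachable from startUrl, its external link count. */
    static Map<String, Integer> crawl(String startUrl,
                                      Function<String, List<String>> linksOf) {
        String siteHost = URI.create(startUrl).getHost();
        Set<String> processed = new HashSet<>();
        Deque<String> unprocessed = new ArrayDeque<>();
        Map<String, Integer> externalCounts = new LinkedHashMap<>();
        unprocessed.add(startUrl);
        while (!unprocessed.isEmpty()) {
            String url = unprocessed.remove();
            if (!processed.add(url)) {
                continue; // this URL was already handled
            }
            int external = 0;
            for (String link : linksOf.apply(url)) {
                String host = URI.create(link).getHost();
                if (host == null) {
                    continue; // mailto: and similar links have no host
                }
                if (host.equalsIgnoreCase(siteHost)) {
                    // Same site: schedule it, but never follow off-site links.
                    if (!processed.contains(link)) {
                        unprocessed.add(link);
                    }
                } else {
                    external++;
                }
            }
            externalCounts.put(url, external);
        }
        return externalCounts;
    }
}
```

Passing the link extraction in as a function keeps the loop testable without a network connection; a real version would also want to normalise URLs (fragments, trailing slashes) before comparing them.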
 