Relative URLs

 
Ranch Hand
Posts: 833
How can my Java code get all the relative URLs, e.g.
www.yahoo.com/aa
www.yahoo.com/bb
www.yahoo.com/cc
www.yahoo.com/dd
www.yahoo.com/ee
etc.

Thanks & best regards
 
Sheriff
Posts: 21135
If all those URLs are located in an HTML page you can parse the page and look for all HREF and SRC attributes.
 
Rancher
Posts: 42972
What do you mean by "relative URLs"? URLs are always absolute; paths within a web site may be relative.

Can you give an example of an input and an output of what you're trying to do?
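
To illustrate the distinction, here is a minimal sketch of how a relative path found in a page could be turned into an absolute URL with `java.net.URI`. The class name `ResolveExample` and the `example.com` addresses are made up for the illustration and are not from this thread.

```java
import java.net.URI;

class ResolveExample {
    // Resolve a reference (relative or absolute) against the URL of the page it was found on.
    static String resolve(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page address, used only for this example.
        String page = "http://www.example.com/forums/index.html";
        System.out.println(resolve(page, "user/login"));                 // relative to the page's directory
        System.out.println(resolve(page, "/faq/"));                      // relative to the site root
        System.out.println(resolve(page, "http://other.example.org/x")); // already absolute, returned unchanged
    }
}
```

An absolute reference passes through unchanged, so the same method can be applied to every link on a page without checking its form first.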
 
Farakh khan
Ranch Hand
Posts: 833
Originally posted by Ulf Dittmer:
What do you mean by "relative URLs"? URLs are always absolute; paths within a web site may be relative.

Can you give an example of an input and an output of what you're trying to do?


http://www.javaranch.com has many other related URLs, e.g.
http://www.coderanch.com/forums/user/edit
http://www.coderanch.com/forums/user/login
http://faq.javaranch.com/Watch/
http://www.javaranch.com
etc.

How can my Java code read the related URLs of http://www.javaranch.com?

Thanks again & best regards
 
Ulf Dittmer
Rancher
Posts: 42972
So the input would be a web page, and the output would be a list of all URLs on that web page?
 
Farakh khan
Ranch Hand
Posts: 833
Originally posted by Ulf Dittmer:
So the input would be a web page, and the output would be a list of all URLs on that web page?


Yes, but how could I achieve this?

Thanks again
 
Rob Spoor
Sheriff
Posts: 21135
Like I said, parse the page and filter out the right attributes.



Of course HREF is not the only one. The following could also be used:
ACTION (forms)
BACKGROUND
CODEBASE
SRC (images, iframes, etc)

Plus possibly others.
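
One way this could look, as a rough sketch only: the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`) reports every tag to a callback, and the callback picks out the attributes listed above. The class name `LinkExtractor` and the method `extract` are names chosen for this example, and the attribute list is deliberately not exhaustive.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkExtractor {
    // Attributes that commonly hold a URL; others may exist.
    private static final HTML.Attribute[] URL_ATTRIBUTES = {
        HTML.Attribute.HREF, HTML.Attribute.SRC, HTML.Attribute.ACTION,
        HTML.Attribute.BACKGROUND, HTML.Attribute.CODEBASE
    };

    // Returns the raw attribute values as written in the page (they may be relative).
    static List<String> extract(Reader html) throws IOException {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs); // tags with content, e.g. <a>, <form>, <body>
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs); // empty tags, e.g. <img>
            }

            private void collect(MutableAttributeSet attrs) {
                for (HTML.Attribute name : URL_ATTRIBUTES) {
                    Object value = attrs.getAttribute(name);
                    if (value != null) {
                        found.add(value.toString());
                    }
                }
            }
        };
        new ParserDelegator().parse(html, callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page fragment, used only for this example.
        String page = "<a href=\"user/login\">Log in</a><img src=\"/logo.png\">";
        System.out.println(extract(new StringReader(page)));
    }
}
```

For a real page, the `StringReader` would be replaced by a reader over the downloaded content. The bundled parser targets old HTML, so a dedicated HTML parsing library may cope better with modern pages.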
 
author
Sheriff
Posts: 14112
Originally posted by Ulf Dittmer:
So the input would be a web page, and the output would be a list of all URLs on that web page?


Mhhh, my initial understanding was that the input would be a website address, and the output would be the URLs of all pages that belong to that site.

To which the answer would have been: not possible in general, not with Java or any other language. The HTTP protocol simply doesn't provide the necessary information.
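
What a program can do is follow links outward from a starting page, which only finds pages that something links to. The sketch below shows that idea under stated assumptions: `SiteWalker` and `reachable` are names invented for this example, and `linksOf` is a hypothetical stand-in for "download the page and return its links as absolute URLs".

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

class SiteWalker {
    // Breadth-first walk over links, restricted to the host of the start URL.
    // linksOf is assumed to return absolute URLs for the given page.
    static Set<String> reachable(String start, Function<String, List<String>> linksOf) {
        String host = URI.create(start).getHost();
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String link : linksOf.apply(page)) {
                // Stay on the same host and skip pages that were already visited.
                if (Objects.equals(host, URI.create(link).getHost()) && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return seen;
    }
}
```

A page that exists on the server but is not linked from any visited page never shows up in the result, which is the limitation described above.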
 
Farakh khan
Ranch Hand
Posts: 833
Originally posted by Rob Prime:
Like I said, parse the page and filter out the right attributes.



Of course HREF is not the only one. The following could also be used:
ACTION (forms)
BACKGROUND
CODEBASE
SRC (images, iframes, etc)

Plus possibly others.


great!

Thanks a lot. I am trying to understand.

Thanks & best regards
 