
Extracting hyperlinks in an HTML page

 
Sree Jag
Ranch Hand
Posts: 77
Hi all,

Given a URL, I need to download the HTML page and extract all the hyperlinks (<a href> tags) using Java.

Can anyone point out a tool or suggest how to do it?

Thanks,
Seshu
 
Layne Lund
Ranch Hand
Posts: 3061
Look at the javax.swing.text.html package. I haven't used it, but there are probably some useful tools there for this particular job.

Layne
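
For anyone following up on the JDK suggestion, here is a rough sketch of how link extraction might look with the parser that ships in javax.swing.text.html. The class name LinkExtractor and the method extractLinks are made up for illustration, and the UTF-8 charset in main is an assumption about the page being fetched. It only collects the raw href values as written in the page and does not resolve relative links.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the JDK.
public class LinkExtractor {

    // Parses HTML from the reader and returns the href value of every <a> tag, in document order.
    public static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The third argument tells the parser to ignore any charset declared inside the page.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    // Usage: java LinkExtractor <url>
    public static void main(String[] args) throws IOException {
        // Assumption: the page is encoded as UTF-8.
        try (Reader reader = new InputStreamReader(
                URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8)) {
            for (String link : extractLinks(reader)) {
                System.out.println(link);
            }
        }
    }
}
```

The Swing parser is lenient with sloppy markup, which helps on real-world pages, but a dedicated HTML parser library may cope better with very broken HTML.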
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
I used the Quiotix HTML Parser for this kind of thing with great satisfaction. There are others around, and there may be something just as good in the JDK.
 
It is sorta covered in the JavaRanch Style Guide.