This week's book giveaway is in the HTML Pages with CSS and JavaScript forum.
We're giving away four copies of Testing JavaScript Applications and have Lucas da Costa on-line!
See this thread for details.
Win a copy of Testing JavaScript Applications this week in the HTML Pages with CSS and JavaScript forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Bear Bibeault
  • Ron McLeod
  • Jeanne Boyarsky
  • Paul Clapham
Sheriffs:
  • Tim Cooke
  • Liutauras Vilda
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • fred rosenberger
  • salvin francis
Bartenders:
  • Piet Souris
  • Frits Walraven
  • Carey Brown

Extracting hyperlinks in a HTML page

 
Ranch Hand
Posts: 77
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi all,

Given an URL, i need to download that HTML page and extract all the hyperlinks ( <a href> ) tags using java.

Can anyone point out a tool or suggest how to do it?

Thanks,
Seshu
 
Ranch Hand
Posts: 3061
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Look at the java.text.html package. I haven't used it, but there are probably some useful tools there for this particular job.

Layne
 
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I used the Quiotix HTML Parser for this kind of thing with great satisfaction. There are others around, and there may be something just as good in the JDK.
 
    Bookmark Topic Watch Topic
  • New Topic