• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

HTML Parser

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
i wanna write a HTML Parser using Swing.text api to get hyperlinks and their value
eg. for the hyperlink <a href=http://www.google.com>Google</a>
i want two string "http://www.google.com" and "Google"
TIA
avi tiw
 
author and iconoclast
Posts: 24207
46
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Moving to Java in General (Intermediate).
 
Ranch Hand
Posts: 688
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You can try this open source project at HTML Parser
 
Bartender
Posts: 9626
16
Mac OS X Linux Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Look at javax.swing.text.html.HTMLDocument. It has a getIterator() method that takes an HTML.Tag object as it's argument, returning an Iterator of all tags of a particular kind in a document.
 
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I like the Quiotix Parser. It builds an object model in memory and provides a Visitor Pattern interface to help you find things or modify them.
 
Ranch Hand
Posts: 38
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I reccomend Sun's very own parser! It is defined so that malformed html's are 'fixed' by adding missing tags. it's architecture is event driven and gives good perforance.

You should (could) implement the above call-back.
Default implementation exists, but I think you could (should) improve upon it.
I have used it extensively and it is quite good. I uesd for parsing of hundreds of thousands of documents (!) with no memory leeks.
If you have any more questions I would be happy to help.
Azriel
 
moose poop looks like football shaped elk poop. About the size of this tiny ad:
a bit of art, as a gift, that will fit in a stocking
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic