
HTML Parser

 
avi tiw
Greenhorn
Posts: 1
I want to write an HTML parser using the javax.swing.text API to get hyperlinks and their link text.
E.g., for the hyperlink <a href=http://www.google.com>Google</a>
I want two strings: "http://www.google.com" and "Google".
TIA
avi tiw
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
Moving to Java in General (Intermediate).
 
Adrian Yan
Ranch Hand
Posts: 688
You can try this open source project at HTML Parser
 
Joe Ess
Bartender
Posts: 9441
Look at javax.swing.text.html.HTMLDocument. It has a getIterator() method that takes an HTML.Tag object as its argument and returns an HTMLDocument.Iterator over all tags of that kind in the document.
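
For the original question, that approach might look roughly like the sketch below. The class name LinkIteratorSketch and the extractLinks helper are made up for illustration; the Swing calls (HTMLEditorKit.read, HTMLDocument.getIterator, and the iterator's getAttributes, getStartOffset and getEndOffset) are the standard ones. It assumes the whole page fits in a String and that each link's text is a single run of characters; a link containing nested markup may come back as more than one entry.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of Swing.
class LinkIteratorSketch {

    // Returns one {href, text} pair per <a href=...> run found in the markup.
    static List<String[]> extractLinks(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoid a ChangedCharSetException if the page declares its own charset.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        List<String[]> links = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
             it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href == null) {
                continue; // a named anchor (<a name=...>), not a hyperlink
            }
            int start = it.getStartOffset();
            int end = it.getEndOffset();
            String text = doc.getText(start, end - start).trim();
            links.add(new String[] { href.toString(), text });
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<a href=\"http://www.google.com\">Google</a>"
                + "</body></html>";
        for (String[] link : extractLinks(page)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

This builds the full Swing document model first, so it is convenient for a single page but heavier than a streaming parse.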
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
I like the Quiotix Parser. It builds an object model in memory and provides a Visitor Pattern interface to help you find things or modify them.
 
Azriel Abramovich
Ranch Hand
Posts: 38
I recommend Sun's very own parser! It is designed so that malformed HTML is 'fixed' by adding missing tags. Its architecture is event-driven and it gives good performance.

You should (could) implement the parser's callback.
A default implementation exists, but I think you could (should) improve upon it.
I have used it extensively and it is quite good. I used it to parse hundreds of thousands of documents (!) with no memory leaks.
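
Here is a minimal sketch of such a callback, assuming the callback meant here is HTMLEditorKit.ParserCallback, driven by javax.swing.text.html.parser.ParserDelegator. The class name LinkCallbackSketch and its parse helper are hypothetical; it simply collects each link's href and text, which is what the original question asked for.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback that records {href, text} for every hyperlink.
class LinkCallbackSketch extends HTMLEditorKit.ParserCallback {
    final List<String[]> links = new ArrayList<>();
    private String currentHref;          // non-null while inside <a href=...>
    private StringBuilder currentText;   // text seen since the opening tag

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                currentHref = href.toString();
                currentText = new StringBuilder();
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (currentHref != null) {
            currentText.append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.A && currentHref != null) {
            links.add(new String[] { currentHref, currentText.toString().trim() });
            currentHref = null;
        }
    }

    // Parses the markup and returns the collected links.
    static List<String[]> parse(String html) throws IOException {
        LinkCallbackSketch callback = new LinkCallbackSketch();
        // 'true' tells the parser to ignore any charset directive in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>"
                + "<a href=\"http://www.google.com\">Google</a>"
                + "</body></html>";
        for (String[] link : parse(page)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

Because the callback only keeps the current link's state, nothing like a full document tree is held in memory, which suits large batches of pages.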
If you have any more questions I would be happy to help.
Azriel
 