• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • Ron McLeod
  • Junilu Lacar
  • Liutauras Vilda
Sheriffs:
  • Paul Clapham
  • Jeanne Boyarsky
  • Henry Wong
Saloon Keepers:
  • Tim Moores
  • Tim Holloway
  • Stephan van Hulst
  • Piet Souris
  • Carey Brown
Bartenders:
  • Jesse Duncan
  • Frits Walraven
  • Mikalai Zaikin

HTML Parser

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
i wanna write a HTML Parser using Swing.text api to get hyperlinks and their value
eg. for the hyperlink <a href=http://www.google.com>Google</a>
i want two string "http://www.google.com" and "Google"
TIA
avi tiw
 
author and iconoclast
Posts: 24204
44
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Moving to Java in General (Intermediate).
 
Ranch Hand
Posts: 688
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You can try this open source project at HTML Parser
 
Bartender
Posts: 9626
16
Mac OS X Linux Windows
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Look at javax.swing.text.html.HTMLDocument. It has a getIterator() method that takes an HTML.Tag object as it's argument, returning an Iterator of all tags of a particular kind in a document.
 
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I like the Quiotix Parser. It builds an object model in memory and provides a Visitor Pattern interface to help you find things or modify them.
 
Ranch Hand
Posts: 38
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I reccomend Sun's very own parser! It is defined so that malformed html's are 'fixed' by adding missing tags. it's architecture is event driven and gives good perforance.

You should (could) implement the above call-back.
Default implementation exists, but I think you could (should) improve upon it.
I have used it extensively and it is quite good. I uesd for parsing of hundreds of thousands of documents (!) with no memory leeks.
If you have any more questions I would be happy to help.
Azriel
 
pie. tiny ad:
Building a Better World in your Backyard by Paul Wheaton and Shawn Klassen-Koop
https://coderanch.com/wiki/718759/books/Building-World-Backyard-Paul-Wheaton
reply
    Bookmark Topic Watch Topic
  • New Topic