
parse html contents

 
Edward Chen
Ranch Hand
Posts: 798
I have some HTML content that contains links, and I want to parse it to extract the link addresses.

How can I do this? I have looked at JTidy and XQuery, but I didn't find much useful information.


Thanks.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 65833
134
IntelliJ IDE Java jQuery Mac Mac OS X
Did you try googling for "HTML parser"?

Or, if the patterns you are searching for are fairly distinct, you might just try a regular expression search rather than a full-blown parse.
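
As a rough illustration of the regular-expression route, here is a minimal sketch in Java. The class name RegexLinkExtractor and the sample HTML are made up for this example. It assumes every link is an <a> tag with a quoted href value; unquoted attributes, links inside comments or scripts, and look-alike attributes such as data-href are not handled properly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a general HTML parser.
public class RegexLinkExtractor {

    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    // Group 1 is the opening quote character, group 2 is the link address.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input is an assumption made for this sketch.
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<A HREF='b.html'>B</A></p>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

If the markup turns out to be less regular than this assumes, a real HTML parser is the safer choice.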
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
6
Is the HTML content well-formed as per the XHTML standard, or is it the typical sloppy HTML that most browsers allow?
Bill
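
The distinction matters because well-formed XHTML can be read with the XML parser that ships with the JDK. A minimal sketch of that case follows. The class name XhtmlLinkExtractor and the sample document are made up for this example, and it assumes the input has no DOCTYPE and no named HTML entities such as &nbsp;. Sloppy HTML would make the parser throw, so it would first need cleaning up with something like JTidy.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; works on well-formed XHTML.
public class XhtmlLinkExtractor {

    public static List<String> extractLinks(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // XHTML element names are lowercase, so "a" finds every anchor.
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                if (anchor.hasAttribute("href")) {
                    links.add(anchor.getAttribute("href"));
                }
            }
            return links;
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input is an assumption made for this sketch.
        String xhtml = "<html><body>"
                + "<a href=\"http://example.com/a\">A</a>"
                + "<a name=\"top\">no href</a>"
                + "<a href=\"b.html\">B</a>"
                + "</body></html>";
        for (String link : extractLinks(xhtml)) {
            System.out.println(link);
        }
    }
}
```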
 