HTML Parser for Java

 
Greenhorn
Posts: 11
Hi All,

I want to parse HTML files in Java and access their nodes. Can anyone suggest the best and easiest-to-learn HTML parser for Java?

Thanks,
Mazhar
 
Ranch Hand
Posts: 1609

Originally posted by Mazhar Ismail:
Hi All,

I want to parse HTML files in Java and access their nodes. Can anyone suggest the best and easiest-to-learn HTML parser for Java?

Thanks,
Mazhar



Can you please describe in more detail where you need it?
 
Mazhar Ismail
Greenhorn
Posts: 11


From the above HTML file, I have to read the tags and grab the term "Sales" within a <td> tag using Java. I know there are some HTML parsers for Java with which we can read the HTML file. I wanted to know which parsers are available and which is the best to use.

--
Mazhar
 
Akhilesh Trivedi
Ranch Hand
Posts: 1609
I would suggest you try XML. The rest is up to you.
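If the file happens to be well-formed (XHTML-style), the XML route can be sketched with nothing but the JDK's built-in parser. This is only a minimal sketch under that assumption: the class name `TdTextExtractor`, the method `tdTexts`, and the sample markup are made up for illustration, since the original HTML file is not shown in the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library mentioned in this thread.
class TdTextExtractor {

    // Parses well-formed markup with the JDK's XML parser and returns
    // the trimmed text of every <td> element, in document order.
    static List<String> tdTexts(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList cells = doc.getElementsByTagName("td");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent().trim());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Ill-formed HTML ends up here; an XML parser will not repair it.
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample; replace with the contents of the real file.
        String sample = "<html><body><table><tr>"
                + "<td>Sales</td><td>Marketing</td>"
                + "</tr></table></body></html>";
        System.out.println(tdTexts(sample).contains("Sales"));
    }
}
```

Real-world HTML is often not well-formed, in which case this approach throws instead of returning cells, and a tolerant HTML parser is the better fit.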
 
Author and all-around good cowpoke
Posts: 13078
The JTidy open source toolkit can parse ill-formed HTML (not parsable by standard XML parsers) and give you a DOM view.

Bill
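To illustrate the point about standard XML parsers, here is a small sketch that feeds markup to the JDK's `DocumentBuilder` and reports whether it parses. The class name `WellFormedCheck`, the method `isWellFormedXml`, and the sample strings are assumptions for illustration only; this does not use JTidy itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper showing why ill-formed HTML needs a tolerant parser.
class WellFormedCheck {

    // Returns true if the JDK's XML parser accepts the markup,
    // false if it rejects it as not well-formed.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Raised for problems such as a missing end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed samples: the second one leaves a tag unclosed.
        String closed = "<html><body><address>1 Main Street</address></body></html>";
        String unclosed = "<html><body><address>1 Main Street</body></html>";
        System.out.println(isWellFormedXml(closed));
        System.out.println(isWellFormedXml(unclosed));
    }
}
```

The default error handler may also print a "Fatal Error" line to standard error for the rejected input; that is the parser's own diagnostic, not part of this sketch.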
 
Akhilesh Trivedi
Ranch Hand
Posts: 1609
I was just wondering whether Mazhar wants to parse the syntax of an HTML document or needs a solution for carrying data in markup.
[ November 03, 2008: Message edited by: Akhilesh Trivedi ]
 
Akhilesh Trivedi
Ranch Hand
Posts: 1609
I tried JTidy and Jericho here on the following sample.




The address tag has no end tag. With JTidy, the output file was empty. Maybe it was parsing node by node and never reached a closing address tag.

The output from Jericho's FormatSource.jsp, meanwhile, was



Not bad. I doubt whether Jericho has node-based parsing. For JTidy I do see a class "org.w3c.tidy.Node" in the docs, but it seems the documentation has not been updated since 2000.
[ November 05, 2008: Message edited by: Akhilesh Trivedi ]
 
Akhilesh Trivedi
Ranch Hand
Posts: 1609

Originally posted by Akhilesh Trivedi:
....With JTidy, the output file was empty. Maybe it was parsing node by node and never reached a closing address tag.
....




Sorry, just correcting myself: Tidy did output a file. Here are the contents.



and here are the errors.

 