How to grab data from <div> under <table> by using htmlparser library?

 
Greenhorn
Posts: 18
Hello everyone.

I am trying to use the htmlparser library to grab data (price and item name) from a web page that looks very much like the following one.

<table class="item">
<tr>
<td>
<div class="title">Desktop</div>
</td>
<td>
<div class="price">$1,200</div>
</td>
</tr>
</table>

My code is very complex.

I used two parsers: one searches for the <div> tag whose class equals "title", and the other searches for the <div> tag whose class equals "price".
I do not know the htmlparser library well. I started using it only two weeks ago, and I find it very hard to find any samples for it on Google.
Does anyone have a better idea?
I would appreciate any help.
 
Marshal
Posts: 28193
You could help us to help you: how about providing us with a link to the documentation? Or a link to the product's home page?

(You can post URLs with the "URL" button which you will see above the box you post in.)
 
Cameron ax
Greenhorn
Posts: 18
Thank you.
This is a general question, not about one particular case.
I do not have a link for this.

I want to write a small program to grab data from a website such as Amazon, eBay, or any other online shopping site.
I found that the product info is always in a <div> or <span> placed under a <table> tag.
I know how to get all the <div> tags from a single page,
but I want to know whether there is a way to get all the <div>s from one particular <table> tag, and which method in the htmlparser library does that.
Thank you.
 
Paul Clapham
Marshal
Posts: 28193
Well, if you don't actually have the htmlparser software, then this seems like kind of a pointless question.

I assume you don't have it, otherwise you would have a link to where you downloaded it from. Or did somebody give you a copy? Maybe you could ask them?
 
Cameron ax
Greenhorn
Posts: 18
Sorry,
I thought you were asking about the target HTML page I want to parse.

So you are asking about the htmlparser software?

Yes, I do have a link for it:

http://htmlparser.sourceforge.net/

That is the one I am using.
 
Paul Clapham
Marshal
Posts: 28193
Sorry, I don't see anything in their "Samples" page there. I suppose the "FilterBuilder" example might be sort of what you're looking for.

However your project is a rather dubious one anyway. All of the sites you mentioned, I'm pretty sure, have terms of use which forbid people from accessing the sites via computer programs. You should at least check out the terms of use on each site before you start trying to scrape its pages.
 
Cameron ax
Greenhorn
Posts: 18

The target websites I listed were just examples, to explain what my target HTML page will look like.

I just want to know whether there is a method in the HTMLParser library that can recognize a DIV tag that sits under a TABLE tag. Does anyone have any experience grabbing data from an HTML page like this? That is all.

Thank you for the quick response and your help.
 
Paul Clapham
Marshal
Posts: 28193
I'm sure you could do what you asked with that HTMLParser project. I doubt that there's a specific method to do it, though. Although now that I look through the API documentation, there's a Parser class and it has a method named "extractAllNodesThatMatch". Possibly -- actually quite likely now that I look at the docs more -- you could use that.

But yeah, the project really doesn't have much in the way of useful examples. You're pretty much left to trawl through the docs and figure it out for yourself. Although looking at their examples wouldn't do you any harm.
 
Cameron ax
Greenhorn
Posts: 18
I tried the extractAllNodesThatMatch method. First, I used it to get all the <table> tags from the web page; it returns a NodeList, which I call tablelist.
After that, I tried to use the same method on tablelist to get the <span> tags.
But somehow, it does not work.
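
A possible explanation, offered as a guess rather than a diagnosis: as far as the HTMLParser Javadoc describes it, the one-argument NodeList.extractAllNodesThatMatch(filter) only tests the nodes that are directly in the list (here, the <table> tags themselves) and does not descend into their children, while the two-argument overload takes a "recursive" flag. The sketch below shows two ways the search might be limited to tags under a <table>. It uses <div> because that is what the sample markup in the first post contains. The class name TableDivExtractor, the helper method names, and the HTML constant are made up for illustration; the library calls are the ones I believe exist in HTMLParser 1.6/2.0, so please check them against the Javadoc of the version you have.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasParentFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical example class; requires the htmlparser jar on the classpath.
class TableDivExtractor {

    // The sample markup from the first post, inlined so the sketch is self-contained.
    static final String HTML =
          "<table class=\"item\"><tr>"
        + "<td><div class=\"title\">Desktop</div></td>"
        + "<td><div class=\"price\">$1,200</div></td>"
        + "</tr></table>";

    // Option 1: a single parser and a single filter.
    // HasParentFilter with recursive=true accepts a node if ANY ancestor
    // matches the inner filter, so <div> inside <td> inside <table> qualifies.
    static NodeList divsUnderTable(String html) throws ParserException {
        NodeFilter underTable = new HasParentFilter(new TagNameFilter("table"), true);
        NodeFilter divUnderTable = new AndFilter(new TagNameFilter("div"), underTable);
        return Parser.createParser(html, "UTF-8").extractAllNodesThatMatch(divUnderTable);
    }

    // Option 2: the two-step approach described above, but with the
    // recursive flag set so the search descends into each table's children.
    static NodeList divsFromTables(String html) throws ParserException {
        NodeList tables = Parser.createParser(html, "UTF-8")
                .extractAllNodesThatMatch(new TagNameFilter("table"));
        return tables.extractAllNodesThatMatch(new TagNameFilter("div"), true);
    }

    // Prints the class attribute and the text content of each matched tag.
    static void print(NodeList divs) {
        for (int i = 0; i < divs.size(); i++) {
            Node node = divs.elementAt(i);
            String cssClass = ((Tag) node).getAttribute("class");
            System.out.println(cssClass + ": " + node.toPlainTextString().trim());
        }
    }

    public static void main(String[] args) throws ParserException {
        print(divsUnderTable(HTML));
        print(divsFromTables(HTML));
    }
}
```

To pick out only the title and price, a HasAttributeFilter on "class" could presumably be combined into the same AndFilter, which would also remove the need for two separate parsers.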

For now, I use regular expressions to get the content I want, but I am still looking for a simpler way to parse an HTML file.
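
For comparison, the regular-expression workaround might look roughly like the sketch below. It uses only java.util.regex from the JDK. The class name RegexDivExtractor, the method name extract, and the pattern are assumptions made for illustration, and the pattern is tied to the exact shape of the sample markup: it would break on nested <div>s, extra attributes, or single-quoted attribute values, which is the usual reason a real parser is preferred.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; JDK only, no htmlparser needed.
class RegexDivExtractor {

    // Matches <div class="title">...</div> and <div class="price">...</div>.
    // Group 1 is the class name, group 2 is the text between the tags.
    private static final Pattern DIV = Pattern.compile(
            "<div\\s+class=\"(title|price)\"\\s*>(.*?)</div>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns "class=text" entries in document order.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = DIV.matcher(html);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2).trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String html =
              "<table class=\"item\"><tr>"
            + "<td><div class=\"title\">Desktop</div></td>"
            + "<td><div class=\"price\">$1,200</div></td>"
            + "</tr></table>";
        for (String entry : extract(html)) {
            System.out.println(entry);
        }
    }
}
```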
 