
scraping XML

 
Ciri Bhoy
Greenhorn
Posts: 16
Hi all,

I'm writing a small app that reads an XML file from a website as an InputStream, and I want to parse that stream so that only certain results are displayed. The data is laid out as follows:

<tr>
<td class="first">

<img id="ctl00_Content_ctl00_rptInfo_ctl16_Image2" alt="Inactive" src="../../images/t2.jpg" style="border-width:0px;" />
</td>
<td >
Brussels
</td>
<td>
Aer Lingus
</td>
<td>
EI639
</td>
<td>
12 Mar 21:50
</td>
<td class="last">
Arrived 21:39
</td>
</tr>

<tr>
<td class="first">
<img id="ctl00_Content_ctl00_rptInfo_ctl17_Image1" alt="Active" src="../../images/t1.jpg" style="border-width:0px;" />

</td>
<td >
..........................
..........................

I'm currently doing this by reading each line with readLine() and pulling out the relevant data, and it's working fine. The problem is that this seems far too easy. It's only a small project, so performance isn't really an issue; I'm just looking for the 'right' way of doing it, or a few 'right' ways. I hope I'm making myself clear enough; I'm afraid I'm not too well up on the jargon yet.
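
For reference, the line-by-line approach might look roughly like the sketch below. It is only an illustration, not the actual code from the app: the class name FlightRowScraper and the method readRows are made up for this example, and it assumes every tag sits on its own line exactly as in the sample above. It would stop working if the page layout changed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names are illustrative, not taken from the original app.
public class FlightRowScraper {

    /** Collects the text of each <td> cell, grouped by <tr> row. */
    public static List<List<String>> readRows(Reader source) {
        List<List<String>> rows = new ArrayList<>();
        List<String> currentRow = null;   // non-null while inside a <tr>
        StringBuilder cellText = null;    // non-null while inside a <td>

        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("<tr")) {
                    currentRow = new ArrayList<>();
                } else if (trimmed.startsWith("</tr")) {
                    if (currentRow != null) {
                        rows.add(currentRow);
                    }
                    currentRow = null;
                } else if (trimmed.startsWith("<td")) {
                    cellText = new StringBuilder();
                } else if (trimmed.startsWith("</td")) {
                    if (currentRow != null && cellText != null) {
                        currentRow.add(cellText.toString());
                    }
                    cellText = null;
                } else if (cellText != null && !trimmed.isEmpty() && !trimmed.startsWith("<")) {
                    // Plain text inside a cell; nested tags such as <img> are skipped.
                    if (cellText.length() > 0) {
                        cellText.append(' ');
                    }
                    cellText.append(trimmed);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumption: a cut-down copy of the sample markup stands in for the downloaded page.
        String sample = String.join("\n",
                "<tr>",
                "<td class=\"first\">",
                "<img alt=\"Inactive\" src=\"../../images/t2.jpg\" />",
                "</td>",
                "<td >",
                "Brussels",
                "</td>",
                "<td class=\"last\">",
                "Arrived 21:39",
                "</td>",
                "</tr>");
        for (List<String> row : readRows(new StringReader(sample))) {
            System.out.println(row);
        }
    }
}
```

Each row comes back as a list of cell strings; the first cell is empty because it only holds the status image.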

Any advice is very welcome and appreciated.
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
That doesn't look like XML; it looks like HTML. My first weapon of choice would be a library that can handle HTML, such as HtmlUnit, which also handles downloading the page.
 