JavaRanch » Java Forums » Java » I/O and Streams

scraping XML

Ciri Bhoy
Greenhorn

Joined: Oct 20, 2011
Posts: 16
Hi all,

I'm writing a small app that reads an XML file from a website as an InputStream, and I want to parse that stream so that only certain results are displayed. The data is contained in markup like the following:

<tr>
<td class="first">

<img id="ctl00_Content_ctl00_rptInfo_ctl16_Image2" alt="Inactive" src="../../images/t2.jpg" style="border-width:0px;" />
</td>
<td >
Brussels
</td>
<td>
Aer Lingus
</td>
<td>
EI639
</td>
<td>
12 Mar 21:50
</td>
<td class="last">
Arrived 21:39
</td>
</tr>

<tr>
<td class="first">
<img id="ctl00_Content_ctl00_rptInfo_ctl17_Image1" alt="Active" src="../../images/t1.jpg" style="border-width:0px;" />

</td>
<td >
..........................
..........................

I'm currently doing this by reading each line with readLine() and pulling out the relevant data, and it's working fine. The problem is that this seems far too easy. It's only a small project, so performance isn't really an issue; I'm just looking for the 'right' way of doing it, or a few 'right' ways. I hope I'm making myself clear enough; I'm afraid I'm not too well up on the jargon yet.
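
For reference, the line-by-line approach described above might look roughly like the sketch below. This is not the poster's actual code: the class name FlightRowScraper and the method parseRows are made up for illustration, and the sketch assumes every tag sits on its own line, exactly as in the sample markup. It would break as soon as the site changes its layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for illustration only.
class FlightRowScraper {

    // Reads the markup line by line and returns one list of cell texts per <tr>.
    // Assumes each tag is on its own line, as in the posted sample.
    static List<List<String>> parseRows(Reader source) {
        List<List<String>> rows = new ArrayList<>();
        List<String> cells = null;   // cells of the current row; null outside <tr>
        StringBuilder text = null;   // text of the current cell; null outside <td>
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String t = line.trim();
                if (t.startsWith("<tr")) {
                    cells = new ArrayList<>();
                } else if (t.startsWith("</tr")) {
                    if (cells != null) rows.add(cells);
                    cells = null;
                } else if (t.startsWith("<td")) {
                    text = new StringBuilder();
                } else if (t.startsWith("</td")) {
                    if (cells != null && text != null) cells.add(text.toString());
                    text = null;
                } else if (text != null && !t.isEmpty() && !t.startsWith("<")) {
                    // Plain text inside a cell; other tags such as <img> are skipped.
                    if (text.length() > 0) text.append(' ');
                    text.append(t);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A cut-down copy of the first row from the sample markup.
        String sample = String.join("\n",
                "<tr>",
                "<td class=\"first\">",
                "<img alt=\"Inactive\" src=\"../../images/t2.jpg\" />",
                "</td>",
                "<td >", "Brussels", "</td>",
                "<td>", "Aer Lingus", "</td>",
                "<td>", "EI639", "</td>",
                "<td>", "12 Mar 21:50", "</td>",
                "<td class=\"last\">", "Arrived 21:39", "</td>",
                "</tr>");
        for (List<String> row : parseRows(new StringReader(sample))) {
            // Cell 0 is the status image, so it comes back as an empty string.
            System.out.println(row.get(1) + " | " + row.get(3) + " | " + row.get(5));
        }
    }
}
```

In a real app the Reader would wrap the InputStream from the website (for example via an InputStreamReader) instead of a StringReader.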

Any advice is very welcome and appreciated.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41620
That doesn't look like XML; it looks like HTML. My first weapon of choice would be a library that can handle HTML, such as HtmlUnit, which also handles downloading the page.

