scraping XML

Ciri Bhoy
Greenhorn

Joined: Oct 20, 2011
Posts: 16
Hi all,

I'm writing a small app that reads an XML file from a website as an InputStream, and I want to parse that stream so that only certain values are displayed. The data looks like this:

<tr>
<td class="first">

<img id="ctl00_Content_ctl00_rptInfo_ctl16_Image2" alt="Inactive" src="../../images/t2.jpg" style="border-width:0px;" />
</td>
<td >
Brussels
</td>
<td>
Aer Lingus
</td>
<td>
EI639
</td>
<td>
12 Mar 21:50
</td>
<td class="last">
Arrived 21:39
</td>
</tr>

<tr>
<td class="first">
<img id="ctl00_Content_ctl00_rptInfo_ctl17_Image1" alt="Active" src="../../images/t1.jpg" style="border-width:0px;" />

</td>
<td >
..........................
..........................

I'm currently doing this by reading each line with readLine() and pulling out the relevant data, and it's working fine. The problem is that this seems far too easy. It's only a small project, so performance isn't really an issue; I'm just looking for the 'right' way of doing it, or a few 'right' ways. I hope I'm making myself clear enough; I'm afraid I'm not too well up on the jargon yet.
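For reference, the line-by-line approach described above might look roughly like the sketch below. This is not the poster's actual code: the class name LineScraper and the method readRows are made up for illustration, and it assumes every tag sits on its own line, exactly as in the sample markup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scrapes table cells with readLine(), one tag per line.
class LineScraper {

    /** Returns the trimmed text of every <td>, grouped by enclosing <tr>. */
    static List<List<String>> readRows(Reader source) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = null;      // cells of the current row, null outside <tr>
        StringBuilder cell = null;    // text of the current cell, null outside <td>
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String t = line.trim();
                if (t.startsWith("<tr")) {
                    row = new ArrayList<>();
                } else if (t.startsWith("</tr")) {
                    if (row != null) rows.add(row);
                    row = null;
                } else if (t.startsWith("<td")) {
                    cell = new StringBuilder();
                } else if (t.startsWith("</td")) {
                    if (row != null && cell != null) row.add(cell.toString().trim());
                    cell = null;
                } else if (cell != null && !t.startsWith("<")) {
                    // Plain text inside a cell; nested tags such as <img> are skipped.
                    cell.append(t).append(' ');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "<tr>", "<td >", "Brussels", "</td>",
                "<td>", "Aer Lingus", "</td>",
                "<td class=\"last\">", "Arrived 21:39", "</td>", "</tr>");
        System.out.println(readRows(new StringReader(sample)));
    }
}
```

The weakness is visible in the code itself: it depends on where the line breaks fall, so it stops working as soon as the site puts two tags on one line.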

Any advice is very welcome and appreciated.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41134
That doesn't look like XML; it looks like HTML. My first weapon of choice would be a library that can handle HTML, such as HtmlUnit, which also handles downloading the page.
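To illustrate the parser-based idea without adding a dependency, here is a minimal sketch that swaps in the HTML parser bundled with the JDK (javax.swing.text.html) in place of HtmlUnit. It is not HtmlUnit's API and it does not download anything; the class name TableScraper and the method readRows are hypothetical, and the page is assumed to arrive as a Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects table cell text using the JDK's own HTML parser.
class TableScraper {

    /** Returns the trimmed text of every <td>, grouped by enclosing <tr>. */
    static List<List<String>> readRows(Reader source) {
        final List<List<String>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> row;      // cells of the current row
            private StringBuilder cell;    // text of the current cell

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    row = new ArrayList<>();
                } else if (tag == HTML.Tag.TD) {
                    cell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (cell != null) cell.append(data);
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD && row != null && cell != null) {
                    row.add(cell.toString().trim());
                    cell = null;
                } else if (tag == HTML.Tag.TR && row != null) {
                    rows.add(row);
                    row = null;
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(source, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String page = "<html><body><table><tr><td>Brussels</td><td>Aer Lingus</td>"
                + "<td class=\"last\">Arrived 21:39</td></tr></table></body></html>";
        System.out.println(readRows(new StringReader(page)));
    }
}
```

Because the parser works on tags rather than lines, the result no longer depends on how the page happens to be wrapped. A dedicated library such as HtmlUnit should cope better with messy real-world markup than this bundled parser.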