
Trouble parsing XML document received via http (not SOAP)

Luke Porter
Posts: 1
Hi there,

I'm having some issues parsing an xml document that is streamed from an http source.

My program sends an XML request via http (to an ASP Page) like so (conn is the HttpURLConnection):

conn.setRequestProperty("Content-type", "application/x-www-form-urlencoded");

OutputStreamWriter outStream = new OutputStreamWriter(new BufferedOutputStream(conn.getOutputStream()),"UTF-8");

I've left out some of the finer details, but the request works fine.

When a CDATA node is returned with a large amount of formatted binary data in the response XML document, the text is always squashed onto 2 lines, essentially losing the layout of the CDATA contents (in this case, a text report).

eg. Correct formatting:


company 45393987398
figures 983983983983
blah blah 1023848484

eg. Actual formatting:

<![CDATA[report company 45393987398 figures 983983983983blah blah 1023848484 hello]]>

I have verified that the XML being sent to me is ok as I have pulled it using the ServerXMLHTTP object via asp - the report comes back fine, all formatting correct.

Here are the two ways I have tried to retrieve the XML response in Java:

1) Using a BufferedReader:

InputStream is = conn.getInputStream();
InputStreamReader isr = new InputStreamReader(is,"UTF-8");
BufferedReader in = new BufferedReader(isr);
StringBuffer contents = new StringBuffer();

boolean finished = false;
String aLine = in.readLine();
if(aLine == null)
finished = true;

2) Using a SAX parser:

XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

org.xml.sax.ContentHandler handler = new MySAXHandler();

InputStream in = conn.getInputStream();
InputSource source = new InputSource(in);

In both cases, the entire XML document is received over http, but the formatting in the CDATA node is lost. I have done a test whereby I save the correctly formatted XML document to a file on the disk and try to parse from the local file instead of a stream, and IT WORKS FINE! Seems to be an issue with how the stream is pulling in the XML document.

I am really out of ideas and wondered if anybody had some suggestions on what I can try?

Many thanks for your time.
wise owen
Ranch Hand
Posts: 2023
Peer Reynders
Posts: 2968
Avoid using either "\n" or "\r\n" (they're platform dependent); use the line.separator property instead, i.e. System.getProperty("line.separator").

Anyway, to elaborate on the previous post, see the documentation for BufferedReader.readLine():
Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
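
To make that concrete, here is a minimal sketch of two ways to keep the line breaks. It is not the original poster's code: the class name LineReadingSketch, the method names, and the sample report text are made up for illustration, and a StringReader stands in for the InputStreamReader wrapped around conn.getInputStream(). The first method puts a separator back after every line that readLine() returns; the second copies raw characters, so whatever terminators the server sent are left alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineReadingSketch {

    // readLine() strips the terminator it found, so one is appended again
    // after each line. Note this also adds a separator after the last line,
    // even if the source text did not end with one.
    static String readWithSeparators(Reader source) throws IOException {
        String lineSeparator = System.getProperty("line.separator");
        StringBuilder contents = new StringBuilder();
        BufferedReader in = new BufferedReader(source);
        String aLine;
        while ((aLine = in.readLine()) != null) {
            contents.append(aLine).append(lineSeparator);
        }
        return contents.toString();
    }

    // Alternative: copy characters in blocks without interpreting lines,
    // so the original terminators pass through unchanged.
    static String readRaw(Reader source) throws IOException {
        StringBuilder contents = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = source.read(buffer)) != -1) {
            contents.append(buffer, 0, count);
        }
        return contents.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample standing in for the CDATA report text.
        String report = "company 45393987398\r\nfigures 983983983983\r\nblah blah 1023848484";
        System.out.print(readWithSeparators(new StringReader(report)));
        System.out.println();
        System.out.println(readRaw(new StringReader(report)));
    }
}
```

If the text is going to be handed to an XML parser afterwards, the second approach is probably the safer of the two, since it does not alter the document at all.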

[ February 03, 2006: Message edited by: Peer Reynders ]