XML parsing in Java

 
raghav srinivasan
Greenhorn
Posts: 16
Hi,

I am looking to parse an XML stream which contains namespace and schema definitions. Below is a sample of the XML I want to parse.

<ns1:Sample xmlns:ns1="bp:Profile" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<in0 href="#id0"></in0>
<in1 href="#id1"></in1>
<in2 xsi:type="xsd:string">1234567890</in2>
<in3 href="#id2"></in3>
</ns1:Sample>

I should be able to extract the string "1234567890" from the above XML. It would be great if someone could guide me on how to achieve this.

Thanks,
Raghav.
 
Jelle Klap
Bartender
Posts: 1952
First, you'll need to decide what kind of parsing strategy is appropriate, and choose a parser accordingly.
Generally speaking, your options are tree-based parsing (Document Object Model, or DOM) versus event-based parsing (SAX).

A DOM parser reads the entire XML content and builds the full hierarchical object graph in memory. That is great if you need to traverse or manipulate the object graph frequently, but it's not a great choice for large bodies of XML, because the object graph would consume enormous amounts of memory.
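
For the snippet in the first post, a DOM lookup through JAXP could look roughly like the sketch below. The class name DomExtract and the helper textOfFirst are made up for illustration, and the xmlns:soapenv, xmlns:xsi and xmlns:xsd declarations are an assumption added here, since the posted fragment uses those prefixes without declaring them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomExtract {
    // Sample from the first post. The three xmlns declarations for soapenv,
    // xsi and xsd are assumptions: the posted fragment does not show them.
    static final String XML =
        "<ns1:Sample xmlns:ns1=\"bp:Profile\""
      + " xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
      + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
      + " xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
      + " soapenv:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
      + "<in0 href=\"#id0\"></in0>"
      + "<in1 href=\"#id1\"></in1>"
      + "<in2 xsi:type=\"xsd:string\">1234567890</in2>"
      + "<in3 href=\"#id2\"></in3>"
      + "</ns1:Sample>";

    // Returns the text of the first element with the given tag name, or null if there is none.
    static String textOfFirst(String xml, String tagName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The whole document is read into memory as a tree before we query it.
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList matches = doc.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOfFirst(XML, "in2"));
    }
}
```

The whole tree is available after parse() returns, so further lookups (in0, in1, the href attributes) need no second pass over the input.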

Conversely, a SAX parser reads XML content sequentially and pushes parsing events (start of element, character data, end of element) to the application through callbacks. That is useful for dealing with huge amounts of XML data, but offers the client very little control compared to the DOM alternative.
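
As a rough sketch of the event model, the handler below collects the text of one element while the parser streams through the input. SaxExtract, TextHandler and textOfFirst are hypothetical names, and the XML is a trimmed version of the posted sample (prefixed attributes dropped) so that it stays well-formed without extra namespace declarations.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxExtract {
    // Trimmed version of the posted sample (assumption: prefixed attributes removed).
    static final String XML =
        "<ns1:Sample xmlns:ns1=\"bp:Profile\">"
      + "<in0 href=\"#id0\"></in0>"
      + "<in2>1234567890</in2>"
      + "</ns1:Sample>";

    // Collects the character data of the first element whose qualified name matches.
    static class TextHandler extends DefaultHandler {
        private final String target;
        private final StringBuilder text = new StringBuilder();
        private boolean inside;
        private boolean done;

        TextHandler(String target) { this.target = target; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!done && qName.equals(target)) inside = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so append rather than assign.
            if (inside) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (inside && qName.equals(target)) { inside = false; done = true; }
        }

        String result() { return done ? text.toString() : null; }
    }

    static String textOfFirst(String xml, String qName) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextHandler handler = new TextHandler(qName);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.result();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOfFirst(XML, "in2"));
    }
}
```

Note that the application has to keep its own state (the inside and done flags here), which is the loss of control mentioned above.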

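Another alternative to both DOM and SAX would be StAX, which sort of bridges the gap between the two: it streams through the input like SAX, but the application pulls events from the parser instead of having them pushed to it.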
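
A minimal StAX sketch of the same lookup, using the cursor API that ships with the JDK. StaxExtract and textOfFirst are illustrative names, and the XML is again a trimmed version of the posted sample rather than the real message.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxExtract {
    // Trimmed version of the posted sample (assumption: prefixed attributes removed).
    static final String XML =
        "<ns1:Sample xmlns:ns1=\"bp:Profile\">"
      + "<in0 href=\"#id0\"></in0>"
      + "<in2>1234567890</in2>"
      + "</ns1:Sample>";

    // Pulls events until the first element with the given local name, then returns its text.
    static String textOfFirst(String xml, String localName) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // Valid only for text-only elements; reads up to the matching end tag.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOfFirst(XML, "in2"));
    }
}
```

Because the loop is driven by the application, it can stop reading as soon as it has what it needs.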

If the example XML snippet is an accurate representation of the size of the XML content you'll be processing, a DOM parser shouldn't be a problem memory-wise, unless you'll be processing massive numbers of documents concurrently, and it's by far the easiest approach to get started with.

You could get started with JAXP (which supports both DOM and SAX), which is part of the core Java library, or you could look at a popular third-party library like JDOM or DOM4J.
 
raghav srinivasan
Greenhorn
Posts: 16
Hi Jelle,

Thanks for your reply. I guess I will go with the DOM parser since my XML file is small. In addition to the above example, there will be SOAP multiRef inclusions. Kindly let me know whether DOM would be able to parse those as well.
I have one more query: I am searching for a tool which would de-serialize the multireferences contained in SOAP messages into simple XML tags. It would be great if you could share your views on this too.

Thanks again,
Raghav.
 
Jelle Klap
Bartender
Posts: 1952
Oh, you need to process SOAP requests? Guess I skimmed over the namespace too quickly, but the XML example doesn't appear to be structured as a valid SOAP message.
Still not quite sure what the use case is exactly, but it looks like you'd be better off adopting a specialized SOAP library like Apache Axis.

 
raghav srinivasan
Greenhorn
Posts: 16
Hi Jelle,

Thanks for your reply.

Yes, it is a SOAP message; sorry for posting only part of it. It was just the body. As you suggested, I am presently using Axis2 for processing SOAP messages. I also had a requirement to process plain XML tags, which went well with the DOM parser. For the SOAP messages, I was just looking for a tool that would deserialize the multiRef elements in the messages into simple XML tags. Axis2 does have SOAP libraries to process the request, but my requirement is for study purposes: my idea was to understand SOAP messages better, and I got held up by the multireferences. I had an opportunity to learn from the tutorials, but it takes time for me to deserialize every message by hand, and it becomes a tough task when there are many references and the SOAP message is pretty big. It would be great if a tool could do it in seconds.

Kindly share your ideas.

Thanks,
Raghav.
 
salvin francis
Bartender
Posts: 1306
Jelle Klap wrote: First thing, you'll need to decide what kind of parsing strategy would be appropriate, ...


I know one strategy for SAX: use a stack for processing intermediate objects as they are read from the XML content.
When the parser finds a start element, it pushes a bean onto the stack.
When the parser finds an end element, it pops a bean off the stack.
I have left out many details here.

Just curious to know: are there any other similar strategies or patterns?
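
To make the stack idea concrete, here is one possible sketch, not the only way to do it. Bean, BeanHandler and StackSax are invented names, and attaching each popped bean to its parent is one of the "left out" details filled in as an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StackSax {
    // Hypothetical generic bean: element name, its text, and its child beans.
    static class Bean {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Bean> children = new ArrayList<>();
        Bean(String name) { this.name = name; }
    }

    static class BeanHandler extends DefaultHandler {
        private final Deque<Bean> stack = new ArrayDeque<>();
        Bean root;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Start of an element: push a fresh bean for it.
            stack.push(new Bean(qName));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text belongs to whichever element is currently on top of the stack.
            if (!stack.isEmpty()) stack.peek().text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // End of an element: pop its bean and hand it to the parent (or keep it as root).
            Bean finished = stack.pop();
            if (stack.isEmpty()) root = finished;
            else stack.peek().children.add(finished);
        }
    }

    static Bean parse(String xml) throws Exception {
        BeanHandler handler = new BeanHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.root;
    }

    public static void main(String[] args) throws Exception {
        Bean root = parse("<ns1:Sample xmlns:ns1=\"bp:Profile\">"
            + "<in0 href=\"#id0\"></in0><in2>1234567890</in2></ns1:Sample>");
        for (Bean child : root.children) {
            System.out.println(child.name + " = " + child.text);
        }
    }
}
```

In a real application the generic Bean would usually be replaced by domain classes chosen according to the element name.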
 
raghav srinivasan
Greenhorn
Posts: 16
Hi,

Kindly share your ideas, if any. I have given up trying to find tools that would do that.


Many thanks,
Raghav.
 
salvin francis
Bartender
Posts: 1306
Hmm, I don't know of any tools,
but I don't think it's that trivial to implement on your own.
Post your code here and we can help you if you get stuck anywhere.
 
raghav srinivasan
Greenhorn
Posts: 16
Hi Salvin,

Thanks for your reply. Below is the piece of the SOAP message which I would like to de-serialize.

<soapenv:Body>
<ns1:Profile xmlns:ns1="BP:ProfileMS" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<in0 href="#id0"></in0>
<in1 href="#id1"></in1>
<in2 xsi:type="xsd:string">1234567890</in2>
<in3 href="#id2"></in3>
</ns1:Profile>
<multiRef xmlns:ns2="http://test.myTest.services.BP.com" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" id="id0" soapenc:root="0" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns2:SSE">UMB</multiRef>
<multiRef xmlns:ns3="http://test.myTest.services.BP.com" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" id="id2" soapenc:root="0" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns3:STE">NUMBER</multiRef>
<multiRef xmlns:ns4="http://test.myTest.services.BP.com" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" id="id1" soapenc:root="0" soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns4:ACE">UM</multiRef>
</soapenv:Body>

Many Thanks,
Raghav.
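
For readers following along, one way the href/multiRef pairs in a Body like the one above could be inlined with plain DOM is sketched below. This is only an illustration under several assumptions: MultiRefInliner and inline are made-up names, the xmlns:soapenv and xmlns:xsi declarations are added because the posted fragment omits them, the multiRef attributes are trimmed to id only, and only simple text-valued multiRef elements are handled (no nested structures, no chains of references, and the multiRef's xsi:type is not carried over).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MultiRefInliner {
    // Trimmed version of the posted Body. Assumptions: namespace declarations
    // added on Body, multiRef attributes reduced to "id".
    static final String XML =
        "<soapenv:Body xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
      + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
      + "<ns1:Profile xmlns:ns1=\"BP:ProfileMS\">"
      + "<in0 href=\"#id0\"></in0>"
      + "<in1 href=\"#id1\"></in1>"
      + "<in2 xsi:type=\"xsd:string\">1234567890</in2>"
      + "<in3 href=\"#id2\"></in3>"
      + "</ns1:Profile>"
      + "<multiRef id=\"id0\">UMB</multiRef>"
      + "<multiRef id=\"id2\">NUMBER</multiRef>"
      + "<multiRef id=\"id1\">UM</multiRef>"
      + "</soapenv:Body>";

    static Document inline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Index the multiRef elements by id. The NodeList is live, so copy it first.
        NodeList live = doc.getElementsByTagName("multiRef");
        Map<String, Element> byId = new HashMap<>();
        List<Element> multiRefs = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            Element m = (Element) live.item(i);
            multiRefs.add(m);
            byId.put(m.getAttribute("id"), m);
        }

        // Replace every href="#idN" with the text of the referenced multiRef.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String href = e.getAttribute("href"); // "" when the attribute is absent
            if (href.startsWith("#") && byId.containsKey(href.substring(1))) {
                e.setTextContent(byId.get(href.substring(1)).getTextContent());
                e.removeAttribute("href");
            }
        }

        // The multiRef elements are no longer needed once their values are inlined.
        for (Element m : multiRefs) {
            m.getParentNode().removeChild(m);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(inline(XML)), new StreamResult(System.out));
    }
}
```

After inline() runs, in0, in1 and in3 carry the values UMB, UM and NUMBER directly, and the multiRef elements are gone from the Body.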
 
salvin francis
Bartender
Posts: 1306
Hey, that's great.

So, what Java code have you written to parse this?
Post the code and maybe we can help you with the details...
 
raghav srinivasan
Greenhorn
Posts: 16
Hi Salvin,

I've just started on the code. Will post it soon.

Thanks,
Raghav.
 