Parsing data out of an XML document

 
Sagar Suraj
Greenhorn
I want to parse XML content something like the sample below. It is formatted like HTML. How can I parse it?
I want values such as name, employee number, age, etc.
But they are not defined in dedicated tags.
Kindly help me extract the content from this HTML-formatted XML.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ListForEmployee_1_0 SYSTEM "c:/file/hello.dtd">
<List suppressFolio="n" xmlProviderInfo="test Server" Strategy="normal">
<Wrapper>
<doc>
<docBody>
<displayGroup lineSeparator="n" leftIndent="10" fontFamily="Verdana" fontSize="11">
Service: <startStyle fontEmphasis="b"/>Employee File<endStyle/> <startStyle fontEmphasis="b"/>10 records<endStyle/>
<newLine n="1"/>
Company: <startStyle fontEmphasis="b"/>A2B company<endStyle/>
</displayGroup>
<displayGroup lineSeparator="y">
<table>
<cellWidth numSpaces="10"/>
<cellWidth numSpaces="2"/>
<cellWidth numSpaces="15"/>
<cellWidth numSpaces="20"/>
<cellWidth numSpaces="20"/>
<cellWidth numSpaces="13"/>
<tableBody>
<row>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Name<endStyle/></cell>
<cell topBorder="y" bottomBorder="y">Employee number</cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Sex<endStyle/></cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Age<endStyle/></cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Designation<endStyle/></cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Date<endStyle/></cell>
</row>

<row>
<cell topBorder="y" bottomBorder="y" justification="left">Mark</cell>
<cell topBorder="y" bottomBorder="y" justification="left">1001</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Male</cell>
<cell topBorder="y" bottomBorder="y" justification="left">25</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Analyst</cell>
<cell topBorder="y" bottomBorder="y" justification="left">2005-02-01</cell>
</row>


<row>
<cell topBorder="y" bottomBorder="y" justification="left">ricky</cell>
<cell topBorder="y" bottomBorder="y" justification="left">1005</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Male</cell>
<cell topBorder="y" bottomBorder="y" justification="left">28</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Analyst</cell>
<cell topBorder="y" bottomBorder="y" justification="left">2008-12-01</cell>
</row>


<row>
<cell topBorder="y" bottomBorder="y" justification="left">David</cell>
<cell topBorder="y" bottomBorder="y" justification="left">1007</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Male</cell>
<cell topBorder="y" bottomBorder="y" justification="left">35</cell>
<cell topBorder="y" bottomBorder="y" justification="left">SeniorAnalyst</cell>
<cell topBorder="y" bottomBorder="y" justification="left">2005-08-11</cell>
</row>


<row>
<cell topBorder="y" bottomBorder="y" justification="left">hilary</cell>
<cell topBorder="y" bottomBorder="y" justification="left">1008</cell>
<cell topBorder="y" bottomBorder="y" justification="left">female</cell>
<cell topBorder="y" bottomBorder="y" justification="left">28</cell>
<cell topBorder="y" bottomBorder="y" justification="left">maketing</cell>
<cell topBorder="y" bottomBorder="y" justification="left">2001-02-01</cell>
</row>

</tableBody>
</table>
</displayGroup>
</docBody>
</doc>
</Wrapper>
</List>
 
 
William Brogden
Author and all-around good cowpoke
First things first!

Have you been able to parse this document into a DOM using the standard Java library parser?

If you can get a DOM, locate each of the table's "row" elements, then extract the NodeList of "cell" elements inside each row.

Each NodeList preserves the document order of its "cell" elements, so you can pick out the value in each column of the table by index.
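
A minimal sketch of that row-then-cell walk, for illustration only. The class name RowCellWalk, the method readRows, and the trimmed SAMPLE string (no DOCTYPE, two data rows) are assumptions made up for this example and are not part of the posted document. It uses getTextContent(), which requires DOM Level 3 support in the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RowCellWalk {
    // Hypothetical, trimmed stand-in for the posted document: no DOCTYPE, two data rows.
    static final String SAMPLE =
        "<List><table><tableBody>"
        + "<row><cell>Mark</cell><cell>1001</cell><cell>Male</cell></row>"
        + "<row><cell>ricky</cell><cell>1005</cell><cell>Male</cell></row>"
        + "</tableBody></table></List>";

    /** Returns one list per "row" element, holding its "cell" texts in document order. */
    static List<List<String>> readRows(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<List<String>> table = new ArrayList<>();
        NodeList rows = doc.getElementsByTagName("row");
        for (int i = 0; i < rows.getLength(); i++) {
            // Only look for cells inside the current row.
            NodeList cells = ((Element) rows.item(i)).getElementsByTagName("cell");
            List<String> values = new ArrayList<>();
            for (int j = 0; j < cells.getLength(); j++) {
                values.add(cells.item(j).getTextContent().trim());
            }
            table.add(values);
        }
        return table;
    }

    public static void main(String[] args) {
        for (List<String> row : readRows(SAMPLE)) {
            System.out.println(row);
        }
    }
}
```

With the sample above, readRows(SAMPLE).get(1).get(1) is the employee number of the second row. Index 0 of each inner list is the name column, index 1 the employee number, and so on.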

Bill


 
Sagar Suraj
Greenhorn
I am able to parse the document using a DOM parser, and I can retrieve the values from the tags below.
<row>
<cell topBorder="y" bottomBorder="y" justification="left">ricky</cell>
<cell topBorder="y" bottomBorder="y" justification="left">1005</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Male</cell>
<cell topBorder="y" bottomBorder="y" justification="left">28</cell>
<cell topBorder="y" bottomBorder="y" justification="left">Analyst</cell>
<cell topBorder="y" bottomBorder="y" justification="left">2008-12-01</cell>
</row>

Below is the piece of code I have used.
// needs imports from javax.xml.parsers, org.w3c.dom, org.xml.sax, java.io and java.util
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Use the factory to create a builder
DocumentBuilder builder;

try {
    builder = factory.newDocumentBuilder();

    // here xmlResponse is the xml to be parsed
    Document doc = builder.parse(new InputSource(
            new ByteArrayInputStream(xmlResponse.toString().getBytes("utf-8"))));

    NodeList nodes = doc.getElementsByTagName("row");
    System.err.println("in nodes is " + nodes.getLength());

    List<String> ls = new ArrayList<>();

    for (int i = 0; i < nodes.getLength(); i++) {
        Element element = (Element) nodes.item(i);
        LmlPrinterFriendlyResponseParsed lmlTextOnly = new LmlPrinterFriendlyResponseParsed();
        // cells of the current row, in document order
        NodeList nTitle = element.getElementsByTagName("cell");
        for (int j = 0; j < nTitle.getLength(); j++) {
            Element line = (Element) nTitle.item(j);
            String title = getCharacterDataFromElement(line);
        }
    }
} catch (ParserConfigurationException | SAXException | IOException e) {
    // the try block needs a handler to compile; report parse failures
    e.printStackTrace();
}


But I couldn't retrieve the values from the tags below. I want the values Name, Sex, Age, Designation, Date, etc.
Since these cells contain a <startStyle .../> tag, I couldn't proceed.
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Name<endStyle/></cell>
<cell topBorder="y" bottomBorder="y">Employee number</cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Sex<endStyle/></cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Age<endStyle/></cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Designation<endStyle/></cell>
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Date<endStyle/></cell>

Below is the piece of code I tried for these cells.
NodeList nodes = doc.getElementsByTagName("row");
System.err.println("in nodes is " + nodes.getLength());

List ls = new ArrayList();

for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    LmlPrinterFriendlyResponseParsed lmlTextOnly = new LmlPrinterFriendlyResponseParsed();
    NodeList nTitle = element.getElementsByTagName("cell");
    for (int j = 0; j < nTitle.getLength(); j++) {
        Element line = (Element) nTitle.item(j);

        NodeList nStyle = line.getElementsByTagName("startStyle");
        for (int k = 0; k < nStyle.getLength(); k++) {
            Element elemStyle = (Element) nStyle.item(k);
            String title = getCharacterDataFromElement(line);
        }
    }
}


 

 
William Brogden
Author and all-around good cowpoke

Sagar Suraj wrote:
But I couldn't retrieve the values from the tags below. I want the values Name, Sex, Age, Designation, Date, etc.
Since these cells contain a <startStyle .../> tag, I couldn't proceed.
<cell topBorder="y" bottomBorder="y" justification="left"><startStyle fontEmphasis="b"/>Name<endStyle/></cell>
<cell topBorder="y" bottomBorder="y">Employee number</cell> .....



Well, of course you can't: that row is being used as a header, not data. You need to skip that row and find the rows with real data.
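
One way to skip the header might look like the sketch below. It assumes the header is always the first "row" element, which holds for the posted sample but is not guaranteed by the format. The class name SkipHeaderRow, the method dataRows, and the shortened SAMPLE string are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SkipHeaderRow {
    // Hypothetical, shortened sample: one header row followed by two data rows.
    static final String SAMPLE =
        "<tableBody>"
        + "<row><cell><startStyle fontEmphasis=\"b\"/>Name<endStyle/></cell>"
        + "<cell>Employee number</cell></row>"
        + "<row><cell>Mark</cell><cell>1001</cell></row>"
        + "<row><cell>ricky</cell><cell>1005</cell></row>"
        + "</tableBody>";

    /** Returns the cell texts of every row except the first, which is assumed to be the header. */
    static List<List<String>> dataRows(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<List<String>> result = new ArrayList<>();
        NodeList rows = doc.getElementsByTagName("row");
        // Start at index 1: row 0 is assumed to be the header row.
        for (int i = 1; i < rows.getLength(); i++) {
            NodeList cells = ((Element) rows.item(i)).getElementsByTagName("cell");
            List<String> values = new ArrayList<>();
            for (int j = 0; j < cells.getLength(); j++) {
                values.add(cells.item(j).getTextContent().trim());
            }
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dataRows(SAMPLE));
    }
}
```

If the header position is not fixed, a different test would be needed, for example checking whether a row's cells contain a startStyle child; that variant is not shown here.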

Incidentally, your post would be more readable if you used the "Code" annotation.

Bill
 
Ranch Hand
You can simply call getTextContent() on each cell, if you are not very fluent in traversing nodes.

That depends on DOM Level 3 support. Most DOM parsers that are not too archaic, even those with only partial Level 3 support, should have getTextContent() in place.
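
To illustrate the difference on a header cell, here is a small sketch. The class HeaderCellText and its methods are hypothetical. In particular, firstChildText is only a guess at how a helper such as getCharacterDataFromElement might behave (the thread never shows its body); it is assumed to look at the first child node only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class HeaderCellText {
    // One header cell copied from the posted table.
    static final String HEADER_CELL =
        "<cell topBorder=\"y\" bottomBorder=\"y\" justification=\"left\">"
        + "<startStyle fontEmphasis=\"b\"/>Name<endStyle/></cell>";

    /** Parses a small XML fragment and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Assumed behaviour of a first-child-only helper: empty unless the first child is text. */
    static String firstChildText(Element cell) {
        Node first = cell.getFirstChild();
        return (first instanceof Text) ? ((Text) first).getData() : "";
    }

    /** DOM Level 3: concatenated text of all descendant text nodes. */
    static String fullText(Element cell) {
        return cell.getTextContent().trim();
    }

    public static void main(String[] args) {
        Element header = parseRoot(HEADER_CELL);
        // The first child here is the startStyle element, not the text "Name".
        System.out.println("first child only: '" + firstChildText(header) + "'");
        System.out.println("getTextContent:   '" + fullText(header) + "'");
    }
}
```

For a plain data cell such as <cell>Mark</cell>, both methods return the same text; they differ only when an element like startStyle precedes the text.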

PS: Your DOCTYPE line is actually incorrect. I wonder how that came about!
 