Parse an XML file by supplied tag name using SAX

 
Ranch Hand
Posts: 116
I am trying to write a generic application that reads a very large XML file and grabs every occurrence of a specified element (the node name is passed as an argument) along with its attribute names and values. The application then dynamically constructs a SQL INSERT statement and loads a table through the JDBC API. I have written such an application using DOM, and it works fine with small XML files. But for a file of 500 MB or more, parsing the document takes forever. A similar application that is less generic (the tag names are hardcoded) takes a little over a minute to run. I have heard about StAX, but so far I couldn't find any useful information about it.
I would really appreciate it if anyone could show me some code snippets, leads, or hints that explicitly use SAX (it looks like I don't have any choice?) where I can pass the tag name as an argument to the executable. In other words, I am looking for a generic solution.

Thanking you all
 
Author and all-around good cowpoke
Posts: 13078
Realize that every element start tag will cause a call to your implementation of the startElement() method of your extension of org.xml.sax.helpers.DefaultHandler. (I'm assuming a current Java standard library)
At that point you can look up the name of the element in your list of names and decide what to do with it.
This might involve setting a flag that says "grab all the content from now until the endElement() method for this tag is called" and building your insert statement.
Bill
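
To make the advice above concrete, here is a minimal sketch of a handler that takes the tag name as a parameter. The class name TagAttributeReader, the parse() and buildInsert() helpers, and the idea of using the tag name as the table name are all assumptions made for illustration, not something from the original posts. It also assumes the attribute names are valid SQL column names, and it only collects attributes, not nested text content.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams an XML document and hands the attributes of
// every element named tagName to a callback, one element at a time.
class TagAttributeReader extends DefaultHandler {
    private final String tagName;
    private final Consumer<Map<String, String>> sink;

    TagAttributeReader(String tagName, Consumer<Map<String, String>> sink) {
        this.tagName = tagName;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default JAXP factory is not namespace-aware, so compare against qName.
        if (!qName.equals(tagName)) {
            return;
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            row.put(atts.getQName(i), atts.getValue(i));
        }
        sink.accept(row);
    }

    // Parses the stream with the JDK's default SAX parser.
    static void parse(InputStream in, String tagName, Consumer<Map<String, String>> sink)
            throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(in, new TagAttributeReader(tagName, sink));
    }

    // Builds a parameterized INSERT; assumes table and column names are safe identifiers.
    static String buildInsert(String table, Map<String, String> row) {
        String columns = String.join(", ", row.keySet());
        String marks = String.join(", ", Collections.nCopies(row.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ")";
    }

    // Hypothetical usage: java TagAttributeReader big.xml car
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            parse(in, args[1], row ->
                    System.out.println(buildInsert(args[1], row) + " <- " + row.values()));
        }
    }
}
```

Because each row is passed to the callback as soon as its start tag is seen, nothing accumulates in memory, which is the point of using SAX for a 500 MB file. In a real load you would presumably bind row.values() to a java.sql.PreparedStatement created from the generated SQL and batch the inserts, rather than printing them.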
 
Ranch Hand
Posts: 45
I found this example on the Wrox website.

XML file
=========
<?xml version="1.0"?>
<!DOCTYPE train [
<!ELEMENT train (car*)>
<!ELEMENT car (color, weight, length, occupants)>
<!ATTLIST car type CDATA #IMPLIED>
<!ELEMENT color (#PCDATA)>
<!ELEMENT weight (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT occupants (#PCDATA)>
]>
<train>
  <car type="Engine">
    <color>Black</color>
    <weight>512 tons</weight>
    <length>60 feet</length>
    <occupants>3</occupants>
  </car>
  <car type="Baggage">
    <color>Green</color>
    <weight>80 tons</weight>
    <length>40 feet</length>
    <occupants>0</occupants>
  </car>
  <car type="Dining">
    <color>Green and Yellow</color>
    <weight>50 tons</weight>
    <length>50 feet</length>
    <occupants>18</occupants>
  </car>
  <car type="Passenger">
    <color>Green and Yellow</color>
    <weight>40 tons</weight>
    <length>60 feet</length>
    <occupants>23</occupants>
  </car>
  <car type="Pullman">
    <color>Green and Yellow</color>
    <weight>50 tons</weight>
    <length>60 feet</length>
    <occupants>23</occupants>
  </car>
  <car type="Caboose">
    <color>Red</color>
    <weight>90 tons</weight>
    <length>30 feet</length>
    <occupants>4</occupants>
  </car>
</train>


TrainReader.java
=================
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class TrainReader extends DefaultHandler
{

    private boolean isColor;
    private String trainCarType = "";
    private StringBuffer trainCarColor = new StringBuffer();
    private Locator trainLocator = null;

    public static void main (String[] args)
        throws Exception
    {
        System.out.println("Running train reader...");
        TrainReader readerObj = new TrainReader();
        readerObj.read(args[0]);
    }

    public void read(String fileName)
        throws Exception
    {
        // Obtain the reader through JAXP so the parser bundled with the JDK is used.
        // The original example hard-coded org.apache.xerces.parsers.SAXParser,
        // which only works when Xerces is on the classpath.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setContentHandler (this);
        reader.setErrorHandler (this);

        // Validate against the inline DTD if the parser supports it.
        try
        {
            reader.setFeature("http://xml.org/sax/features/validation", true);
        }
        catch (SAXException e)
        {
            System.err.println("Cannot activate validation");
        }

        try
        {
            reader.parse(fileName);
        }
        catch (SAXException e)
        {
            System.out.println("Parsing stopped : " + e.getMessage());
        }
    }

    // The parser supplies a Locator so errors can report line and column.
    public void setDocumentLocator(Locator loc)
    {
        trainLocator = loc;
    }

    public void startDocument()
        throws SAXException
    {
        System.out.println("Start of the train");
    }

    public void endDocument()
        throws SAXException
    {
        System.out.println("End of the train");
    }

    // Called for every start tag: remember the car type and note
    // whether we are now inside a <color> element.
    public void startElement(String uri, String localName, String qName, Attributes atts)
        throws SAXException
    {
        if (localName.equals("car")) {
            if (atts != null) {
                trainCarType = atts.getValue("type"); // null if the attribute is absent
            }
        }

        if (localName.equals("color"))
        {
            trainCarColor.setLength(0);
            isColor = true;
        } else
            isColor = false;
    }

    // Text may arrive in several chunks, so append rather than assign.
    public void characters(char[] ch, int start, int len)
        throws SAXException
    {
        if (isColor)
        {
            trainCarColor.append(ch, start, len);
        }
    }

    public void endElement(String uri, String localName, String qName)
        throws SAXException
    {
        if (isColor)
        {
            System.out.println("The color of the " + trainCarType + " car is " +
                trainCarColor.toString());
            // Constant-first equals avoids a NullPointerException when type is missing.
            if (("Caboose".equals(trainCarType)) &&
                (!trainCarColor.toString().equals("Red")))
            {
                if (trainLocator != null)
                    throw new SAXException("The caboose is not red at line " +
                        trainLocator.getLineNumber() + ", column " +
                        trainLocator.getColumnNumber() );
                else
                    throw new SAXException("The caboose is not red!");
            }
        }
        isColor = false;
    }

    public void warning (SAXParseException exception)
        throws SAXException {
        System.err.println("[Warning] " +
            exception.getMessage() + " at line " +
            exception.getLineNumber() + ", column " +
            exception.getColumnNumber() );
    }

    public void error (SAXParseException exception)
        throws SAXException {
        System.err.println("[Error] " +
            exception.getMessage() + " at line " +
            exception.getLineNumber() + ", column " +
            exception.getColumnNumber() );
    }

    // Fatal errors are rethrown so that parsing stops.
    public void fatalError (SAXParseException exception)
        throws SAXException {
        System.err.println("[Fatal Error] " +
            exception.getMessage() + " at line " +
            exception.getLineNumber() + ", column " +
            exception.getColumnNumber() );
        throw exception;
    }

}
[ January 26, 2006: Message edited by: Sara James ]
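
Since the original question also mentioned StAX: the StAX API (javax.xml.stream) ships with Java SE 6 and later, and the same "attributes of a given tag" task can be sketched with its pull-style XMLStreamReader. The class name StaxTagReader and its read() method are hypothetical names chosen for this illustration, and the sketch assumes only attributes (not nested text) are needed.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls events from the stream and passes the attributes
// of every element named tagName to a callback.
class StaxTagReader {
    static void read(InputStream in, String tagName, Consumer<Map<String, String>> sink)
            throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // next() advances to the following event and returns its type.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(tagName)) {
                    Map<String, String> row = new LinkedHashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        row.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    sink.accept(row);
                }
            }
        } finally {
            reader.close(); // releases parser resources; does not close the InputStream
        }
    }
}
```

The difference from SAX is mainly one of control flow: here the application asks for the next event in a loop instead of being called back by the parser. Both approaches stream the document, so memory use should stay flat for large files.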
 